websemantique@xmlfr.org: discussion list of the French-speaking Semantic Web community

[websemantique] [Fwd: Re: http://planete.websemantique.org/ and user defined content filtering]


Author: Eric van der Vlist <vdv@dyomedea.com>
Date: 12/10/2006 - 11:41
X-Mailer: Evolution 2.6.1

Hello,
Attached is a message from Sam Ruby confirming that filtering
http://planete.websemantique.org/ by topic and/or by language is
relatively simple to implement...

Eric

-------- Forwarded message --------
From: Sam Ruby <rubys@intertwingly.net>
To: Eric van der Vlist <vdv@dyomedea.com>
Cc: devel@lists.planetplanet.org
Subject: Re: http://planete.websemantique.org/ and user defined content
filtering
Date: Thu, 12 Oct 2006 07:27:29 -0400

Eric van der Vlist wrote:
> Hi,
>
> We have created a new planet for semantic web oriented blogs in French:
> http://planete.websemantique.org/
>
> Right now, this is just a plain vanilla installation of the venus
> flavor. The install has been really trouble free and I'd like to thank
> you for the quality of this software.
>
> Our users have two requests that involve filtering the blog entries that
> appear on the planet.
>
> The first one is generic to most of the planet sites: most of the blogs
> are not focused on a single topic and planets have a lot of entries that
> are irrelevant to the planet main topic.
>
> I personally find that this is a feature more than a bug since it's nice
> to have a broad vision of what bloggers write outside the scope of the
> planet topic but I can also understand that it would be useful to give
> users the ability to get a filtered view of the planet (I am not so
> keen).
>
> The other one is more specific to multilingual planets. Although the
> planet is primarily targeted to French speaking visitors, most of the
> blogs that we federate contain both French and English posts. Again, I
> personally prefer to see entries in both languages but can also
> understand that some users might prefer to see only posts in a single
> language.
>
> The main issue is that we'd like to add these features on existing
> feeds. These feeds do not always include subject or categories that can
> be used for topic filtering and none of them include any specific
> information about the language.
>
> My idea would be to detect that an item is relevant to the planet main
> topic by checking a number of keywords (for the semantic web, this seems
> quite doable). Other algorithms could be used, including Bayesian
> filters like those used by anti-spam systems, but these would require a
> training phase.
>
> For the language detection, I would try to find an open source system to
> do that. Another option would be to spell-check against different
> languages (in our case there are only two) and take the language for
> which the fewest errors have been detected.
>
> I have taken a look at the filter mechanism and all that seems to be
> pretty easy to implement (BTW, I am wondering if there is already a
> generic XSLT filter). The only thing that is still somewhat mysterious
> to me is the options mechanism but I can probably find out by myself how
> it works...
>
> The difference between these filters and the other similar content
> filters that I have found in the list archives would be that these ones
> would not remove entries but add new metadata based on their findings.
> These metadata could then be copied into the XHTML pages and a piece of
> JavaScript would hide entries based on user preferences.
>
> This seems quite simple and obvious to implement but I am wondering if
> this has already been done and, if not, if you have any advice for me.

My guess is that you have figured out pretty much everything you need to
know already, but here's an overview:

Filters are simple Unix pipes. Input comes in stdin, parameters come
from the config file, and output goes to stdout. Anything written to
stderr is logged as an ERROR message. If no stdout is produced, the
entry is not written to the cache or processed further.

Input to filters is an aggressively normalized entry. Everything is
converted to Atom 1.0, XHTML, and utf-8, meaning that you don't have to
worry about funky feeds, tag soup, or encoding. If a feed is RSS 1.0
with 10 items, your filter will be called ten times, each with a single
Atom 1.0 entry.
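
To make the contract above concrete, here is a minimal sketch of a
filter that reads one Atom entry on stdin and writes it back on stdout
with an extra category when a keyword matches. It is an illustration
only: the keyword list, the scheme URI and the "on-topic" term are
hypothetical, and it is written for Python 3 rather than taken from the
planet code.

```python
import re
import sys
from xml.dom import minidom

ATOM_NS = "http://www.w3.org/2005/Atom"
# Hypothetical values, chosen for this sketch only.
TOPIC_SCHEME = "http://example.org/planet/topic"
KEYWORDS = ("rdf", "owl", "sparql", "semantic web", "web sémantique")
PATTERN = re.compile(
    r"\b(?:%s)\b" % "|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE
)


def text_of(node):
    """Concatenate every text node found below `node`."""
    if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
        return node.data
    return "".join(text_of(child) for child in node.childNodes)


def tag_topic(entry_xml, pattern=PATTERN):
    """Return the entry as UTF-8 bytes, with a category added on a match."""
    doc = minidom.parseString(entry_xml)
    entry = doc.documentElement
    if pattern.search(text_of(entry)):
        # Unprefixed child of <entry>, so it inherits the Atom namespace.
        category = doc.createElementNS(ATOM_NS, "category")
        category.setAttribute("scheme", TOPIC_SCHEME)
        category.setAttribute("term", "on-topic")
        entry.appendChild(category)
    return doc.toxml(encoding="utf-8")


if __name__ == "__main__":
    data = sys.stdin.buffer.read()
    if data.strip():
        # The entry is always written back: this filter only annotates.
        sys.stdout.buffer.write(tag_topic(data))
```

Because the entry is written back whether or not a keyword matches, the
filter adds metadata without removing anything.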

There is a small set of example filters in the 'filters' directory.
coral_cdn_filter will change links to images in the entry itself. The
filters in the stripAd directory are conceptually similar.

excerpt is closest to what you describe. It adds metadata (in the form
of a planet:excerpt element) to the feed itself. You can see examples
of how parameters are passed to this program in either
tests/data/filter/excerpt-images.ini or examples/opml-top100.ini. Note:
templates written using htmltmpl currently only have access to a fixed
set of fields, whereas xslt templates have access to everything.
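
In the same spirit of adding metadata rather than removing entries, the
language question could be handled roughly as sketched below. Note that
this swaps in a simpler technique than the spell-checking idea from the
original message: it just counts common stop words. The word lists are
tiny, hand-picked assumptions, and recording the guess as xml:lang on
the entry is one possible choice, not something the planet code
prescribes.

```python
import re
from xml.dom import minidom

# Tiny, hand-picked stop-word lists; assumptions made for this sketch.
FRENCH = {"le", "la", "les", "des", "une", "est", "et", "dans", "pour",
          "que", "qui", "pas", "sur", "avec"}
ENGLISH = {"the", "and", "is", "of", "to", "in", "that", "for", "with",
           "this", "not", "are", "on", "it"}


def guess_language(text):
    """Return "fr", "en", or None when the counts do not decide."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    french = sum(word in FRENCH for word in words)
    english = sum(word in ENGLISH for word in words)
    if french == english:
        return None
    return "fr" if french > english else "en"


def text_of(node):
    """Concatenate every text node found below `node`."""
    if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
        return node.data
    return "".join(text_of(child) for child in node.childNodes)


def annotate_language(entry_xml):
    """Return the entry as UTF-8 bytes, with xml:lang set when guessed."""
    doc = minidom.parseString(entry_xml)
    entry = doc.documentElement
    language = guess_language(text_of(entry))
    # Never overwrite a language that the feed already declares.
    if language and not entry.hasAttribute("xml:lang"):
        entry.setAttribute("xml:lang", language)
    return doc.toxml(encoding="utf-8")
```

To use this as a filter, read the entry from sys.stdin.buffer and write
the result of annotate_language() to sys.stdout.buffer.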

xpath_sifter is a variation of the above, including or excluding feeds
based on the presence (or absence) of data specified by xpath expressions.

Final notes:

  * the file extension of the filter is significant. .py invokes python.
    .xslt invokes xslt. sed and tmpl (a.k.a. htmltmpl) are also options.
    If you wanted, say perl or ruby or class/jar (java), these would be
    easy to add.

  * at the moment, xslt based filters don't have access to parameters.
    This is definitely doable, just not yet implemented. The change
    would be to planet/shells/xslt.py.

  * Any filters listed in the [planet] section of your config.ini will
    be invoked on all feeds. Filters listed in individual [feed]
    sections will only be invoked on those feeds.

  * If you list multiple filters, they are simply invoked in the order
    you list them (think unix pipes).

  * You mention javascript, and that's definitely doable. Another
    lower-tech but effective approach is to use multiple output
    templates to produce varying results. If you look at
    themes/mobile/config.ini, you will see that both
    index.html.xslt and mobile.html.xslt are listed.
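
The pipe behaviour described above (filters run in the listed order,
stderr is logged, and an entry with no output is dropped) can be
pictured with the following sketch. It is not the planet
implementation: run_filters and its arguments are hypothetical names,
and each filter is assumed to be given as a ready-to-run command line.

```python
import subprocess
import sys


def run_filters(entry, commands):
    """Pipe `entry` (bytes) through each command in order.

    Returns the final output, or None as soon as one filter writes
    nothing to stdout, in which case the entry is dropped.
    """
    for command in commands:
        result = subprocess.run(command, input=entry, capture_output=True)
        if result.stderr:
            # Anything a filter writes to stderr is reported as an error.
            message = result.stderr.decode("utf-8", "replace")
            print("ERROR:", message, file=sys.stderr)
        entry = result.stdout
        if not entry:
            return None
    return entry
```

For example, run_filters(data, [["python3", "topic.py"], ["python3",
"language.py"]]) would feed the output of the first script to the
second; both script names are placeholders.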

Hopefully, this will be enough to get you started. But mostly, have
fun! And if you have any questions, ask them here.

- Sam Ruby

-- 
GPG-PGP: 2A528005
Weblog:
           http://eric.van-der-vlist.com/blog?t=category&a=Fran%C3%A7ais
------------------------------------------------------------------------
Eric van der Vlist       http://xmlfr.org            http://dyomedea.com
(ISO) RELAX NG   ISBN:0-596-00421-4 http://oreilly.com/catalog/relax
(W3C) XML Schema ISBN:0-596-00252-1 http://oreilly.com/catalog/xmlschema
------------------------------------------------------------------------
Received on Thu Oct 12 13:41:44 2006

Archive generated by hypermail 2.1.8 on 31/10/2006 - 03:12 UTC
