2006/11/23, Jean-Francois Dockes <[EMAIL PROTECTED]>:
mikkel.kamstrup at gmail.com (Mikkel Kamstrup Erlandsen) writes:
> magnus.bergman at observer.net (Magnus Bergman) writes:
> > One thing that English users seldom consider is the use of several
> > languages. Which language is being used is important to know in order
> > to decide what stemming rules to use, and which stop-words to use (in
> > English "the" is a stop-word while in Swedish it means tea and is
> > something that is adequate to search for). People using other languages
> > are very often multilingual (using English as well). Therefore it is
> > interesting to know which language the query is in (search engines
> > might also be able to translate queries to search in documents written
> > in different languages).
>
> This is a good point. However I suggest leaving this up to the actual
> implementations. After all it is an indexing time question what stemmer to
> use when indexing a document...

This is not true. An indexer can choose to perform stem processing at
query time. Recoll is one, but I don't think it's the only one. There are
quite good reasons to do so.

Right. In my sleepy haze last night I was not thinking straight :-)
I've put some more detail in my answer to Fabrice's post.

Cheers,
Mikkel
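
For illustration, the query-time approach discussed above can be sketched
roughly as follows. This is a toy under stated assumptions, not how Recoll
or any other indexer actually works: the stop-word lists, the suffix rules
and the names toy_stem, build_index and search are all made up for the
example. The index stores raw terms, and both the stop-word list and the
stemmer are chosen from the language of the query.

```python
# Toy data: hypothetical stop-word lists and suffix rules per language.
STOP_WORDS = {"en": {"the", "a", "of"}, "sv": {"och", "att"}}
SUFFIXES = {"en": ("ing", "es", "s"), "sv": ("arna", "ar", "en")}


def toy_stem(word, lang):
    """Strip the first matching suffix; a stand-in for a real stemmer."""
    for suffix in SUFFIXES[lang]:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def build_index(docs):
    """Index raw lowercased terms: no stemming, no stop-word removal."""
    index = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query, lang):
    """Apply stop words and stemming at query time, per query language."""
    hits = set()
    for word in query.lower().split():
        if word in STOP_WORDS[lang]:
            continue  # dropped only if it is a stop word in this language
        stem = toy_stem(word, lang)
        # Expand the query word to every indexed term sharing its stem.
        for term, doc_ids in index.items():
            if toy_stem(term, lang) == stem:
                hits |= doc_ids
    return hits


docs = {
    1: "indexing the documents",
    2: "a document index",
    3: "the och kaffe",
}
index = build_index(docs)
```

With this toy index, search(index, "documents", "en") matches documents 1
and 2, an English query for "the" matches nothing, and the same query
flagged as Swedish matches every document containing the word. Because
nothing language-specific was baked in at indexing time, the same index
serves both languages.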

_______________________________________________
xdg mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/xdg
