On Wed, 24 Feb 2010 15:49:15 +0100 Markus Jelsma <mar...@buyways.nl> wrote:
> Well, I don't have a specific request in mind. However, I can
> imagine a growing internet market for Thai, Chinese and Arabic
> speaking people and the native languages on the African
> continent. Providing them with stemmers to handle plurals etc.
> will allow for a better search experience.

The same is true for Indian languages. Actually, the state of the art for NLP in Indian languages is quite poor, at least in the open-source world. So, even before stemmers, one should address things like phonetic analysers, spellcheckers, stop words, etc. This latter part is possible now, and we have an alpha-level implementation for Hindi that we would be glad to contribute once it is stable.

Which reminds me: one thing that I would like to see immediately in Solr is for the Metaphone/DoubleMetaphone phonetic analysers to read their phonetic rules from a configuration file, rather than having them hard-coded. The aspell library does this, for example. Please see http://aspell.net/man-html/Phonetic-Code.html#Phonetic-Code for an explanation of its rules.

Regards,
Gora
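
P.S. To make the configuration-file idea concrete, here is a rough sketch of a table-driven phonetic encoder. It is only an illustration: the rule format (one "PATTERN REPLACEMENT" pair per line, with "_" meaning "emit nothing") is a simplification I made up for this example and is not the full aspell syntax, and the names parse_rules, encode and SAMPLE_RULES are hypothetical. Nothing here is existing Solr code.

```python
# Hypothetical, simplified rule format (NOT the full aspell syntax):
#   PATTERN REPLACEMENT   one rule per line
#   "_" as the replacement means "emit nothing"
#   lines starting with "#" are comments
SAMPLE_RULES = """
# pattern  replacement
PH   F
CK   K
GH   _
A    _
E    _
I    _
O    _
U    _
"""


def parse_rules(text):
    """Parse rule text into a list of (pattern, replacement) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, replacement = line.split()
        replacement = "" if replacement == "_" else replacement.upper()
        rules.append((pattern.upper(), replacement))
    # Try longer patterns first so "PH" is matched before any rule for "P".
    rules.sort(key=lambda rule: len(rule[0]), reverse=True)
    return rules


def encode(word, rules):
    """Scan left to right, applying the first rule that matches at each position."""
    word = word.upper()
    out = []
    i = 0
    while i < len(word):
        for pattern, replacement in rules:
            if word.startswith(pattern, i):
                out.append(replacement)
                i += len(pattern)
                break
        else:
            # No rule matched: copy the character through unchanged.
            out.append(word[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    rules = parse_rules(SAMPLE_RULES)
    for w in ("phone", "fone", "knight", "back"):
        print(w, "->", encode(w, rules))
```

With the sample table, "phone" and "fone" both encode to "FN". The point is that supporting a new language would mean supplying a different rule file, with no change to the analyser code.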