On Wed, 24 Feb 2010 15:49:15 +0100
Markus Jelsma <mar...@buyways.nl> wrote:

> Well, I don't have a specific request in mind. However, I can
> imagine a growing internet market for Thai, Chinese and Arabic
> speaking people and the native languages on the African
> continent. Providing them with stemmers to handle plurals etc.
> will allow for a better search experience.

The same is true for Indian languages.

Actually, the state of the art for NLP in Indian languages is
quite poor, at least in the open-source world. So, even before
stemmers, one should address things like phonetic analysers,
spellcheckers, stop words, etc. These basics are feasible now,
and we have an alpha-level implementation for Hindi that we would
be glad to contribute once it is stable.

Which reminds me: one thing that I would like to see immediately
in Solr is for the Metaphone/DoubleMetaphone phonetic analysers
to read their phonetic rules from a configuration file, rather
than having them hard-coded. The aspell library does this, for
example. Please
see http://aspell.net/man-html/Phonetic-Code.html#Phonetic-Code
for an explanation of its rules.
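
To make the idea concrete, here is a minimal sketch of a rule-driven
phonetic encoder, in Python for brevity. It is only an illustration:
the "PATTERN REPLACEMENT" rule format, the function names parse_rules
and encode, and the sample rules are all made up for this example.
This is neither aspell's actual rule syntax nor any existing Solr API.

```python
def parse_rules(text):
    """Parse 'PATTERN REPLACEMENT' lines (a hypothetical format).

    '#' starts a comment; a replacement of '_' means "emit nothing".
    """
    rules = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        pattern, replacement = line.split()
        if replacement == "_":
            replacement = ""
        rules.append((pattern.upper(), replacement.upper()))
    # Try longer patterns first so that "PH" wins over a rule for "P".
    rules.sort(key=lambda rule: len(rule[0]), reverse=True)
    return rules


def encode(word, rules):
    """Rewrite the word left to right using the first matching rule."""
    word = word.upper()
    out = []
    i = 0
    while i < len(word):
        for pattern, replacement in rules:
            if word.startswith(pattern, i):
                out.append(replacement)
                i += len(pattern)
                break
        else:
            # No rule matched at this position: keep the character.
            out.append(word[i])
            i += 1
    return "".join(out)


# Illustrative rules only; in practice these would live in a file
# that is maintained per language.
SAMPLE_RULES = """
PH F    # "ph" sounds like "f"
CK K
A _     # drop vowels
E _
I _
O _
U _
"""

if __name__ == "__main__":
    rules = parse_rules(SAMPLE_RULES)
    for w in ("phone", "fone", "back"):
        print(w, "->", encode(w, rules))
```

With the rules kept outside the code like this, supporting Hindi or
any other language would mean supplying a different rules file rather
than patching the analyser itself.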

