On Thu, 25 Feb 2010 13:06:03 -0500 Robert Muir <rcm...@gmail.com> wrote:
> Yeah, Thai and Arabic have the stuff in Solr 1.4
> For Chinese, if you want to do CJK bigram indexing, this is there
> too. If you want to do word-based "smart" indexing, you need to
> add an additional jar file to your classpath.

OK, but unfortunately I have little knowledge of these languages, so I would not be able to evaluate how well they work.

> we can add a wiki page with examples of how to use these maybe to
> make it easier?
>
> we could also add notes to new ones in lucene (hindi, czech,
> bulgarian, etc), as it might be easier to copy some code around
> and get them working with solr 1.4 than to write your own!

That sounds great. I am all in favour of not trying to reinvent the wheel, and probably badly at that.

> separately, would you be interested in helping with Bengali and
> Marathi?

The Indian languages that I am personally conversant with are Hindi and my native tongue, Oriya. Bengali is quite close to Oriya linguistically, though it uses a different script. Marathi shares a script with Hindi, but its vocabulary is quite different.

I can try to enlist other open-source folk in India: we have been part of a moderately successful localisation effort in India (http://indlinux.org). So, yes, I would be interested, but I probably have a fair amount to learn about what is needed in the context of a search engine.

Regards,
Gora