On Thu, 25 Feb 2010 13:06:03 -0500
Robert Muir <rcm...@gmail.com> wrote:

> Yeah, Thai and Arabic have the stuff in Solr 1.4
> For Chinese, if you want to do CJK bigram indexing, this is there
> too. If you want to do word-based "smart" indexing, you need to
> add an additional jar file to your classpath.

OK, but unfortunately, I have little knowledge of these languages,
so I would not be able to evaluate how well they are working.

> we can add a wiki page with examples of how to use these maybe to
> make it easier?
> 
> we could also add notes to new ones in lucene (hindi, czech,
> bulgarian, etc), as it might be easier to copy some code around
> and get them working with solr 1.4 than to write your own!

That sounds great. I am all in favour of not reinventing the
wheel, and probably doing it badly at that.

> separately, would you be interested in helping with Bengali and
> Marathi?

The Indian languages that I am personally conversant with are
Hindi, and my native tongue, Oriya. Bengali is quite close to
Oriya linguistically, though with a different script. Marathi
shares a script with Hindi, but words in the language are
quite different. I can try to enlist other open-source folk
in India: We have been part of a moderately successful localisation
effort in India (http://indlinux.org).

So, yes, I would be interested, but I probably have a fair amount
to learn about what is needed in the context of a search engine.

Regards,
Gora
