Re: Cyrillic problem

Robert Muir Mon, 01 Mar 2010 15:16:33 -0800

as far as cyrillic goes, any of the analyzers will handle cyrillic
characters. so you can just use "textgen" or whatever is in the example
schema and everything is ok; StandardAnalyzer will work too.


you don't need to use the RussianAnalyzer, the only special thing it has is
awareness of russian stopwords and a russian stemming algorithm.
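
to make the distinction concrete, here is a minimal sketch in plain Python. it is not Lucene or Solr code, just an illustration under stated assumptions: generic_tokens() is a hypothetical stand-in for a language-neutral analyzer, and russian_aware_tokens() adds a tiny made-up stopword list to mimic one of the extras RussianAnalyzer brings (stemming is left out). the point is only that tokenizing cyrillic, including the ukrainian letter "і", needs no language-specific logic.

```python
import re

# Hypothetical stand-in for a language-neutral analyzer (NOT Lucene's code):
# split on runs of Unicode word characters, then lowercase each token.
# In Python 3, \w is Unicode-aware for str patterns, so Cyrillic letters match.
TOKEN_RE = re.compile(r"\w+")

# Tiny illustrative stopword list. This is an assumption made for the example,
# not the list that RussianAnalyzer actually ships with.
RUSSIAN_STOPWORDS = {"и", "в", "на", "не"}


def generic_tokens(text):
    """Tokenize without any knowledge of the language."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


def russian_aware_tokens(text):
    """Same tokens, minus the (illustrative) Russian stopwords."""
    return [t for t in generic_tokens(text) if t not in RUSSIAN_STOPWORDS]


if __name__ == "__main__":
    # Ukrainian text tokenizes fine with the generic version.
    print(generic_tokens("Привіт, Львів!"))
    # The Russian-aware version additionally drops the stopword "и".
    print(russian_aware_tokens("Москва и Одеса"))
```

the russian stopword list would not match ukrainian function words, so for ukrainian text the language-aware layer buys little in this sketch.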

On Mon, Mar 1, 2010 at 6:11 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> Hmmm, I'm nowhere near an expert on how the analyzers actually work, so I
> have to punt a bit here. And certainly take any of "the regulars'" advice
> if they give it <G>...
>
> But outside of stemming, Lucene/SOLR really doesn't understand the concept
> of "language". And that's not even Lucene, it's the stemmer code. The
> Analyzers are just concerned with producing tokens.
>
> There are some special cases where, say, accents are folded. Various
> European languages have acute, grave and unaccented characters, for
> instance, which should all be treated as one character for a good search
> experience. See ISOLatin1AccentFilter.
>
> But as I remember (OK, it's 35 years ago that I had 2 years of Russian in
> college, OK?) the cyrillic alphabet doesn't suffer from that kind of
> problem, so it's probably worth giving it a try. At very worst, you could
> pre-process your indexed text and query text to smooth out any anomalies.
> If you want to dig farther, you could make your own analyzer...
>
> HTH
> Erick
>
> On Mon, Mar 1, 2010 at 4:31 PM, michaelnazaruk <michaelnaza...@gmail.com> wrote:
>
> >
> > Thank you! And one little question:
> > Can I use RussianAnalyzer for Ukrainian characters?
> > --
> > View this message in context:
> > http://old.nabble.com/Cyrillic-problem-tp27744106p27749323.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
> >
>



-- 
Robert Muir
rcm...@gmail.com
