I have studied some Russian. The textbooks gave me the impression that all
the exceptions had already been 'found' and were listed in the book.

I do know that languages are living, changing organisms, but I would think
Russian has to be more regular than English, even WITH all six cases and
three genders.

Dennis Gearon

Signature Warning
----------------
EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Tue, 7/27/10, Robert Muir <rcm...@gmail.com> wrote:

> From: Robert Muir <rcm...@gmail.com>
> Subject: Re: Russian stemmer
> To: solr-user@lucene.apache.org
> Date: Tuesday, July 27, 2010, 7:12 AM
> Right, but your problem is that this is the current output:
> 
> Ковров -> Ковр
> Коврову -> Ковров
> Ковровом -> Ковров
> Коврове -> Ковров
> 
> So, if Ковров were simply left alone, all your forms would match...
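
A minimal sketch of the point above, using only the four stemmer outputs quoted in this message. The helper name `stems_with_protection` and the protected set are hypothetical; this is not how Solr implements protected words, it only illustrates why leaving Ковров alone makes the forms collapse to one index term.

```python
# Stemmer output exactly as quoted in the message (surface form -> stem).
current_output = {
    "Ковров": "Ковр",
    "Коврову": "Ковров",
    "Ковровом": "Ковров",
    "Коврове": "Ковров",
}


def stems_with_protection(output, protected):
    """Return the stem for each form, leaving protected words unstemmed.

    `protected` is a hypothetical explicit word list, standing in for a
    protected-words file.
    """
    return {
        form: (form if form in protected else stem)
        for form, stem in output.items()
    }


# Without protection the base form and the inflected forms disagree.
before = set(current_output.values())

# With the base form protected, every form maps to the same term.
after = set(stems_with_protection(current_output, {"Ковров"}).values())

print(sorted(before))
print(sorted(after))
```

With the base form protected, `after` holds a single term, so a query for any of the four forms would hit the same index entry.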
> 
> 2010/7/27 Oleg Burlaca <o...@burlaca.com>
> 
> > Thanks Robert for all your help,
> >
> > The idea of [A-Z].* stopwords is ideal for the English language,
> > although in Russian nouns are inflected: Борис, Борису, Бориса, Борисом
> >
> > I'll try the RussianLightStemFilterFactory (the article in the PDF
> > mentioned it's more accurate).
> >
> > Once again thanks,
> > Oleg Burlaca
> >
> > On Tue, Jul 27, 2010 at 12:07 PM, Robert Muir <rcm...@gmail.com> wrote:
> >
> > > 2010/7/27 Oleg Burlaca <o...@burlaca.com>
> > >
> > > > Actually the situation with Немцов is OK,
> > > > I've just checked how Yandex works with Немцов and Немцова:
> > > > http://nano.yandex.ru/project/inflect/
> > > >
> > > > I think there are two solutions:
> > > > a) manually search for both Немцов and then Немцова
> > > > b) use wildcard query: Немцов*
> > > >
> > >
> > > Well, here is one idea of a more general solution.
> > > The problem with "protected words" is you must have a complete list.
> > >
> > > One idea would be to add a filter that protects any words from stemming
> > > that match a regular expression:
> > > In English maybe someone wants to avoid any capitalized words to reduce
> > > trouble: [A-Z].*
> > > in your case then some pattern like [A-Я].*ов might prevent problems.
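
The regular-expression idea above could be sketched roughly as follows. Everything here is an assumption for illustration: `toy_stem` is a hypothetical stand-in for a real Russian stemmer (it only strips a few endings), `protected_stem` is not an existing Solr or Lucene filter, and the character class is written with a Cyrillic А so that it covers only Cyrillic capitals.

```python
import re

# Assumed pattern: a Cyrillic capital letter, anything, then the ending "ов".
PROTECTED = re.compile(r"[А-Я].*ов")


def toy_stem(token):
    """Hypothetical stand-in for a real stemmer: strips a few endings."""
    for suffix in ("ом", "у", "е", "а", "ов"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def protected_stem(token, pattern=PROTECTED, stem=toy_stem):
    """Skip stemming for tokens that fully match the protection pattern."""
    if pattern.fullmatch(token):
        return token
    return stem(token)


for word in ("Немцов", "Немцова", "Ковров", "Коврову", "Ковровом", "Коврове"):
    print(word, "->", protected_stem(word))
```

Unlike an explicit protected-words list, the pattern needs no complete inventory of surnames, at the cost of also protecting any other capitalized word that happens to end in "ов".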
> > >
> > >
> > > > Robert, thanks for the RussianLightStemFilterFactory info,
> > > > I've found this page
> > > > http://www.mail-archive.com/solr-comm...@lucene.apache.org/msg06857.html
> > > > that somehow describes it. Where can I read more about
> > > > RussianLightStemFilterFactory ?
> > > >
> > > >
> > > Here is the link:
> > >
> > > http://doc.rero.ch/lm.php?url=1000,43,4,20091209094227-CA/Dolamic_Ljiljana_-_Indexing_and_Searching_Strategies_for_the_Russian_20091209.pdf
> > >
> > >
> > > > Regards,
> > > > Oleg
> > > >
> > > > 2010/7/27 Oleg Burlaca <o...@burlaca.com>
> > > >
> > > > > A similar word is Немцов.
> > > > > The strange thing is that searching for "Немцова" will not find
> > > > > documents containing "Немцов"
> > > > >
> > > > > Немцова: 14 articles
> > > > > http://www.sova-center.ru/search/?lg=1&q=%D0%BD%D0%B5%D0%BC%D1%86%D0%BE%D0%B2%D0%B0
> > > > >
> > > > > Немцов: 74 articles
> > > > > http://www.sova-center.ru/search/?lg=1&q=%D0%BD%D0%B5%D0%BC%D1%86%D0%BE%D0%B2
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Robert Muir
> > > rcm...@gmail.com
> > >
> >
> 
> 
> 
> -- 
> Robert Muir
> rcm...@gmail.com
>
