Thanks Robert for all your help,

The idea of protecting words that match [A-Z].* is ideal for English,
but in Russian nouns are inflected (Борис, Борису, Бориса, Борисом),
so a protected name would no longer match its other case forms.
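
To illustrate the concern, here is a rough Python sketch of the matching
step only. It is not Solr code: the helper name is_protected is made up, and
I am assuming a token is protected when the whole token matches the pattern
and that both ends of the character range are Cyrillic letters.

```python
import re

# Assumption: the range is Cyrillic capital А (U+0410) to Я (U+042F), written
# as escapes so it cannot be confused with the Latin "A". Ё (U+0401) is outside it.
CAPITALIZED = re.compile(r"[\u0410-\u042F].*")
ENDS_IN_OV = re.compile(r"[\u0410-\u042F].*ов")


def is_protected(pattern, token):
    """Hypothetical helper: True if the whole token matches the pattern,
    meaning the token would be left unstemmed."""
    return pattern.fullmatch(token) is not None


# Every case form of the name is capitalized, so every form is protected
# and stays a separate, unstemmed term.
boris_forms = ["Борис", "Борису", "Бориса", "Борисом"]
protected_boris = [t for t in boris_forms if is_protected(CAPITALIZED, t)]
distinct_terms = set(protected_boris)

# The narrower pattern protects only the form that ends in "ов".
nemtsov_forms = ["Немцов", "Немцова"]
protected_nemtsov = [t for t in nemtsov_forms if is_protected(ENDS_IN_OV, t)]

print(len(distinct_terms), protected_nemtsov)
```

With the broad pattern all four forms of Борис are protected and remain four
different terms, so a search for one form would not find the others. With the
narrower pattern only Немцов is protected, and Немцова still goes to the stemmer.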

I'll try the RussianLightStemFilterFactory (the article in the PDF mentioned
it's more accurate).

Once again thanks,
Oleg Burlaca

On Tue, Jul 27, 2010 at 12:07 PM, Robert Muir <rcm...@gmail.com> wrote:

> 2010/7/27 Oleg Burlaca <o...@burlaca.com>
>
> > Actually the situation with Немцов is OK,
> > I've just checked how Yandex works with Немцов and Немцова:
> > http://nano.yandex.ru/project/inflect/
> >
> > I think there are two solutions:
> > a) manually search for both Немцов and then Немцова
> > b) use wildcard query: Немцов*
> >
>
> Well, here is one idea of a more general solution.
> The problem with "protected words" is you must have a complete list.
>
> One idea would be to add a filter that protects any words from stemming
> that match a regular expression:
> In English, maybe someone wants to avoid stemming any capitalized words to
> reduce trouble: [A-Z].*
> In your case, some pattern like [A-Я].*ов might prevent problems.
>
>
> > Robert, thanks for the RussianLightStemFilterFactory info,
> > I've found this page
> > http://www.mail-archive.com/solr-comm...@lucene.apache.org/msg06857.html
> > that somehow describes it. Where can I read more about
> > RussianLightStemFilterFactory ?
> >
> >
> Here is the link:
>
> http://doc.rero.ch/lm.php?url=1000,43,4,20091209094227-CA/Dolamic_Ljiljana_-_Indexing_and_Searching_Strategies_for_the_Russian_20091209.pdf
>
>
> > Regards,
> > Oleg
> >
> > 2010/7/27 Oleg Burlaca <o...@burlaca.com>
> >
> > > A similar word is Немцов.
> > > The strange thing is that searching for "Немцова" will not find
> > > documents containing "Немцов"
> > >
> > > Немцова: 14 articles
> > > http://www.sova-center.ru/search/?lg=1&q=%D0%BD%D0%B5%D0%BC%D1%86%D0%BE%D0%B2%D0%B0
> > >
> > > Немцов: 74 articles
> > > http://www.sova-center.ru/search/?lg=1&q=%D0%BD%D0%B5%D0%BC%D1%86%D0%BE%D0%B2
> > >
> > >
> > >
> > >
> >
>
>
>
> --
> Robert Muir
> rcm...@gmail.com
>
