Hello,
I'm using SnowballPorterFilterFactory with language=Russian.
The stemming works OK except for people's names and geographical places.
Here are some examples:
searching for Ковров should also find Коврова, Коврову, Ковровом, Коврове.
Are there other stemming plugins for the Russian language that
All of your examples stem to ковров:
assertAnalyzesTo(a, "Коврова Коврову Ковровом Коврове",
    new String[] { "ковров", "ковров", "ковров", "ковров" });
Are you sure you enabled this at *both* index and query time?
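A minimal sketch of why the stemmer has to run on both sides, in plain Java with no Lucene or Solr dependency. `toyStem` is a hypothetical stand-in that only drops one trailing Cyrillic "а", "у" or "е"; it is not the Snowball Russian algorithm, and the class name is made up for illustration. Compile the source as UTF-8 (`javac -encoding UTF-8`).

```java
import java.util.List;
import java.util.Set;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

final class IndexQueryAnalysisDemo {

    // Hypothetical toy stemmer (NOT Snowball): drops one trailing "а", "у" or "е".
    static String toyStem(String word) {
        if (word.length() > 3
                && (word.endsWith("а") || word.endsWith("у") || word.endsWith("е"))) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    // A document matches if the analyzed query term is one of its analyzed tokens.
    static boolean matches(List<String> docTokens, String query,
                           UnaryOperator<String> indexAnalyzer,
                           UnaryOperator<String> queryAnalyzer) {
        Set<String> indexedTerms = docTokens.stream()
                .map(indexAnalyzer)
                .collect(Collectors.toSet());
        return indexedTerms.contains(queryAnalyzer.apply(query));
    }

    public static void main(String[] args) {
        List<String> doc = List.of("коврове");
        UnaryOperator<String> stem = IndexQueryAnalysisDemo::toyStem;
        UnaryOperator<String> none = UnaryOperator.identity();
        System.out.println(matches(doc, "коврова", stem, stem)); // true: stemmed on both sides
        System.out.println(matches(doc, "коврова", stem, none)); // false: stemmed at index time only
    }
}
```

With the stemmer on both sides, "коврове" and "коврова" meet at the same term; skip it on the query side and the unstemmed query term is never found in the index.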
2010/7/27 Oleg Burlaca o...@burlaca.com
On another look, your problem is ковров itself: it's mapped to ковр.
A workaround might be to use the protected words functionality to
keep ковров and any other problematic people/geo names as-is.
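A rough model of the protected-words workaround. In Solr this is, as far as I know, the `protected` attribute of `SnowballPorterFilterFactory` pointing at a word list such as `protwords.txt`; the sketch below does not use Solr at all and only models the logic. `toyStem` is a hypothetical suffix stripper tuned to reproduce the outputs quoted in this thread (ковров -> ковр, коврова -> ковров); it is not the real Snowball stemmer. Compile as UTF-8.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

final class ProtectedStemDemo {

    // Hypothetical toy rule (NOT Snowball): strip the first matching suffix from this list.
    static final List<String> TOY_SUFFIXES = List.of("ом", "ов", "а", "у", "е");

    static String toyStem(String token) {
        for (String suffix : TOY_SUFFIXES) {
            // Keep at least three characters of stem.
            if (token.length() > suffix.length() + 2 && token.endsWith(suffix)) {
                return token.substring(0, token.length() - suffix.length());
            }
        }
        return token;
    }

    // Lower-case the token, then stem it unless it is in the protected set.
    static String analyze(String token, Set<String> protectedWords,
                          UnaryOperator<String> stemmer) {
        String lower = token.toLowerCase(Locale.ROOT);
        return protectedWords.contains(lower) ? lower : stemmer.apply(lower);
    }

    public static void main(String[] args) {
        List<String> forms = List.of("Ковров", "Коврова", "Коврову", "Ковровом", "Коврове");
        for (String form : forms) {
            System.out.println(form + " -> "
                    + analyze(form, Set.of(), ProtectedStemDemo::toyStem) + " (unprotected), "
                    + analyze(form, Set.of("ковров"), ProtectedStemDemo::toyStem) + " (protected)");
        }
    }
}
```

Unprotected, the base form drifts to "ковр" while the inflected forms land on "ковров"; protecting "ковров" makes all five forms share one term.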
Separately, in trunk there is an alternative Russian stemmer
(RussianLightStemFilterFactory), which might give you fewer problems on
average, but I noticed it has the same problem with the example you gave.
On Tue, Jul 27, 2010 at 4:25 AM, Robert Muir rcm...@gmail.com wrote:
All of your examples stem to ковров
A similar word is Немцов.
The strange thing is that searching for Немцова will not find documents
containing Немцов.
Немцова: 14 articles
http://www.sova-center.ru/search/?lg=1&q=%D0%BD%D0%B5%D0%BC%D1%86%D0%BE%D0%B2%D0%B0
Немцов: 74 articles
Actually, the situation with Немцов is OK.
I've just checked how Yandex handles Немцов and Немцова:
http://nano.yandex.ru/project/inflect/
I think there are two solutions:
a) manually search for both Немцов and Немцова
b) use a wildcard query: Немцов*
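The two options can be sketched as query strings in standard Lucene query-parser syntax. The field name `text` is a hypothetical example, and `prefixMatches` is a simplified model that assumes the index holds the lower-cased surface forms. In Lucene/Solr of that era, wildcard terms are generally not run through the stemmer, so the prefix has to match the terms as they were indexed.

```java
import java.util.List;
import java.util.stream.Collectors;

final class NameQueryDemo {

    // (a) OR together every form you want to match.
    static String orQuery(String field, List<String> forms) {
        return forms.stream()
                .map(form -> field + ":" + form)
                .collect(Collectors.joining(" OR "));
    }

    // (b) A trailing-wildcard (prefix) query on the base form.
    static String prefixQuery(String field, String base) {
        return field + ":" + base + "*";
    }

    // Simplified model of what a prefix query selects from the indexed terms.
    static List<String> prefixMatches(List<String> indexedTerms, String prefix) {
        return indexedTerms.stream()
                .filter(term -> term.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(orQuery("text", List.of("Немцов", "Немцова"))); // text:Немцов OR text:Немцова
        System.out.println(prefixQuery("text", "Немцов"));                  // text:Немцов*
        System.out.println(prefixMatches(
                List.of("немцов", "немцова", "немцову", "немец"), "немцов")); // [немцов, немцова, немцову]
    }
}
```

Option (a) needs every form spelled out; option (b) catches all endings, at the cost of also matching any unrelated word that happens to share the prefix.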
Robert, thanks for the
Thanks Robert for all your help,
The idea of [A-Z].* stopwords is ideal for the English language,
although in Russian nouns are inflected: Борис, Борису, Бориса, Борисом.
I'll try the RussianLightStemFilterFactory (the article in the PDF mentioned
it's more accurate).
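A small sketch of the objection above: if every capitalized token is protected (the `[A-Z].*` idea, widened here to any upper-case letter with `\p{Lu}` so that Cyrillic capitals count), each inflected form of a name is indexed verbatim and the forms never collapse into one term. Plain Java; the class and method names are illustrative only.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

final class CapitalizedProtectionDemo {

    // Any token starting with an upper-case letter (Latin or Cyrillic).
    static final Pattern CAPITALIZED = Pattern.compile("\\p{Lu}.*");

    static boolean isProtected(String token) {
        return CAPITALIZED.matcher(token).matches();
    }

    // Protected tokens bypass the stemmer, so each form is kept exactly as written.
    static Set<String> protectedTerms(List<String> tokens) {
        Set<String> terms = new LinkedHashSet<>();
        for (String token : tokens) {
            if (isProtected(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        Set<String> terms = protectedTerms(List.of("Борис", "Борису", "Бориса", "Борисом"));
        // Four distinct terms, so a query for "Борис" reaches only the first form.
        System.out.println(terms.size() + " " + terms);
    }
}
```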
Once again thanks,
Oleg
Right, but your problem is that this is the current output:
Ковров -> Ковр
Коврову -> Ковров
Ковровом -> Ковров
Коврове -> Ковров
So, if Ковров were simply left alone, all your forms would match.
: Russian stemmer
To: solr-user@lucene.apache.org
Date: Tuesday, July 27, 2010, 7:12 AM
removing HTML code... So I created my
factories.
Regards,
Daniel
--
View this message in context:
http://www.nabble.com/Problem-with-Russian-stemmer-in-Solr-1.2-tf4049948.html#a11646823
Sent from the Solr - User mailing list archive at Nabble.com.
Hi Andrew.
Here is an example of one FilterFactory:
public class RussianStemFilterFactory extends BaseTokenFilterFactory {
  private String charset;

  /**
   * @see org.apache.solr.analysis.BaseTokenFilterFactory#init(java.util.Map)
   */
  @Override
  public void init(Map<String, String>
Hi Andrew
Yes, I saw that. As I'm not knowledgeable in Russian, I had to assume it was
adequate. But since you have much more to add to it, it would be great if
you could contribute it.
The problem is that the Russian analyzer and its filters are all final classes,
which doesn't allow an elegant extension.
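One common way around final classes is composition: hold an instance and delegate to it instead of subclassing. In this sketch `FinalStemmer` is a hypothetical stand-in for a final library class (it is not Lucene's `RussianStemmer`) with a toy rule that drops a trailing Cyrillic "а"; `ExtendedStemmer` layers lower-casing on top. Compile as UTF-8.

```java
import java.util.Locale;

// Hypothetical stand-in for a final library class; toy rule: drop a trailing "а".
final class FinalStemmer {
    String stem(String word) {
        return word.endsWith("а") ? word.substring(0, word.length() - 1) : word;
    }
}

// "class ExtendedStemmer extends FinalStemmer" would not compile,
// so the wrapper holds an instance and delegates to it.
final class ExtendedStemmer {
    private final FinalStemmer delegate = new FinalStemmer();

    // Extra behavior on top of the final class: lower-case first, then delegate.
    String stem(String word) {
        return delegate.stem(word.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(new ExtendedStemmer().stem("Коврова")); // ковров
    }
}
```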
Hi Andrew
In fact, I did it by creating all the factories for Solr, but I think you can
use the analyzer directly by changing your schema like this:
<fieldtype name="cpstext_russian" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index" class="org.apache.lucene.analysis.ru.RussianAnalyzer"/>