Stemming Croatian, Macedonian, Serbian and Slovenian content

Alexander Rosemann Tue, 04 Mar 2014 12:49:31 -0800

Hi,

I have the requirement to index and stem Croatian, Macedonian, Serbian and Slovenian content. I started by creating a collection _hr_ for the Croatian content and configured the HunspellStemFilterFactory using the .dic and .aff files provided by OpenOffice. While testing my configuration I noticed that only very simple forms such as


hrvatski -> hrvatska,
algoritamskom -> algoritamska

get "stemmed". I was wondering whether there are better approaches for Croatian content. I haven't tested the .dic and .aff files for the other languages yet, but I would expect similar results.


I am using Solr 4.1.
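
For reference, a setup like the one described above would look roughly like the following schema.xml sketch. The field type name text_hr and the file names hr_HR.dic and hr_HR.aff are placeholders chosen for illustration, not the exact names from the configuration in question, and the tokenizer and lowercase filter are assumed.

```xml
<!-- Sketch only: the field type name and dictionary file names are placeholders. -->
<fieldType name="text_hr" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- hr_HR.dic / hr_HR.aff: assumed names for the OpenOffice Croatian dictionary files -->
    <filter class="solr.HunspellStemFilterFactory"
            dictionary="hr_HR.dic"
            affix="hr_HR.aff"
            ignoreCase="true"/>
  </analyzer>
</fieldType>
```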

Any pointers to better stemmers, open source or commercial, are much appreciated.


Many thanks,
Alex
