Hi,

I need to index and stem Croatian, Macedonian, Serbian and Slovenian content. I started by creating a collection _hr_ for the Croatian content and configured the HunspellStemFilterFactory with the .dic and .aff files provided by OpenOffice. While testing my configuration I noticed that only very simple forms such as

hrvatski -> hrvatska,
algoritamskom -> algoritamska

get "stemmed". Are there better approaches for Croatian content? I haven't tested the .dic and .aff files for the other languages yet, but I would expect similar results.
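
For reference, the analysis chain is roughly of the following shape. This is a sketch rather than a copy of my schema: the field type name text_hr and the file names hr_HR.dic / hr_HR.aff are placeholders, and the tokenizer and lowercase filter are assumptions about a typical setup.

```xml
<!-- Sketch of a Hunspell-based field type; names and file paths are placeholders -->
<fieldType name="text_hr" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Split text into word tokens -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Lowercase before the dictionary lookup -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Dictionary-based stemming using the OpenOffice .dic/.aff files -->
    <filter class="solr.HunspellStemFilterFactory"
            dictionary="hr_HR.dic"
            affix="hr_HR.aff"
            ignoreCase="true"/>
  </analyzer>
</fieldType>
```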

I am using Solr 4.1.

Any pointers to better stemmers, open source or commercial, are much appreciated.

Many thanks,
Alex
