Re: which German stemmer to use?

Paul Libbrecht Thu, 24 Mar 2011 00:39:17 -0700

In our ActiveMath project, we have had positive feedback in Lucene with 
 SnowballAnalyzer(Version.LUCENE_29, "German") 
which probably corresponds to one of the two Snowball entries in your list 
below.


Note that you may want to use one field with exact matching (e.g. whitespace 
analyzer and lowercase filter) and one field with stemmed matches. That's two 
fields in the index, plus a query-expansion mechanism such as dismax to weight 
them:

  text-de^2.0 text-de.stemmed^1.2
(add the phonetic...)
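
A minimal sketch of how the weighting above could be sent to Solr as dismax 
request parameters. The class and method names are hypothetical; only the 
field names and boosts come from the line above, and defType, q and qf are the 
standard Solr request parameters. Host, core and handler path are left out on 
purpose.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class DismaxParams {
    // Field weights from this mail: exact matches count more than stemmed ones.
    static final String QUERY_FIELDS = "text-de^2.0 text-de.stemmed^1.2";

    /** Builds the URL-encoded parameter string for a dismax request. */
    public static String build(String userQuery) {
        return "defType=dismax"
                + "&q=" + URLEncoder.encode(userQuery, StandardCharsets.UTF_8)
                + "&qf=" + URLEncoder.encode(QUERY_FIELDS, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Append the result to your own select URL.
        System.out.println(build("Datenbank Server"));
    }
}
```

A phonetic field, if you add one, would just be another entry in QUERY_FIELDS 
with a lower boost.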

One of the biggest issues our testers raised is that compound words should be 
split. I believe this issue is also very present in technology texts. Thus far 
only the compound-words analyzer can do such a split, and it needs a manually 
supplied word list. Maybe that's doable?
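
To illustrate why the word list matters, here is a minimal sketch of 
dictionary-based compound splitting. It is not Lucene's implementation; the 
class name, the sample dictionary and the minimum part length are assumptions 
made for the example. A word is split only if it can be covered entirely by 
dictionary entries, trying the longest entry first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class CompoundSplitter {
    private final Set<String> dictionary; // lowercase words, supplied by hand
    private final int minPartLength;      // ignore very short candidate parts

    public CompoundSplitter(Set<String> dictionary, int minPartLength) {
        this.dictionary = dictionary;
        this.minPartLength = minPartLength;
    }

    /** Returns the parts of a compound, or the lowercased word if no split is found. */
    public List<String> split(String word) {
        String lower = word.toLowerCase(Locale.GERMAN);
        List<String> parts = new ArrayList<>();
        if (decompose(lower, 0, parts) && parts.size() > 1) {
            return parts;
        }
        return List.of(lower);
    }

    // Backtracking search: take the longest dictionary word at 'start',
    // then try to decompose the remainder; undo the choice on failure.
    private boolean decompose(String w, int start, List<String> parts) {
        if (start == w.length()) {
            return true;
        }
        for (int end = w.length(); end - start >= minPartLength; end--) {
            String candidate = w.substring(start, end);
            if (dictionary.contains(candidate)) {
                parts.add(candidate);
                if (decompose(w, end, parts)) {
                    return true;
                }
                parts.remove(parts.size() - 1);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        CompoundSplitter splitter =
                new CompoundSplitter(Set.of("daten", "bank", "server"), 3);
        System.out.println(splitter.split("Datenbankserver"));
    }
}
```

Words missing from the list stay unsplit, which is why the quality of the 
split depends entirely on the vocabulary you put in. The sketch also ignores 
linking letters such as the "s" in "Arbeitszimmer".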

paul


On 24 March 2011 at 00:14, Christopher Bottaro wrote:

> The wiki lists 5 available, but doesn't do a good job at explaining or
> recommending one:
> 
> GermanStemFilterFactory
> SnowballPorterFilterFactory (German)
> SnowballPorterFilterFactory (German2)
> GermanLightStemFilterFactory
> GermanMinimalStemFilterFactory
> 
> Which is the best one to use in general?  Which is the best to use when the
> content being indexed is German technology articles?
> 
> Thanks for the help.
