On Apr 21, 2010, at 10:30 AM, Mark Miller wrote:

> But they don't usually call 'non algorithmic' stemming 'stemming'. Stemming 
> usually means using a simple heuristic process. When you use vocabulary and 
> morphology, it's usually called lemmatization rather than stemming.
> 

"stemmer" is jargon that does not have a precise definition.

For example, the LinguistX morphological analyzers are called "stemmers" and 
they provide options that are dictionary-based inflectional, dictionary-based 
derivational, and algorithmic. You can also combine those, so you can get 
accurate dictionary-based stems, then use an algorithmic stemmer on words not 
in the dictionary.
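
To make the combined approach concrete, here is a rough Python sketch of "dictionary first, algorithmic fallback". It is not the LinguistX API. The names LEMMA_DICT, SUFFIXES, algorithmic_stem, and stem are made up for illustration, the dictionary is a four-entry toy, and the fallback is a naive suffix stripper, not the Porter algorithm.

```python
# Hypothetical toy dictionary; a real one would hold many thousands of entries.
LEMMA_DICT = {
    "ran": "run",
    "running": "run",
    "mice": "mouse",
    "better": "good",
}

# Naive suffix rules used only as a fallback. This is NOT the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def algorithmic_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem(word):
    """Prefer the dictionary form; fall back to suffix stripping."""
    key = word.lower()
    if key in LEMMA_DICT:
        return LEMMA_DICT[key]
    return algorithmic_stem(key)


if __name__ == "__main__":
    for w in ("mice", "Running", "indexing", "boxes"):
        print(w, "->", stem(w))
```

Words found in the dictionary come back as real dictionary forms ("mice" gives "mouse"), while unknown words only get whatever the suffix rules produce, which may or may not be a real word.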

Stemmers may convert the surface word to a dictionary form (inflectional), to a 
root dictionary form (derivational), or to a non-word key (the Porter 
algorithm). Arabic and Hebrew stemmers often choose an intermediate form with 
some vowel marks rather than the all-consonant "Semitic root".

Language is complicated.

Maintaining a high-quality dictionary is expensive, so you probably won't find 
many free ones.

wunder
--
Walter Underwood
Lead Engineer, Mark Logic






