On Apr 21, 2010, at 10:30 AM, Mark Miller wrote:

> But they don't usually call 'non algorithmic' stemming 'stemming'. Stemming
> usually means using a simple heuristic process. When you use vocabulary and
> morphology, its usually called lemmatization rather than stemming.
>
"stemmer" is jargon that does not have a precise definition. For example, the LinguistX morphological analyzers are called "stemmers" and they provide options that are dictionary-based inflectional, dictionary-based derivational, and algorithmic. You can also combine those, so you can get accurate dictionary-based stems, then use an algorithmic stemmer on words not in the dictionary. Stemmers may convert the surface word to a dictionary form (inflectional), to a root dictionary form (derivational), or to a non-word key (the Porter algorithm). Arabic and Hebrew stemmers often choose an intermediate form with some vowel marks rather than the all-consonant "semetic root". Language is complicated. Maintaining a high-quality dictionary is expensive, so you probably won't find many free ones. wunder -- Walter Underwood Lead Engineer, Mark Logic