On Wed, Apr 21, 2010 at 1:38 PM, Shashi Kant <sk...@sloan.mit.edu> wrote:
> Why do these approaches have to be mutually exclusive?
> Do a dictionary lookup, if no satisfactory match found use an
> algorithmic stemmer. Would probably save a few CPU cycles by
> algorithmic stemming iff necessary.
>
>

By the way, if you want to do this, you can do it easily in Solr trunk.
Just put a StemmerOverrideFilterFactory in front of your stemmer,
containing tab-separated dictionary-word stem mappings.

In the test-files directory is an example of this (stemdict.txt):

# test that we can override the stemming algorithm with our own mappings
# these must be tab-separated
monkeys	monkey
otters	otter
# some crazy ones that a stemmer would never do
dogs	cat

You can use this factory, or the new KeywordMarkerFilterFactory, which is
similar but simply takes a text file of words, like protwords.txt, for the
stemmer to ignore.

Both of these filters set a special attribute on the token in the
tokenstream that all stemmers respect, so the stemmers won't do any
stemming on that token.

-- 
Robert Muir
rcm...@gmail.com
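
To illustrate the lookup order discussed in this message (dictionary override first, then a protected-word list, then the algorithmic stemmer), here is a minimal Python sketch. It is not Solr code and does not use any Solr or Lucene API: the names parse_stem_dict, stem_with_overrides and toy_stem are hypothetical, and toy_stem is only a stand-in for a real algorithmic stemmer. In Solr the filters mark the token with an attribute instead of calling the stemmer conditionally, so treat this as a model of the behaviour, not of the implementation.

```python
def toy_stem(word):
    """Hypothetical stand-in for an algorithmic stemmer: drop one trailing 's'."""
    if len(word) > 3 and word.endswith("s"):
        return word[:-1]
    return word


def parse_stem_dict(text):
    """Parse tab-separated 'word<TAB>stem' lines, skipping blanks and # comments."""
    overrides = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        word, stem = line.split("\t", 1)
        overrides[word] = stem
    return overrides


def stem_with_overrides(word, overrides, protected=frozenset()):
    """Dictionary mapping wins, protected words pass through, else stem."""
    if word in overrides:
        return overrides[word]
    if word in protected:
        return word
    return toy_stem(word)


# Same entries as the stemdict.txt example quoted in the message.
STEMDICT = (
    "# these must be tab-separated\n"
    "monkeys\tmonkey\n"
    "otters\totter\n"
    "# some crazy ones that a stemmer would never do\n"
    "dogs\tcat\n"
)
OVERRIDES = parse_stem_dict(STEMDICT)
```

With these definitions, stem_with_overrides("dogs", OVERRIDES) returns the dictionary stem, while a word missing from the dictionary falls through to toy_stem unless it is listed in protected.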