On Wed, Apr 21, 2010 at 1:38 PM, Shashi Kant <sk...@sloan.mit.edu> wrote:

> Why do these approaches have to be mutually exclusive?
> Do a dictionary lookup, if no satisfactory match found use an
> algorithmic stemmer. Would probably save a few CPU cycles by
> algorithmic stemming iff necessary.
>
>
By the way, if you want to do this, it is easy in Solr trunk: put a
StemmerOverrideFilterFactory in front of your stemmer and point it at a
file of tab-separated word-to-stem mappings. The test-files directory has
an example of this (stemdict.txt):

# test that we can override the stemming algorithm with our own mappings
# these must be tab-separated
monkeys	monkey
otters	otter
# some crazy ones that a stemmer would never do
dogs	cat
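
To make the lookup-then-fallback idea concrete, here is a minimal Python
sketch. It is not Solr code: load_stem_dict, naive_stem and stem are names
made up for illustration, and naive_stem is a toy suffix stripper standing
in for a real algorithmic stemmer. It parses the tab-separated format shown
above and only runs the stemmer when the dictionary has no entry.

```python
def load_stem_dict(text):
    """Parse tab-separated 'word<TAB>stem' lines, skipping comments and blanks."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        word, stem_form = line.split("\t", 1)
        mapping[word] = stem_form
    return mapping


def naive_stem(word):
    """Toy stand-in for an algorithmic stemmer: strips one trailing 's'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def stem(word, overrides):
    """Use the dictionary entry if there is one, else fall back to the stemmer."""
    if word in overrides:
        return overrides[word]
    return naive_stem(word)


# Same entries as the stemdict.txt example, with explicit tab characters.
STEMDICT = (
    "# these must be tab-separated\n"
    "monkeys\tmonkey\n"
    "otters\totter\n"
    "dogs\tcat\n"
)

overrides = load_stem_dict(STEMDICT)
print([stem(w, overrides) for w in ["monkeys", "dogs", "horses", "glass"]])
```

A line without a tab raises ValueError in this sketch, which at least makes
a space-separated dictionary fail loudly instead of being silently ignored.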

You can use this factory, or the new KeywordMarkerFilterFactory, which is
similar but simply takes a text file of words (like protwords.txt) that the
stemmer should leave alone.
Both of these filters set a special attribute on the token in the
tokenstream. All stemmers respect that attribute and will not do any
stemming on a token that carries it.
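
As a rough illustration of that attribute mechanism, here is a small Python
sketch. It only models the idea and is not the Lucene/Solr API: Token, its
keyword flag, and the three generator functions are hypothetical names, and
toy_stemmer just strips a trailing "s". Each stage marks the tokens it has
handled, and the stemmer at the end of the chain skips anything marked.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    keyword: bool = False  # hypothetical stand-in for the "do not stem" attribute


def mark_keywords(tokens, protected):
    """Flag tokens found in the protected word list (protwords.txt style)."""
    for tok in tokens:
        if tok.text in protected:
            tok.keyword = True
        yield tok


def override_stems(tokens, overrides):
    """Replace a token with its dictionary stem and flag it as already handled."""
    for tok in tokens:
        if not tok.keyword and tok.text in overrides:
            tok.text = overrides[tok.text]
            tok.keyword = True
        yield tok


def toy_stemmer(tokens):
    """Toy stemmer: strips a trailing 's' unless the token is flagged."""
    for tok in tokens:
        if not tok.keyword and tok.text.endswith("s"):
            tok.text = tok.text[:-1]
        yield tok


def analyze(words, protected, overrides):
    """Chain the stages in order: keyword marker, stem override, stemmer."""
    stream = (Token(w) for w in words)
    stream = mark_keywords(stream, protected)
    stream = override_stems(stream, overrides)
    stream = toy_stemmer(stream)
    return [tok.text for tok in stream]


print(analyze(
    ["dogs", "lucas", "otters", "cats"],
    protected={"lucas"},
    overrides={"dogs": "cat", "otters": "otter"},
))
```

The order of the stages matters here: the marker and the override both have
to run before the stemmer, otherwise the flag is set too late to be seen.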

-- 
Robert Muir
rcm...@gmail.com
