Otis, https://issues.apache.org/jira/browse/LUCENE-2055 may be of some help.
cheers

On 7/30/10 2:18 PM, Otis Gospodnetic wrote:
Hello,

I'm wondering if anyone has good ideas for handling the following (Porter) stemming problem.

The word "city" gets stemmed to "citi". But "citi" is short for "citibank", so we have a conflict: the stems of both "city" and "citi" are "citi", so when you search for "city", you will get matches that are really about citi(bank).

Now, we could put "citi" in the "do not stem" list (protwords.txt), but that would be of no use because "citi" is already in its fully stemmed form.

This leaves the option of not stemming "cities" or "city" by adding those words to protwords.txt (and perhaps making "city" a synonym for "cities" as a workaround), but this feels like a kluge.

Are there more elegant solutions for cases like this one?

Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
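
To make the collision in the quoted message concrete, here is a minimal Python sketch. It is not Solr or Lucene code: `toy_stem` is a hypothetical stand-in that imitates only the two Porter rules relevant here ("ies" to "i", and a trailing "y" to "i" when the stem contains a vowel), and `OVERRIDES` and `analyze` are illustrative names. The override table is one possible way to express the "protect city/cities and map them together" workaround in a single step, assuming an exact-match lookup that runs before the stemmer.

```python
def toy_stem(word):
    """Toy stand-in for two Porter rules; NOT a full Porter stemmer."""
    if word.endswith("ies"):
        # e.g. "cities" -> "citi"
        return word[:-3] + "i"
    if word.endswith("y") and any(c in "aeiou" for c in word[:-1]):
        # e.g. "city" -> "citi"
        return word[:-1] + "i"
    return word


# Hypothetical override table: words listed here bypass the stemmer and are
# replaced by the given form, so "city" and "cities" share a term that
# differs from "citi".
OVERRIDES = {"city": "city", "cities": "city"}


def analyze(word, overrides=OVERRIDES):
    """Lowercase the token, apply an exact-match override, else stem."""
    token = word.lower()
    if token in overrides:
        return overrides[token]
    return toy_stem(token)


if __name__ == "__main__":
    for w in ("city", "cities", "citi"):
        print(w, "| stemmed:", toy_stem(w), "| with overrides:", analyze(w))
```

Without the table, all three words end up as "citi"; with it, "city" and "cities" index as "city" while "citi" stays separate. The cost is that the table has to be maintained by hand for every such collision.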