Otis,

https://issues.apache.org/jira/browse/LUCENE-2055 may be of some help.

cheers

On 7/30/10 2:18 PM, Otis Gospodnetic wrote:
Hello,

I'm wondering if anyone has good ideas for handling the following (Porter)
stemming problem.
The word "city" gets stemmed to "citi". But "citi" is short for "citibank", so
we have a conflict: the stems of both "city" and "citi" are "citi", so when you
search for "city", you will get matches that are really about citi(bank).
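To make the collision concrete, here is a minimal Python sketch. It implements only two of Porter's rules (part of step 1a, which rewrites plural "ies" to "i", and step 1c, which turns a terminal "y" into "i" when the stem contains a vowel). It is not the full algorithm, and the name `toy_porter` is hypothetical:

```python
VOWELS = set("aeiou")

def toy_porter(word: str) -> str:
    """Apply only Porter step 1a (plural suffixes) and step 1c ('y' -> 'i').

    A deliberately tiny subset of the Porter algorithm, just enough to
    reproduce the city/citi collision; it is not a complete stemmer.
    """
    w = word.lower()
    # Step 1a: "sses" -> "ss", "ies" -> "i", "ss" unchanged, trailing "s" dropped.
    if w.endswith("sses"):
        w = w[:-2]
    elif w.endswith("ies"):
        w = w[:-2]
    elif w.endswith("ss"):
        pass
    elif w.endswith("s"):
        w = w[:-1]
    # Step 1c: terminal "y" -> "i" when the remaining stem contains a vowel.
    if w.endswith("y") and any(c in VOWELS for c in w[:-1]):
        w = w[:-1] + "i"
    return w

for term in ["city", "cities", "citi"]:
    print(term, "->", toy_porter(term))
```

All three terms collapse to the same index token "citi", which is why a query for "city" also matches documents about Citi.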

Now, we could put "citi" in the "do not stem" list (protwords.txt), but it will
be of no use because "citi" is already in its fully stemmed form. This leaves
the option of not stemming "cities" or "city" at all by adding those words to
protwords.txt (and perhaps making "city" a synonym for "cities" as a
workaround), but this feels like a kluge.
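The three options above can be compared with a small Python sketch. `toy_stem` is a hypothetical stand-in for Porter that only knows the "ies" -> "i" and "y" -> "i" rules, and `analyze` is a hypothetical analysis chain (optional synonym mapping, then stemming unless the word is protected, roughly what protwords.txt does). Neither is Solr's or Lucene's actual API:

```python
from typing import Dict, Optional, Set

def toy_stem(word: str) -> str:
    # Hypothetical stand-in for Porter: only the two rules relevant here.
    if word.endswith("ies"):
        return word[:-2]          # "cities" -> "citi"
    if word.endswith("y"):
        return word[:-1] + "i"    # "city" -> "citi"
    return word

def analyze(word: str, protected: Set[str],
            synonyms: Optional[Dict[str, str]] = None) -> str:
    """Hypothetical chain: map synonyms first, then stem unless protected."""
    w = word.lower()
    if synonyms:
        w = synonyms.get(w, w)
    return w if w in protected else toy_stem(w)

scenarios = {
    "protect citi": ({"citi"}, None),
    "protect city, cities": ({"city", "cities"}, None),
    "protect city + synonym": ({"city"}, {"cities": "city"}),
}
for name, (protected, synonyms) in scenarios.items():
    print(name, [analyze(t, protected, synonyms) for t in ("city", "cities", "citi")])
```

Protecting "citi" changes nothing, because its stem is itself. Protecting "city" and "cities" removes the collision but also stops the two forms from matching each other. Adding the synonym restores that match, at the cost of maintaining a per-word list by hand.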

Are there more elegant solutions for cases like this one?

Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

