Hello,

I'm wondering if anyone has good ideas for handling the following (Porter) 
stemming problem.
The word "city" gets stemmed to "citi".  But "citi" is short for "citibank", so 
we have a conflict: the stems of both "city" and "citi" are "citi", so when you 
search for "city", you will get matches that are really about citi(bank).

Now, we could put "citi" in the "do not stem" list (protwords.txt), but it will 
be of no use because "citi" is already in its fully stemmed form: protecting it 
does not stop "city" from being stemmed to "citi".  This leaves the option of 
not stemming "cities" or "city" by adding those words to protwords.txt (and 
perhaps making "city" a synonym for "cities" as a workaround), but this feels 
like a kluge.
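
To make the collision and the workaround concrete, here is a rough sketch in 
Python.  It is not Solr code: toy_stem is a made-up stand-in that covers only 
the two Porter rules that matter here ("ies" -> "i", and a trailing "y" -> "i" 
when the rest of the word contains a vowel), and analyze, protected and 
synonyms are hypothetical names that only mimic what protwords.txt and a 
synonym mapping would do.

```python
VOWELS = set("aeiou")


def toy_stem(word):
    """Hypothetical stand-in for Porter: only the two rules relevant here."""
    if word.endswith("ies"):
        return word[:-3] + "i"          # cities -> citi
    if word.endswith("y") and any(c in VOWELS for c in word[:-1]):
        return word[:-1] + "i"          # city -> citi
    return word                         # citi is left unchanged


def analyze(word, protected=frozenset(), synonyms=None):
    """Apply the synonym mapping first, then stem unless the word is protected."""
    word = word.lower()
    word = (synonyms or {}).get(word, word)
    return word if word in protected else toy_stem(word)


terms = ("city", "cities", "citi")

# Plain stemming: all three terms collapse to the same stem.
plain = {w: analyze(w) for w in terms}

# Workaround: protect "city"/"cities" and fold "cities" into "city".
protected = frozenset({"city", "cities"})
synonyms = {"cities": "city"}
fixed = {w: analyze(w, protected, synonyms) for w in terms}

print(plain)
print(fixed)
```

With plain stemming all three terms index as "citi"; with the workaround, 
"city" and "cities" both index as "city" and only "citi" stays "citi".  In a 
real analyzer chain the order of the synonym and stemming filters matters, so 
treat the ordering above as an assumption.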

Are there more elegant solutions for cases like this one?

Thanks,
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
