Yes, this is the intended behavior. All of the Solr stemmers are based on
imperfect heuristics rather than a real dictionary. Switching to another
stemmer may solve one problem, but then you run into a different one;
rinse and repeat.

The stemmer code has a specific rule that skips stemming for a pattern
that happens to match your failing cases:

        // s/len are the token buffer and length; this check runs only when the token ends in "es".
        if (s[len-3] == 'i' || s[len-3] == 'a' || s[len-3] == 'o' || s[len-3] == 'e')
          return len;

See:
https://github.com/apache/lucene-solr/blob/branch_3x/lucene/contrib/analyzers/common/src/java/org/apache/lucene/analysis/en/EnglishMinimalStemmer.java

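So xxxaes, xxxoes, and xxxees will all remain unstemmed, as will xxxies
after an 'a' or 'e'. Other xxxies words are caught by an earlier rule in
the same method that rewrites them to xxxy, so "ties" becomes "ty" while
"tie" stays "tie", and the two still fail to match. The code does not say
what the rationale for these rules was; there are no comments, other than
a pointer to this research paper:
https://www.researchgate.net/publication/220433848_How_effective_is_suffixing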
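To see how each of your word pairs comes out, here is a minimal standalone
Java sketch that mirrors the plural rules in the linked
EnglishMinimalStemmer source. It is a re-implementation for illustration,
not the Lucene class itself; the class name MinimalStemDemo is made up, and
it assumes tokens are lowercased before stemming (as a lowercase filter
earlier in the analyzer chain would normally do).

```java
import java.util.Locale;

// Hypothetical demo class: a standalone copy of the plural-stripping rules
// in Lucene's EnglishMinimalStemmer, for illustration only.
public class MinimalStemDemo {

    // Returns the stemmed form of an already-lowercased token.
    static String stem(String word) {
        char[] s = word.toCharArray();
        int len = s.length;
        // Only words of 3+ chars ending in 's' are candidates.
        if (len < 3 || s[len - 1] != 's') {
            return word;
        }
        switch (s[len - 2]) {
            case 'u':
            case 's':
                // -us and -ss words are left alone ("bus", "dress").
                return word;
            case 'e':
                // -ies -> -y, unless the 'i' is preceded by 'a' or 'e'.
                if (len > 3 && s[len - 3] == 'i' && s[len - 4] != 'a' && s[len - 4] != 'e') {
                    s[len - 3] = 'y';
                    return new String(s, 0, len - 2);
                }
                // Remaining -ies, plus -aes, -oes, -ees: leave unstemmed.
                if (s[len - 3] == 'i' || s[len - 3] == 'a' || s[len - 3] == 'o' || s[len - 3] == 'e') {
                    return word;
                }
                // Otherwise fall through and just drop the final 's'.
            default:
                return new String(s, 0, len - 1);
        }
    }

    public static void main(String[] args) {
        String[][] pairs = {
            {"tree", "trees"}, {"shoe", "shoes"}, {"tie", "ties"},
            {"Cree", "Crees"}, {"dress", "dresses"}, {"bias", "biases"},
            {"tide", "tides"}, {"hue", "hues"}, {"loo", "loos"},
        };
        for (String[] p : pairs) {
            String a = stem(p[0].toLowerCase(Locale.ROOT));
            String b = stem(p[1].toLowerCase(Locale.ROOT));
            System.out.printf("%s -> %s, %s -> %s: %s%n",
                p[0], a, p[1], b, a.equals(b) ? "match" : "NO MATCH");
        }
    }
}
```

With these rules, tree/trees, shoe/shoes, Cree/Crees, and tie/ties ("tie"
vs. "ty") do not match. Neither do dress/dresses ("dress" vs. "dresse")
or bias/biases ("bia" vs. "biase"), because -ss words are never stemmed
and other -s words lose only the final 's'. tide/tides, hue/hues, and
loo/loos all reduce to the singular, which fits the pairs you reported as
working.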



-- Jack Krupansky

On Thu, Apr 14, 2016 at 1:17 PM, Sara Woodmansee <swood...@gmail.com> wrote:

> Hello all,
>
> I posted yesterday, but I never received my own post, so I'm worried it
> did not go through. Also, I am not a coder, so apologies if this is not
> the appropriate place to post. I honestly don't know where else to turn,
> and I am determined to find a solution, as search is essential to our site.
>
> We are having a website built with a search engine based on Solr 3.6. For
> stemming, the developer uses EnglishMinimalStemFilterFactory. They
> previously used PorterStemFilterFactory, which handled plural forms
> better, but it did not work correctly with -ing endings: "icing" became
> "ic", for example.
>
> Most search terms work fine, but we get inconsistent results (singular vs.
> plural) for words that end in -ee, -oe, -ie, or -ae, and for words that end
> in -s. By comparison, words that end in -oo, -ue, -e, or -a work fine.
>
> The developers have been unable to find a solution ("Unfortunately we
> tried to apply all the filters for stemming but this problem is not
> resolved"), but this must be a common issue. Surely someone has found
> a solution to this problem?
>
> Any suggestions greatly appreciated.
>
> Many thanks!
> Sara
> _____________________
>
> DO NOT WORK: Plural terms that end in -ee, -oe, -ie, or -ae, and words that
> end in -s.
>
> Examples:
>
> tree = 0 results
> trees = 21 results
>
> dungaree = 0 results
> dungarees = 1 result
>
> shoe = 0 results
> shoes = 1 result
>
> toe = 1 result
> toes = 0 results
>
> tie = 1 result
> ties = 0 results
>
> Cree = 0 results
> Crees = 1 result
>
> dais = 1 result
> daises = 0 results
>
> bias = 1 result
> biases = 0 results
>
> dress = 1 result
> dresses = 0 results
> _____________________
>
> WORKS:  Words that end with -oo, -ue, -e, -a
>
> Examples:
>
> tide = 1 result
> tides = 1 result
>
> hue = 2 results
> hues = 2 results
>
> dakota = 1 result
> dakotas = 1 result
>
> loo = 1 result
> loos = 1 result
> _____________________
>
>
