The difference is that a query using the same form that was indexed matches two tokens for that word: the stem and the original form, and the original form is the less common term. A query using a different form matches only the stem, which is the more common term. So really you're getting the "boost" from both the tiny difference in TF*IDF and the extra token that you match on.
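To make the token-overlap argument concrete, here is a minimal sketch in Python. It is not Solr code: `toy_stem`, `analyze` and `matching_terms` are hypothetical names, the stem table is a hand-written stand-in for a real stemmer such as Porter, and the sketch only counts matching terms. Real scoring also depends on TF, IDF and length normalization.

```python
# Hypothetical stand-in for a real stemmer; it only covers this example's words.
TOY_STEMS = {"running": "run", "runs": "run"}


def toy_stem(token):
    """Return the toy stem of a token, or the token itself if unknown."""
    return TOY_STEMS.get(token, token)


def analyze(text):
    """Roughly mimic: whitespace tokenizer -> keyword repeat -> stemmer
    -> duplicate removal. Returns the list of indexed (or queried) terms."""
    terms = []
    for token in text.split():
        stemmed = toy_stem(token)
        # The keyword-marked copy passes through the stemmer unchanged.
        terms.append(token)
        # The second copy is stemmed; drop it if stemming changed nothing.
        if stemmed != token:
            terms.append(stemmed)
    return terms


def matching_terms(query, document):
    """Terms produced by the query that also occur in the document."""
    return sorted(set(analyze(query)) & set(analyze(document)))


doc = "I am running home"
print(matching_terms("running home", doc))  # ['home', 'run', 'running']
print(matching_terms("runs home", doc))     # ['home', 'run']
```

The query that uses the indexed form matches three terms, while the other form matches only two, which is where the extra score would come from under these assumptions.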
However, I agree that adding a payload might be a better solution.

----- Original Message -----
> Hi - but this makes no sense, they are scored as equals, except for tiny
> differences in TF and IDF. What you would need is something like a stemmer
> that preserves the original token and gives a < 1 payload to the stemmed
> token. The same goes for filters like decompounders and accent folders that
> change meaning of words.
>
>
> -----Original message-----
> > From:Diego Fernandez <difer...@redhat.com>
> > Sent: Wednesday 17th September 2014 23:37
> > To: solr-user@lucene.apache.org
> > Subject: Re: How does KeywordRepeatFilterFactory help giving a higher score
> > to an original term vs a stemmed term
> >
> > I'm not 100% on this, but I imagine this is what happens:
> >
> > (using -> to mean "tokenized to")
> >
> > Suppose that you index:
> >
> > "I am running home" -> "am run running home"
> >
> > If you then query "running home" -> "run running home" and thus give a
> > higher score than if you query "runs home" -> "run runs home"
> >
> >
> > ----- Original Message -----
> > > The Solr wiki says "A repeated question is "how can I have the
> > > original term contribute
> > > more to the score than the stemmed version"? In Solr 4.3, the
> > > KeywordRepeatFilterFactory has been added to assist this
> > > functionality. "
> > >
> > > https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming
> > >
> > > (Full section reproduced below.)
> > > I can see how in the example from the wiki reproduced below that both
> > > the stemmed and original term get indexed, but I don't see how the
> > > original term gets more weight than the stemmed term. Wouldn't this
> > > require a filter that gives terms with the keyword attribute more
> > > weight?
> > >
> > > What am I missing?
> > >
> > > Tom
> > >
> > >
> > >
> > > ---------------------------------------------
> > > "A repeated question is "how can I have the original term contribute
> > > more to the score than the stemmed version"? In Solr 4.3, the
> > > KeywordRepeatFilterFactory has been added to assist this
> > > functionality. This filter emits two tokens for each input token, one
> > > of them is marked with the Keyword attribute. Stemmers that respect
> > > keyword attributes will pass through the token so marked without
> > > change. So the effect of this filter would be to index both the
> > > original word and the stemmed version. The 4 stemmers listed above all
> > > respect the keyword attribute.
> > >
> > > For terms that are not changed by stemming, this will result in
> > > duplicate, identical tokens in the document. This can be alleviated by
> > > adding the RemoveDuplicatesTokenFilterFactory.
> > >
> > > <fieldType name="text_keyword" class="solr.TextField"
> > > positionIncrementGap="100">
> > >  <analyzer>
> > >    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > >    <filter class="solr.KeywordRepeatFilterFactory"/>
> > >    <filter class="solr.PorterStemFilterFactory"/>
> > >    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> > >  </analyzer>
> > > </fieldType>"
> > >
> >
> > --
> > Diego Fernandez - 爱国
> > Software Engineer
> > GSS - Diagnostics
> >
> >
>

--
Diego Fernandez - 爱国
Software Engineer
GSS - Diagnostics
IRC: aiguofer on #gss and #customer-platform