Hi - but that makes no sense: the original and the stemmed token are scored as
equals, except for tiny differences in TF and IDF. What you would need is
something like a stemmer that preserves the original token and gives the
stemmed token a payload below 1. The same goes for filters such as decompounders
and accent folders, which can change the meaning of words.
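
To make that concrete, here is a rough sketch in plain Python (not Solr code)
of what the analysis chain from the wiki produces for the example quoted below.
TOY_STEMS is a hypothetical stand-in for the Porter stemmer that only knows the
two words used here, and overlap() merely counts matching terms; it is not
Lucene's scoring.

```python
# Hypothetical stand-in for a real stemmer; it only covers this example.
TOY_STEMS = {"running": "run", "runs": "run"}


def analyze(text):
    """Mimic whitespace tokenizer + keyword repeat + stemmer + duplicate removal."""
    terms = []
    for word in text.split():                # whitespace tokenization
        original = word                      # keyword-flagged copy, left unstemmed
        stemmed = TOY_STEMS.get(word, word)  # unflagged copy, stemmed
        terms.append(original)
        if stemmed != original:              # identical copies are dropped
            terms.append(stemmed)
    return terms


indexed = set(analyze("I am running home"))


def overlap(query):
    """Count distinct query terms that also occur in the indexed document."""
    return len(set(analyze(query)) & indexed)


print(analyze("I am running home"))
print(overlap("running home"), overlap("runs home"))
```

Under these assumptions "running home" matches three indexed terms and "runs
home" matches two, so the exact form only wins by matching one extra term.
Nothing in the chain weights "running" above "run" as terms.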
-----Original message-----
> From: Diego Fernandez <difer...@redhat.com>
> Sent: Wednesday 17th September 2014 23:37
> To: solr-user@lucene.apache.org
> Subject: Re: How does KeywordRepeatFilterFactory help giving a higher score
> to an original term vs a stemmed term
>
> I'm not 100% on this, but I imagine this is what happens:
>
> (using -> to mean "tokenized to")
>
> Suppose that you index:
>
> "I am running home" -> "am run running home"
>
> If you then query "running home" -> "run running home", more query terms
> match the indexed tokens, and the document thus gets a higher score than if
> you query "runs home" -> "run runs home"
>
>
> ----- Original Message -----
> > The Solr wiki says "A repeated question is "how can I have the
> > original term contribute
> > more to the score than the stemmed version"? In Solr 4.3, the
> > KeywordRepeatFilterFactory has been added to assist this
> > functionality. "
> >
> > https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming
> >
> > (Full section reproduced below.)
> > I can see how in the example from the wiki reproduced below that both
> > the stemmed and original term get indexed, but I don't see how the
> > original term gets more weight than the stemmed term. Wouldn't this
> > require a filter that gives terms with the keyword attribute more
> > weight?
> >
> > What am I missing?
> >
> > Tom
> >
> >
> >
> > ---------------------------------------------
> > "A repeated question is "how can I have the original term contribute
> > more to the score than the stemmed version"? In Solr 4.3, the
> > KeywordRepeatFilterFactory has been added to assist this
> > functionality. This filter emits two tokens for each input token, one
> > of them is marked with the Keyword attribute. Stemmers that respect
> > keyword attributes will pass through the token so marked without
> > change. So the effect of this filter would be to index both the
> > original word and the stemmed version. The 4 stemmers listed above all
> > respect the keyword attribute.
> >
> > For terms that are not changed by stemming, this will result in
> > duplicate, identical tokens in the document. This can be alleviated by
> > adding the RemoveDuplicatesTokenFilterFactory.
> >
> > <fieldType name="text_keyword" class="solr.TextField" positionIncrementGap="100">
> >   <analyzer>
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.KeywordRepeatFilterFactory"/>
> >     <filter class="solr.PorterStemFilterFactory"/>
> >     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >   </analyzer>
> > </fieldType>"
> >
>
> --
> Diego Fernandez - 爱国
> Software Engineer
> GSS - Diagnostics
>
>