RE: How does KeywordRepeatFilterFactory help giving a higher score to an original term vs a stemmed term

Markus Jelsma Wed, 24 Sep 2014 02:38:44 -0700
Hi - but this makes no sense: the original and the stemmed token are scored 
as equals, except for tiny differences in TF and IDF. What you would need is 
something like a stemmer that preserves the original token and gives a 
payload < 1 to the stemmed token. The same goes for filters like decompounders 
and accent folders that change the meaning of words.
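
To make the point concrete, here is a rough toy sketch in Python. It is not 
Lucene or Solr code: the stem table, the analyze/score helpers and the 
stem_weight parameter are all made up for illustration, and the score is a 
plain sum of weights rather than real TF/IDF.

```python
# Toy model of the analysis chain, NOT Lucene/Solr code. The stem table and
# the stem_weight parameter are hypothetical, for illustration only.
TOY_STEMS = {"running": "run", "runs": "run"}


def analyze(text, stem_weight=1.0):
    """Return {term: weight} for whitespace tokens plus their stems.

    Original tokens get weight 1.0; a stem that differs from its original
    gets stem_weight. Identical original/stem pairs collapse to one entry,
    mimicking what RemoveDuplicatesTokenFilterFactory is used for.
    """
    terms = {}
    for token in text.split():
        terms[token] = 1.0  # keyword-marked copy, passed through unchanged
        stem = TOY_STEMS.get(token, token)
        if stem != token:
            # keep the larger weight if the stem is also an original token
            terms[stem] = max(terms.get(stem, 0.0), stem_weight)
    return terms


def score(query, doc, stem_weight=1.0):
    """Sum the document-side weight of every query term found in the document."""
    doc_terms = analyze(doc, stem_weight)
    return sum(doc_terms.get(term, 0.0) for term in analyze(query))


doc = "I am running home"
for w in (1.0, 0.5):
    print(w, score("running home", doc, w), score("runs home", doc, w))
```

With stem_weight = 1.0 a hit on "run" adds exactly as much as a hit on 
"running", which is the situation with KeywordRepeat plus a stemmer. Only 
when the stemmed token carries a weight below 1 does the original term 
contribute more than the stemmed one.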
 
 
-----Original message-----
> From:Diego Fernandez <difer...@redhat.com>
> Sent: Wednesday 17th September 2014 23:37
> To: solr-user@lucene.apache.org
> Subject: Re: How does KeywordRepeatFilterFactory help giving a higher score 
> to an original term vs a stemmed term
> 
> I'm not 100% on this, but I imagine this is what happens:
> 
> (using -> to mean "tokenized to")
> 
> Suppose that you index:
> 
> "I am running home" -> "am run running home"
> 
> If you then query "running home" -> "run running home", it matches both 
> "run" and "running" and thus gets a higher score than if you query 
> "runs home" -> "run runs home", which matches only "run".
> 
> 
> ----- Original Message -----
> > The Solr wiki says   "A repeated question is "how can I have the
> > original term contribute
> > more to the score than the stemmed version"? In Solr 4.3, the
> > KeywordRepeatFilterFactory has been added to assist this
> > functionality. "
> > 
> > https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming
> > 
> > (Full section reproduced below.)
> > I can see how in the example from the wiki reproduced below that both
> > the stemmed and original term get indexed, but I don't see how the
> > original term gets more weight than the stemmed term.  Wouldn't this
> > require a filter that gives terms with the keyword attribute more
> > weight?
> > 
> > What am I missing?
> > 
> > Tom
> > 
> > 
> > 
> > ---------------------------------------------
> > "A repeated question is "how can I have the original term contribute
> > more to the score than the stemmed version"? In Solr 4.3, the
> > KeywordRepeatFilterFactory has been added to assist this
> > functionality. This filter emits two tokens for each input token, one
> > of them is marked with the Keyword attribute. Stemmers that respect
> > keyword attributes will pass through the token so marked without
> > change. So the effect of this filter would be to index both the
> > original word and the stemmed version. The 4 stemmers listed above all
> > respect the keyword attribute.
> > 
> > For terms that are not changed by stemming, this will result in
> > duplicate, identical tokens in the document. This can be alleviated by
> > adding the RemoveDuplicatesTokenFilterFactory.
> > 
> > <fieldType name="text_keyword" class="solr.TextField"
> > positionIncrementGap="100">
> >  <analyzer>
> >    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >    <filter class="solr.KeywordRepeatFilterFactory"/>
> >    <filter class="solr.PorterStemFilterFactory"/>
> >    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >  </analyzer>
> > </fieldType>"
> > 
> 
> -- 
> Diego Fernandez - 爱国
> Software Engineer
> GSS - Diagnostics
> 
>