Re: Omitting tf but not positions

2011-02-25 Thread Robert Muir
On Fri, Feb 25, 2011 at 1:57 PM, Jan Høydahl wrote:
> I also have a case (yellow-page) where IDF comes in and destroys the rank.
> A company listing with a word which occurs in few other listings is not
> necessarily better than others just because of that. When it gets to the
> extreme value of

Re: Omitting tf but not positions

2011-02-25 Thread Robert Zotter
Jan,

You are correct, you'll need your own Similarity class. Have a look at SweetSpotSimilarity (http://lucene.apache.org/java/3_0_3/api/contrib-misc/org/apache/lucene/misc/SweetSpotSimilarity.html).

On 2/25/11 10:57 AM, Jan Høydahl wrote:
> I also have a case (yellow-page) where IDF comes in a
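A custom Similarity of the kind suggested here could, for example, flatten IDF so that rare words no longer dominate the ranking. The following is only a sketch under stated assumptions: `IdfHook` is a hypothetical stand-in that mirrors the `idf(int docFreq, int numDocs)` hook and default formula of Lucene 3.x's `DefaultSimilarity`, so the example compiles without Lucene on the classpath, and returning a constant is just one possible policy, not necessarily what the poster ended up using.

```java
// Hypothetical stand-in for the idf hook of Lucene 3.x's DefaultSimilarity,
// written only so this sketch compiles without Lucene.
class IdfHook {
    // Default formula in Lucene 3.x: log(numDocs / (docFreq + 1)) + 1
    float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }
}

// Assumed policy for illustration: every term gets the same weight,
// no matter how few documents contain it.
class FlatIdfSimilarity extends IdfHook {
    @Override
    float idf(int docFreq, int numDocs) {
        return 1.0f;
    }
}
```

With Lucene itself, the same override would presumably go on a subclass of `DefaultSimilarity`, which is then registered with the searcher (or, in Solr, through the `similarity` element in schema.xml).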

Re: Omitting tf but not positions

2011-02-25 Thread Jan Høydahl
I also have a case (yellow-page) where IDF comes in and destroys the rank. A company listing with a word which occurs in few other listings is not necessarily better than others just because of that. When it gets to the extreme value of IDF=1, we get an artificially high IDF boost. It is not kil

Re: Omitting tf but not positions

2010-12-15 Thread Robert Muir
On Wed, Dec 15, 2010 at 3:09 AM, Jan Høydahl / Cominvent wrote:
> Any way to disable TF/IDF normalization without also disabling positions?

See Similarity.tf(float) and Similarity.tf(int). If you want to change this for both terms and phrases, just override Similarity.tf(float), since by default
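The override described in this reply can be sketched as follows. This is an illustration rather than the poster's actual code: `TfHooks` is a hypothetical stand-in that mirrors the two `tf` hooks as they behave in Lucene 3.x's `DefaultSimilarity` (square root of the frequency, with `tf(int)` delegating to `tf(float)`), so the example runs without Lucene. The binary 0/1 scoring is an assumed choice.

```java
// Hypothetical stand-in for the two tf hooks of Lucene 3.x's
// DefaultSimilarity, written only so this sketch compiles without Lucene.
class TfHooks {
    // Phrase (sloppy) frequencies arrive as floats; the default is sqrt(freq).
    float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Term frequencies arrive as ints and delegate to tf(float).
    float tf(int freq) {
        return tf((float) freq);
    }
}

// Overriding only tf(float) changes scoring for both terms and phrases,
// because tf(int) is routed through it.
class BinaryTfSimilarity extends TfHooks {
    @Override
    float tf(float freq) {
        // A match counts once, however often it repeats in the field.
        return freq > 0 ? 1.0f : 0.0f;
    }
}
```

Because only the scoring hook changes, positions stay in the index and phrase queries such as DisMax "pf" keep working. With Lucene itself, the override would go on a subclass of `DefaultSimilarity` that is then set on the searcher.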

Omitting tf but not positions

2010-12-15 Thread Jan Høydahl / Cominvent
Hi, I have a case where I use DisMax "pf" to boost on phrase match in a field. I use omitNorms=true to keep length normalization from messing with my scores. However, for some documents, the phrase "foo bar" occurs more than once in the same field, and I get an unintended TF boost for one of the