RE: tf and very short text fields

2014-04-04 Thread Markus Jelsma
= fieldLength You can clearly see the final TF norm being 1, despite the term frequency and length. Please correct my wrongs :) Markus -Original message- From:Tom Burton-West tburt...@umich.edu Sent: Thursday 3rd April 2014 20:18 To: solr-user@lucene.apache.org Subject: Re: tf and very

Re: tf and very short text fields

2014-04-04 Thread Ahmet Arslan
and length. Please correct my wrongs :) Markus -Original message- From:Tom Burton-West tburt...@umich.edu Sent: Thursday 3rd April 2014 20:18 To: solr-user@lucene.apache.org Subject: Re: tf and very short text fields Hi Markus and Wunder, I'm missing the original context, but I

Re: tf and very short text fields

2014-04-04 Thread Tom Burton-West
3rd April 2014 20:18 To: solr-user@lucene.apache.org Subject: Re: tf and very short text fields Hi Markus and Wunder, I'm missing the original context, but I don't think BM25 will solve this particular problem. The k1 parameter sets how quickly the contribution of tf to the score

Re: tf and very short text fields

2014-04-03 Thread Michael Sokolov
On 4/1/14 2:32 PM, Walter Underwood wrote: And here is another peculiarity of short text fields. The movie New York, New York should not be twice as relevant for the query new york. Is there a way to use a binary term frequency rather than a count? wunder -- Walter Underwood

Re: tf and very short text fields

2014-04-03 Thread Michael Sokolov
On 4/3/14 7:46 AM, Michael Sokolov wrote: On 4/1/14 2:32 PM, Walter Underwood wrote: And here is another peculiarity of short text fields. The movie New York, New York should not be twice as relevant for the query new york. Is there a way to use a binary term frequency rather than a count?

Re: tf and very short text fields

2014-04-03 Thread Tom Burton-West
Hi Markus and Wunder, I'm missing the original context, but I don't think BM25 will solve this particular problem. The k1 parameter sets how quickly the contribution of tf to the score falls off with increasing tf. It would be helpful for making sure really long documents don't get too high a
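
To make the k1 remark concrete, here is a minimal sketch of the term-frequency part of the textbook BM25 formula (IDF left out). The function name `bm25_tf` and the defaults k1=1.2, b=0.75 are assumptions chosen for illustration; Lucene's BM25Similarity stores document length in a lossy encoded form, so its exact numbers may differ.

```python
def bm25_tf(tf, k1=1.2, b=0.75, doc_len=1.0, avg_len=1.0):
    """Term-frequency component of textbook BM25 (hypothetical helper)."""
    if tf <= 0:
        return 0.0
    # Length normalization: equals 1.0 when the document has average length.
    norm = 1.0 - b + b * (doc_len / avg_len)
    return tf * (k1 + 1.0) / (tf + k1 * norm)

# Each extra occurrence adds less than the previous one, and the value
# never exceeds k1 + 1. A smaller k1 makes the curve flatten sooner.
for tf in (1, 2, 5, 10, 100):
    print(tf, round(bm25_tf(tf), 3), round(bm25_tf(tf, k1=0.3), 3))
```

With k1 lowered, the contribution of repeated terms levels off earlier, but a second occurrence still scores somewhat higher than a single one as long as k1 is above zero.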

Re: tf and very short text fields

2014-04-01 Thread Markus Jelsma
Yes, override TFIDFSimilarity and emit 1f in tf(). You can also use BM25 with k1 set to zero in your schema. Walter Underwood wun...@wunderwood.org wrote: And here is another peculiarity of short text fields. The movie New York, New York should not be twice as relevant for the query new
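
The effect of that tf() override can be sketched outside Lucene. This is a plain-Python illustration, not the Lucene API: `classic_tf` mirrors the square-root term frequency used by Lucene's default TF-IDF similarity, and `binary_tf` is a hypothetical stand-in for an override that returns 1f for any match.

```python
import math
from collections import Counter

def classic_tf(freq):
    """Square-root term frequency, as in Lucene's default TF-IDF similarity."""
    return math.sqrt(freq)

def binary_tf(freq):
    """Binary term frequency: any match counts once (hypothetical override)."""
    return 1.0 if freq > 0 else 0.0

# Naive whitespace tokenization of the title "New York, New York".
title_tokens = "new york new york".split()
counts = Counter(title_tokens)

for term in ("new", "york"):
    freq = counts[term]
    print(term, freq, round(classic_tf(freq), 3), binary_tf(freq))
```

With the binary variant the doubled title gets the same tf factor as a title containing each word once, so any remaining score difference comes from other factors such as the length norm.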

Re: Re: tf and very short text fields

2014-04-01 Thread Markus Jelsma
Also, if I remember correctly, k1 set to zero for BM25 automatically omits norms in the calculation. So that's easy to play with without reindexing. Markus Jelsma markus.jel...@openindex.io wrote: Yes, override TFIDFSimilarity and emit 1f in tf(). You can also use BM25 with k1 set to zero in
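
The claim about norms can be checked against the textbook BM25 term-frequency component, where the length normalization is multiplied by k1. The sketch below is an illustration under that assumption; `bm25_tf` is a hypothetical helper, not Lucene's implementation.

```python
def bm25_tf(tf, k1, b=0.75, doc_len=1.0, avg_len=1.0):
    """Term-frequency component of textbook BM25 (hypothetical helper)."""
    if tf <= 0:
        return 0.0
    norm = 1.0 - b + b * (doc_len / avg_len)
    # With k1 == 0 the k1 * norm term vanishes, leaving tf / tf.
    return tf * (k1 + 1.0) / (tf + k1 * norm)

# With k1 = 0, neither the term count nor the document length matters.
for tf in (1, 2, 7):
    for doc_len in (2.0, 4.0, 400.0):
        print(tf, doc_len, bm25_tf(tf, k1=0.0, doc_len=doc_len, avg_len=4.0))
```

In this formula every matching document gets a tf component of exactly 1 when k1 is zero, which is consistent with the observation that the length norm drops out as well.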

Re: tf and very short text fields

2014-04-01 Thread Walter Underwood
Thanks! We'll try that out and report back. I keep forgetting that I want to try BM25, so this is a good excuse. wunder On Apr 1, 2014, at 12:30 PM, Markus Jelsma markus.jel...@openindex.io wrote: Also, if I remember correctly, k1 set to zero for BM25 automatically omits norms in the