Re: tf and very short text fields

2014-04-04 Thread Tom Burton-West
> and length. Please correct my wrongs :) > Markus > > > > -Original message- > > From:Tom Burton-West > > Sent: Thursday 3rd April 2014 20:18 > > To: solr-user@lucene.apache.org > > Subject: Re: tf and very short text fields > > > >

Re: tf and very short text fields

2014-04-04 Thread Ahmet Arslan
Please correct my wrongs :) Markus -Original message- > From:Tom Burton-West > Sent: Thursday 3rd April 2014 20:18 > To: solr-user@lucene.apache.org > Subject: Re: tf and very short text fields > > Hi Markus and Wunder, > > I'm missing the original conte

RE: tf and very short text fields

2014-04-04 Thread Markus Jelsma
16.0 = fieldLength You can clearly see the final TF norm being 1, despite the term frequency and length. Please correct my wrongs :) Markus -Original message- > From:Tom Burton-West > Sent: Thursday 3rd April 2014 20:18 > To: solr-user@lucene.apache.org > Subject: Re: tf and ve

Re: tf and very short text fields

2014-04-03 Thread Tom Burton-West
Hi Markus and Wunder, I'm missing the original context, but I don't think BM25 will solve this particular problem. The k1 parameter sets how quickly the contribution of tf to the score falls off with increasing tf. It would be helpful for making sure really long documents don't get too high a
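The effect of k1 that Tom describes can be illustrated with a small sketch. This is the textbook BM25 term-frequency factor, not Lucene's actual BM25Similarity code (which, among other things, stores field lengths in a lossy encoded form); the function name and the default values k1=1.2 and b=0.75 are assumptions chosen for illustration.

```python
def bm25_tf_component(freq, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Textbook BM25 term-frequency factor, with idf left out.

    Illustrative sketch only; not Lucene's implementation.
    """
    if freq <= 0:
        return 0.0
    # Length normalisation: 1.0 for an average-length field.
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    # Grows with freq but saturates towards (k1 + 1).
    return freq * (k1 + 1.0) / (freq + k1 * length_norm)


if __name__ == "__main__":
    # Average-length field: watch how quickly the factor flattens as tf grows.
    for k1 in (0.0, 1.2, 2.0):
        row = [round(bm25_tf_component(f, 4, 4, k1=k1), 3) for f in (1, 2, 4, 8)]
        print("k1 =", k1, row)
```

With k1 set to zero the factor is 1.0 for any non-zero term frequency and any field length, which matches the behaviour Markus mentions elsewhere in the thread: both the tf count and the length norm drop out of the score.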

Re: tf and very short text fields

2014-04-03 Thread Michael Sokolov
On 4/3/14 7:46 AM, Michael Sokolov wrote: On 4/1/14 2:32 PM, Walter Underwood wrote: And here is another peculiarity of short text fields. The movie "New York, New York" should not be twice as relevant for the query "new york". Is there a way to use a binary term frequency rather than a count

Re: tf and very short text fields

2014-04-03 Thread Michael Sokolov
On 4/1/14 2:32 PM, Walter Underwood wrote: And here is another peculiarity of short text fields. The movie "New York, New York" should not be twice as relevant for the query "new york". Is there a way to use a binary term frequency rather than a count? wunder -- Walter Underwood wun...@wunderw

Re: tf and very short text fields

2014-04-01 Thread Walter Underwood
Thanks! We'll try that out and report back. I keep forgetting that I want to try BM25, so this is a good excuse. wunder On Apr 1, 2014, at 12:30 PM, Markus Jelsma wrote: > Also, if I remember correctly, k1 set to zero for BM25 automatically omits > norms in the calculation. So that's easy to p

Re: Re: tf and very short text fields

2014-04-01 Thread Markus Jelsma
Also, if I remember correctly, k1 set to zero for BM25 automatically omits norms in the calculation. So that's easy to play with without reindexing. Markus Jelsma wrote: Yes, override TFIDFSimilarity and emit 1f in tf(). You can also use BM25 with k1 set to zero in your schema. Walter Under

Re: tf and very short text fields

2014-04-01 Thread Markus Jelsma
Yes, override TFIDFSimilarity and emit 1f in tf(). You can also use BM25 with k1 set to zero in your schema. Walter Underwood wrote: And here is another peculiarity of short text fields. The movie "New York, New York" should not be twice as relevant for the query "new york". Is there a way
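As a rough illustration of what "emit 1f in tf()" changes, the sketch below compares a count-based tf with a binary one for Walter's example title. It is plain Python rather than a Lucene Similarity subclass; the function names are hypothetical, and the square-root form is an assumption based on the default TF-IDF similarity shipped with Lucene at the time.

```python
import math


def classic_tf(freq):
    # Assumed shape of the default TF-IDF tf(): square root of the raw count.
    return math.sqrt(freq)


def binary_tf(freq):
    # Binary term frequency: only presence matters, repeats add nothing.
    return 1.0 if freq > 0 else 0.0


# Naive whitespace tokenisation of the lowercased title, for illustration only.
title_tokens = "new york new york".split()
freq = title_tokens.count("york")

if __name__ == "__main__":
    print("raw count:", freq)
    print("classic tf:", classic_tf(freq))
    print("binary tf:", binary_tf(freq))
```

Under the binary variant, "New York, New York" and a title containing "new york" once contribute the same tf factor, so any remaining score difference comes from other parts of the formula such as length norms.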

tf and very short text fields

2014-04-01 Thread Walter Underwood
And here is another peculiarity of short text fields. The movie "New York, New York" should not be twice as relevant for the query "new york". Is there a way to use a binary term frequency rather than a count? wunder -- Walter Underwood wun...@wunderwood.org