= fieldLength
You can clearly see that the final TF norm is 1, regardless of the term
frequency and length. Please correct me if I'm wrong :)
Markus
-Original message-
From: Tom Burton-West tburt...@umich.edu
Sent: Thursday 3rd April 2014 20:18
To: solr-user@lucene.apache.org
Subject: Re: tf and very short text fields
Hi Markus and Wunder,
I'm missing the original context, but I don't think BM25 will solve this
particular problem.
The k1 parameter sets how quickly the contribution of tf to the score falls
off with increasing tf.
On 4/3/14 7:46 AM, Michael Sokolov wrote:
On 4/1/14 2:32 PM, Walter Underwood wrote:
And here is another peculiarity of short text fields.
The movie New York, New York should not be twice as relevant for
the query new york. Is there a way to use a binary term frequency
rather than a count?
Hi Markus and Wunder,
I'm missing the original context, but I don't think BM25 will solve this
particular problem.
The k1 parameter sets how quickly the contribution of tf to the score falls
off with increasing tf. It would be helpful for making sure really long
documents don't get too high a
Yes, override TFIDFSimilarity and return 1f from tf(). You can also use BM25
with k1 set to zero in your schema.
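To make the difference concrete, here is a minimal sketch in plain Python, not Lucene code. It assumes a tf factor of sqrt(freq), as in Lucene's classic TF-IDF similarity, and ignores idf, norms, and everything else that goes into a real score. The helper names (classic_tf, binary_tf, tf_score) are made up for illustration.

```python
import math
from collections import Counter


def classic_tf(freq):
    """tf factor in the style of classic TF-IDF: sqrt(freq). Assumption for this sketch."""
    return math.sqrt(freq)


def binary_tf(freq):
    """Binary term frequency: 1.0 if the term occurs at all, otherwise 0.0."""
    return 1.0 if freq > 0 else 0.0


def tf_score(title, query_terms, tf_fn):
    """Sum the tf factor of each query term in the title (idf and norms ignored)."""
    tokens = title.lower().replace(",", " ").split()
    counts = Counter(tokens)
    return sum(tf_fn(counts[term]) for term in query_terms)


query = ["new", "york"]
for title in ("New York, New York", "New York"):
    print(title,
          "classic:", round(tf_score(title, query, classic_tf), 3),
          "binary:", tf_score(title, query, binary_tf))
```

With the count-based factor the repeated title scores higher on the tf part alone; with the binary factor both titles get the same tf contribution.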
Walter Underwood wun...@wunderwood.org wrote:
And here is another peculiarity of short text fields.
The movie New York, New York should not be twice as relevant for the query
new york. Is there a way to use a binary term frequency rather than a count?
Also, if I remember correctly, setting k1 to zero for BM25 automatically omits
norms from the calculation, so that's easy to play with without reindexing.
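A quick way to check this is to plug numbers into the textbook BM25 tf component. The sketch below is plain Python using the standard formula, not Lucene's implementation (which, among other things, stores field lengths in a lossy encoded norm). The function name bm25_tf and the default values k1=1.2 and b=0.75 are assumptions for illustration.

```python
def bm25_tf(freq, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Textbook BM25 tf component:
    freq * (k1 + 1) / (freq + k1 * (1 - b + b * doc_len / avg_doc_len))
    """
    if freq <= 0:
        return 0.0
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    return freq * (k1 + 1.0) / (freq + k1 * length_norm)


# With k1 = 0 the length term is multiplied by zero and freq cancels out,
# so every matching document gets a tf component of exactly 1.0.
for freq, doc_len in [(1, 2), (2, 4), (5, 400)]:
    print(freq, doc_len,
          "k1=1.2:", round(bm25_tf(freq, doc_len, avg_doc_len=4.0), 3),
          "k1=0:", bm25_tf(freq, doc_len, avg_doc_len=4.0, k1=0.0))
```

So with k1 at zero the tf part behaves like a binary term frequency, and the field length no longer influences it.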
Markus Jelsma markus.jel...@openindex.io wrote:
Yes, override TFIDFSimilarity and return 1f from tf(). You can also use BM25
with k1 set to zero in your schema.
Thanks! We'll try that out and report back. I keep forgetting that I want to
try BM25, so this is a good excuse.
wunder
On Apr 1, 2014, at 12:30 PM, Markus Jelsma markus.jel...@openindex.io wrote:
Also, if I remember correctly, setting k1 to zero for BM25 automatically omits
norms from the calculation, so that's easy to play with without reindexing.