Re: tf*idf scoring

2009-11-04 Thread Markus Jelsma - Buyways B.V.
Thank you for your explanation

On Tue, 2009-11-03 at 07:32 -0800, Grant Ingersoll wrote:
> On Nov 3, 2009, at 5:54 AM, Markus Jelsma - Buyways B.V. wrote:
> > I see, but why not return the true values of Lucene?
>
> I'm not sure what you mean by this. The TVC returns the term
> freq

Re: tf*idf scoring

2009-11-03 Thread Grant Ingersoll
On Nov 3, 2009, at 5:54 AM, Markus Jelsma - Buyways B.V. wrote:
> I see, but why not return the true values of Lucene?

I'm not sure what you mean by this. The TVC returns the term frequency and the document frequency and TF/DF as reported by Lucene. The actual raw values. What you are

Re: tf*idf scoring

2009-11-03 Thread Markus Jelsma - Buyways B.V.
> > According to different algorithms, the tf for term c would be 3 / 1 =
> > 0.33 instead of 1 returned by Solr.
>
> I don't follow. The TF (term frequency) is the number of times the
> term c occurs in that particular document, i.e. 1 time.

I see that above, and below, I made some
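
To make the two readings of "tf" in this exchange concrete, here is a minimal Python sketch. It is not Lucene or Solr code: the function names `raw_tf` and `normalized_tf` and the three-token document are hypothetical, chosen only so that the numbers line up with the 1 versus 0.33 mentioned above (which suggests the intended ratio was 1 / 3).

```python
from collections import Counter


def raw_tf(term, tokens):
    """Raw term frequency: how many times `term` occurs in the document."""
    return Counter(tokens)[term]


def normalized_tf(term, tokens):
    """Length-normalized term frequency: occurrences divided by document length."""
    if not tokens:
        return 0.0
    return Counter(tokens)[term] / len(tokens)


# Hypothetical document with three tokens, one of which is "c".
doc = ["a", "b", "c"]

print(raw_tf("c", doc))                   # raw count, the kind of value Grant describes
print(round(normalized_tf("c", doc), 2))  # count / length, the textbook-style value
```

Under these assumptions the raw count is 1 and the normalized value is roughly 0.33, so the two figures describe the same document under different definitions of tf.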

Re: tf*idf scoring

2009-11-03 Thread Grant Ingersoll
Inline below

On Nov 3, 2009, at 2:30 AM, Markus Jelsma - Buyways B.V. wrote:
> Hello list,
>
> I have a question about Lucene's calculation of tf*idf value. I first noticed that Solr's tf does not compare to tf values based on calculation elsewhere such as http://odin.himinbi.org/idf_to_item:item/c

tf*idf scoring

2009-11-03 Thread Markus Jelsma - Buyways B.V.
Hello list,

I have a question about Lucene's calculation of tf*idf value. I first noticed that Solr's tf does not compare to tf values based on calculation elsewhere such as http://odin.himinbi.org/idf_to_item:item/comparing_tf%3Aidf_to_item%3Aitem_similarity.xhtml or http://en.wikipedia.org/wik