idf calculation in Lucene ?

2011-10-20 Thread David Ryan
According to https://lucene.apache.org/java/3_4_0/api/core/org/apache/lucene/search/Similarity.html, idf(t) = 1 + log(numDocs/(docFreq+1)). For example, in the following example, ln(26
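The formula quoted above can be checked by hand with a few lines of standalone Java. This is only a sketch of the documented formula using the natural logarithm, not Lucene's source; the class name `IdfFormulaSketch`, the index size, and the document frequencies are made up for illustration.

```java
// Standalone sketch of the documented idf formula; this is not Lucene code.
final class IdfFormulaSketch {

    // idf(t) = 1 + ln(numDocs / (docFreq + 1)), with the natural logarithm.
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        int numDocs = 1_000_000; // hypothetical index size
        // Hypothetical document frequencies: rarer terms get a larger idf.
        for (int docFreq : new int[] {5, 50, 5000}) {
            System.out.printf("docFreq=%d idf=%.4f%n", docFreq, idf(docFreq, numDocs));
        }
    }
}
```

The `+ 1` in the denominator keeps the expression defined when a term occurs in no documents.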

Re: idf calculation in Lucene ?

2011-10-27 Thread Robert Muir
On Thu, Oct 20, 2011 at 3:11 PM, David Ryan wrote:
>
> However, in some cases, when I search o'reilly, I see
>
>   *44.0865 = idf(title: o''reilli=4 o=1488 reilli=14 oreilli=4)*
>
> In this case, how is IDF calculated?
>
That's a phrase or multiphrase query. In this case it sums up the idf of

Re: idf calculation in Lucene ?

2011-10-31 Thread David Ryan
Thanks! Is there any way to extend the Similarity class to override the behavior (e.g., using the max idf instead of the sum of the terms' idfs)?
On Thu, Oct 27, 2011 at 5:41 AM, Robert Muir wrote:
> On Thu, Oct 20, 2011 at 3:11 PM, David Ryan wrote: > > > > > However, in some case, when I

Re: idf calculation in Lucene ?

2011-10-31 Thread Robert Muir
Yes: override that method, idfExplain(java.util.Collection, org.apache.lucene.search.Searcher).
On Mon, Oct 31, 2011 at 5:24 PM, David Ryan wrote:
> Thanks!  Is there any way to extend the Similarity class to overwrite the
> behavior (e.g.,  using the max idf instead of the sum of each term idfs)?

Re: idf calculation in Lucene ?

2011-11-02 Thread David Ryan
One more question, for phrase or multiphrase queries: why not use the maximum idf of the individual terms instead of summing up the idfs of each term?
On Mon, Oct 31, 2011 at 2:30 PM, Robert Muir wrote:
> yes: override that method idfExplain(java.util.Collection,
> org.apache.lucene.search.Search

Re: idf calculation in Lucene ?

2011-11-02 Thread Robert Muir
On Wed, Nov 2, 2011 at 3:09 PM, David Ryan wrote:
> one more question,
> for  phrase or multiphrase query.
> why not using the maximum idf of individual term  instead of summing up the
> idfs of each term?
Because that would be an even worse approximation (but again, if you want to do this, just
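To make the two options in this thread concrete, here is a rough standalone sketch comparing the summed idf with the maximum idf for the terms of a phrase. It does not use Lucene; in Lucene 3.x the place to change this behavior is the idfExplain(Collection, Searcher) method named earlier in the thread. The document frequencies are the ones shown in the explain output quoted above; the index size and the names `PhraseIdfSketch`, `sumIdf`, and `maxIdf` are assumptions for illustration only.

```java
import java.util.Arrays;

// Standalone comparison of "sum of idfs" vs "max idf"; this is not Lucene code.
final class PhraseIdfSketch {

    // Per-term idf as documented: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // The behavior described in the thread: add up the idf of every term.
    static double sumIdf(int[] docFreqs, int numDocs) {
        return Arrays.stream(docFreqs).mapToDouble(df -> idf(df, numDocs)).sum();
    }

    // The alternative asked about: keep only the largest per-term idf.
    static double maxIdf(int[] docFreqs, int numDocs) {
        return Arrays.stream(docFreqs).mapToDouble(df -> idf(df, numDocs)).max().orElse(0.0);
    }

    public static void main(String[] args) {
        int[] docFreqs = {4, 1488, 14, 4}; // from the quoted explain output
        int numDocs = 1_000_000;           // hypothetical; not stated in the thread
        System.out.printf("sum=%.4f max=%.4f%n",
                sumIdf(docFreqs, numDocs), maxIdf(docFreqs, numDocs));
    }
}
```

With the maximum, only the rarest term affects the result, so the common term (docFreq 1488 here) contributes nothing to the phrase's idf.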