RE: IDF maxDocs / numDocs

Markus Jelsma Wed, 12 Mar 2014 08:19:44 -0700

Hi Steve - it seems most similarities use CollectionStatistics.maxDoc() in 
idfExplain, but there is also a docCount(). We use docCount() in all our custom 
similarities, partly because it lets you keep multiple languages in one 
index where one is much larger than the other. With maxDoc() the small 
language gets very high IDF scores, but with docCount() they stay 
proportionate. Using docCount() also fixes SolrCloud ranking problems, unless 
one of your replicas becomes inconsistent ;)


https://lucene.apache.org/core/4_7_0/core/org/apache/lucene/search/CollectionStatistics.html#docCount%28%29
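
To make the multi-language effect concrete, here is a rough sketch in plain Java (no Lucene dependency). It assumes the classic TF-IDF formula, idf = 1 + ln(numDocs / (docFreq + 1)), and that each language lives in its own field, so docCount() for that field only counts documents in that language. The class name and all the numbers are made up for illustration.

```java
// Illustrative only: not a Lucene Similarity, just the idf arithmetic.
class IdfSketch {

    // Classic TF-IDF idf: 1 + ln(numDocs / (docFreq + 1)).
    // "numDocs" is whatever collection size you feed in:
    // maxDoc() for the whole index, or docCount() for one field.
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        long maxDoc = 1_000_000;  // hypothetical: all docs in the index
        long docCount = 10_000;   // hypothetical: docs that have the small-language field
        long docFreq = 5_000;     // a term found in half of the small-language docs

        // With maxDoc the common term looks rare, so its idf is inflated.
        System.out.println("idf using maxDoc:   " + idf(docFreq, maxDoc));
        // With docCount the same term is scored relative to its own language.
        System.out.println("idf using docCount: " + idf(docFreq, docCount));

        // Same term on two replicas whose maxDoc differs (hypothetical values,
        // e.g. one still holds deleted docs): the idf differs too.
        System.out.println("replica A: " + idf(100, 1_000));
        System.out.println("replica B: " + idf(100, 1_200));
    }
}
```

In a real custom similarity the same swap would go in idfExplain, reading docCount() from CollectionStatistics instead of maxDoc().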

 
 
-----Original message-----
> From:Steven Bower <smb-apa...@alcyon.net>
> Sent: Wednesday 12th March 2014 16:08
> To: solr-user <solr-user@lucene.apache.org>
> Subject: IDF maxDocs / numDocs
> 
> I am noticing that maxDocs is consistently different between replicas, and
> because it is used in the idf calculation, idf scores for the same
> query/doc differ between replicas. Obviously an optimize can
> normalize the maxDocs values, but that is only temporary. Is there a way
> to have idf use numDocs instead (as it should be consistent across
> replicas)?
> 
> thanks,
> 
> steve
>
