Hi Steve - it seems most similarities use CollectionStatistics.maxDoc() in idfExplain, but there is also a docCount(), which counts only the documents that have at least one term for the field. We use docCount() in all our custom similarities, partly because it lets you keep multiple languages in one index where one is much larger than the other. With maxDoc() the small language gets very high IDF scores, but with docCount() they stay proportional. Using docCount() also fixes SolrCloud ranking problems, unless one of your replicas becomes inconsistent ;)
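A rough sketch of the effect, in plain Java with no Lucene dependency. The idf formula is the classic 1 + ln(numDocs / (docFreq + 1)) that, as far as I recall, DefaultSimilarity uses. The class name and all the numbers are made up for illustration. In a real custom similarity the same substitution would happen inside idfExplain, passing collectionStats.docCount() where maxDoc() is used now.

```java
// Illustrative sketch only: hypothetical class and numbers, not Lucene code.
final class IdfSketch {

    // Classic TF-IDF style idf: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        // Assumed: a term that occurs in 100 documents of a small-language field.
        long docFreq = 100;

        // Assumed: only 1,000 documents have any term in that field (docCount),
        // while the whole index is far larger.
        long docCount = 1_000;

        // Assumed: two replicas of the same shard whose maxDoc differs.
        long maxDocReplicaA = 1_000_000;
        long maxDocReplicaB = 1_050_000;

        // idf based on maxDoc differs per replica and is inflated for the small field.
        System.out.printf("maxDoc, replica A: %.4f%n", idf(docFreq, maxDocReplicaA));
        System.out.printf("maxDoc, replica B: %.4f%n", idf(docFreq, maxDocReplicaB));

        // idf based on docCount is the same wherever docCount agrees,
        // and it is scaled to the size of the field itself.
        System.out.printf("docCount:          %.4f%n", idf(docFreq, docCount));
    }
}
```

The docCount-based value only matches across replicas as long as they report the same docCount for the field, hence the caveat about inconsistent replicas.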
https://lucene.apache.org/core/4_7_0/core/org/apache/lucene/search/CollectionStatistics.html#docCount%28%29

-----Original message-----
> From:Steven Bower <smb-apa...@alcyon.net>
> Sent: Wednesday 12th March 2014 16:08
> To: solr-user <solr-user@lucene.apache.org>
> Subject: IDF maxDocs / numDocs
>
> I am noticing the maxDocs between replicas is consistently different and
> that in the idf calculation it is used which causes idf scores for the same
> query/doc between replicas to be different. obviously an optimize can
> normalize the maxDocs scores, but that is only temporary.. is there a way
> to have idf use numDocs instead (as it should be consistent across
> replicas)?
>
> thanks,
>
> steve