Both descriptions are correct, each in its own context. The description in the Ref
Guide section about ExactStatsCache is correct in the sense that it uses
collection-wide IDF values for terms when calculating scores on different
SHARDS (and when merging the partial per-shard lists). This means that even if
the local IDF (computed from the documents in a particular shard) is biased, the
scores will still be comparable across shards, and the documents coming from
these partial lists can be merged using their absolute scores; their rank
(ordering) will be the same as if they all came from one big shard.

There’s no such mechanism for adjusting scores across two or more different
COLLECTIONS. IDFs for the same terms usually differ between collections, which
means the absolute score values for the same terms won’t be comparable. Still,
if you insist and use a multi-collection alias, Solr will obey ;) and merge
these partial lists as if their scores were comparable. The end result will be
that some or most of the results are incorrectly ranked, depending on how
different the IDFs in these collections were.
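
A minimal Python sketch of the cross-collection case, again only an illustration and not Solr's code. Assumptions: a simplified tf * IDF score with the BM25 IDF formula as in Lucene's BM25Similarity, made-up statistics, and hypothetical names (collection_idf, alias_merge, collections "c1"/"c2"). Each collection gets exact IDF over its own shards, but nothing reconciles IDF between collections, so merging by absolute score ranks differently than one big collection would.

```python
import math

def bm25_idf(doc_count: int, doc_freq: int) -> float:
    # BM25 IDF in the form used by Lucene's BM25Similarity
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

def collection_idf(shards):
    # Exact stats WITHIN one collection: N and df summed over its shards only
    return bm25_idf(sum(n for n, _, _ in shards), sum(df for _, df, _ in shards))

def alias_merge(collections):
    """Merge hits from several collections by absolute score, as an alias query does."""
    hits = []
    for shards in collections.values():
        idf = collection_idf(shards)  # differs per collection
        for _, _, tfs in shards:
            for doc, tf in tfs.items():
                hits.append((tf * idf, doc))  # simplified score: tf * idf
    hits.sort(key=lambda h: h[0], reverse=True)
    return [doc for _, doc in hits]

# Hypothetical (doc_count, doc_freq, {doc: tf}) per shard; only hits of interest listed
collections = {
    "c1": [(50, 25, {"x1": 3}), (50, 25, {})],
    "c2": [(100, 2, {"y1": 1})],
}
# Reference: the same documents indexed into a single collection
one_big = {"all": [(200, 52, {"x1": 3, "y1": 1})]}

print(alias_merge(collections))  # ['y1', 'x1']
print(alias_merge(one_big))      # ['x1', 'y1']
```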

> On 17 May 2019, at 16:37, SOLR4189 <klin892...@yandex.ru> wrote:
> 
> Hi all,
> 
> Can somebody explain to me the Solr tip from here
> <https://builds.apache.org/view/L/view/Lucene/job/Solr-reference-guide-8.x/javadoc/aliases.html>:
> /"Any alias (standard or routed) that references multiple collections may
> complicate relevancy. By default, SolrCloud scores documents on a per shard
> basis. With multiple collections in an alias this is always a problem, so if
> you have a use case for which BM25 or TF/IDF relevancy is important you will
> want to turn on one of the ExactStatsCache implementations"/
> 
> But the ExactStatsCache documentation says /"This implementation uses global
> values (across the collection) for document frequency"/ (from here
> <https://builds.apache.org/view/L/view/Lucene/job/Solr-reference-guide-8.x/javadoc/distributed-requests.html#distributedidf>).
> 
> So what does "across the collection" mean? Does it mean that distributed
> IDF works inside the same collection (across shards)? If yes, how will it
> help in the alias case?
> 
