Re: Distributed IDF in Alias

2019-05-18 Thread Erick Erickson
In a word, “yes”. For a time routed alias, you also have to be aware of the nature of your data. Take the canonical example of news stories, and let’s assume that a new collection is created every day. Now a hot news story breaks and the news is flooded with the latest story, “Hurric

Re: Distributed IDF in Alias

2019-05-18 Thread Andrzej Białecki
Yes, the IDFs will be different. You could probably implement a custom component that would take term statistics from the previous collections to pre-populate the stats of the current collection, but this is uncharted territory and a lot could go wrong. E.g., if there’s a genuine shift in

Re: Distributed IDF in Alias

2019-05-17 Thread SOLR4189
I am asking because I want to use TRA (Time Routed Aliases). Let's say Solr opens a new collection every month. At the beginning of the month, the new collection will be almost empty. So will the IDF differ between the new collection and the previous month's collection? -- Sent from: http://lucene.
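To make the concern concrete, here is a minimal sketch of how the same term can get a very different IDF in a large, established collection and in a small, freshly created one. It assumes the BM25-style IDF formula, ln(1 + (N - n + 0.5) / (n + 0.5)); the document counts and frequencies below are invented for illustration and are not taken from any real collection.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """BM25-style IDF: ln(1 + (N - n + 0.5) / (n + 0.5)).

    doc_freq  (n): number of documents containing the term
    doc_count (N): total number of documents in the collection
    """
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical numbers: last month's collection is large and the term is rare.
previous_month_idf = bm25_idf(doc_freq=1_000, doc_count=1_000_000)

# Hypothetical numbers: the new collection has only a few documents, and the
# term happens to appear in a large share of them.
new_month_idf = bm25_idf(doc_freq=20, doc_count=200)

print(f"previous month IDF: {previous_month_idf:.3f}")
print(f"new month IDF:      {new_month_idf:.3f}")
```

If each collection scores with only its own statistics, a match on this term contributes less to the score in the new collection than in the old one, so scores merged across the alias are not directly comparable.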

Re: Distributed IDF in Alias

2019-05-17 Thread Andrzej Białecki
Both descriptions are correct, each in its own context. The description in the Ref Guide in the section about ExactStatsCache is correct in the sense that it uses collection-wide IDF values for terms when calculating scores for different SHARDS (and merging partial per-shard lists). This means that

Distributed IDF in Alias

2019-05-17 Thread SOLR4189
Hi all, Can somebody explain to me the Solr tip from here: /"Any alias (standard or routed) that references multiple collections may complicate relevancy. By default, SolrCloud scores documents on a per s