Let me explain further. Let's assume a simple case with 2 shards, reRankDocs=10 and rows=10.

Correct me if I am wrong, Joel. What we would like:
page 1: top 10 re-scored
page 2: remaining 10 re-scored
from page 3 on: the original scored docs

This is what happens in a single Solr instance if we set reRankDocs to 20.

Let's see with sharding:
To get the first page we get the top 10 (re-scored) from shard1 and the top 10 (re-scored) from shard2. Then the merged top 10 (re-scored) is calculated, and that is page 1.

But when we request page 2, we now ask for:
20 docs from shard1, 10 re-scored and 10 not.
20 docs from shard2, 10 re-scored and 10 not.

At this point we have 40 docs to merge and rank. The docs with the original score can end up at any position (not necessarily the last 20), so page 2 can potentially contain docs with the original score. This is even more likely if the scores are on different scales (e.g. re-scored 0<x<1 and original >100).

Am I right? Did I make any wrong assumption so far?

Cheers

On Fri, Sep 9, 2016 at 7:47 PM, Joel Bernstein <joels...@gmail.com> wrote:

> I'm not understanding where the inconsistency comes into play.
>
> The re-ranking occurs on the shards. The aggregator node will be sent some
> docs that have been re-scored and others that are not. But the sorting
> should be the same as someone pages through the result set.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Fri, Sep 9, 2016 at 9:28 AM, Alessandro Benedetti <abenede...@apache.org> wrote:
>
> > Hi guys,
> > I was just experimenting with a reranker with a really low number of rerank
> > docs (10 = pageSize).
> > Let's focus on the distributed environment and the manual sharding
> > approach.
> >
> > Currently what happens is that the reranking task is delivered by the
> > shards, they rescore the docs and then send them back to the aggregator
> > node.
> >
> > If you want to rerank only a few docs (leaving the others with the original
> > score following), this can be done in a single Solr instance (the howmany
> > logic manages that in the reranker).
> >
> > What happens when you move to a distributed environment?
> > The aggregator will aggregate both rescored and original scored documents,
> > making the final ranking inconsistent.
> > On the other hand, if we make the reRankDocs threshold dynamic (to adapt
> > to start+rows) we can run into the very annoying issue of having a document
> > sliding through the pages (visible in the first page, then appearing
> > again in the third, etc.).
> >
> > Any thoughts?
> >
> > Cheers
> >
> > --
> > --------------------------
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England

--
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
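
P.S. To make the two-shard case at the top of this message concrete, here is a toy simulation. It is only a sketch and not Solr code: the names `rescore`, `shard_top` and `merged_page` are hypothetical, the re-ranker is assumed to map every score into (0, 1) while original scores stay above 100, and the aggregator is assumed to merge purely by score.

```python
from typing import Dict, List, NamedTuple


class Hit(NamedTuple):
    doc_id: str
    score: float
    rescored: bool


def rescore(original: float) -> float:
    # Hypothetical re-ranker: maps any original score into (0, 1).
    return (original % 7) / 10 + 0.05


def shard_top(originals: Dict[str, float], rerank_docs: int, n: int) -> List[Hit]:
    # Assumed shard behaviour: re-score the top rerank_docs docs, keep them
    # ahead of the rest, and return the first n hits to the aggregator.
    ranked = sorted(originals.items(), key=lambda kv: kv[1], reverse=True)
    head = [Hit(d, rescore(s), True) for d, s in ranked[:rerank_docs]]
    head.sort(key=lambda h: h.score, reverse=True)
    tail = [Hit(d, s, False) for d, s in ranked[rerank_docs:]]
    return (head + tail)[:n]


def merged_page(shards: List[Dict[str, float]], rerank_docs: int,
                start: int, rows: int) -> List[Hit]:
    # Assumed aggregator behaviour: ask each shard for start+rows docs,
    # merge everything by score, then cut out the requested page.
    n = start + rows
    hits = [h for s in shards for h in shard_top(s, rerank_docs, n)]
    hits.sort(key=lambda h: h.score, reverse=True)
    return hits[start:start + rows]


# Two shards with 30 docs each, original scores well above 100.
shard1 = {f"s1-d{i}": 200.0 - 2 * i for i in range(30)}
shard2 = {f"s2-d{i}": 199.0 - 2 * i for i in range(30)}

page1 = merged_page([shard1, shard2], rerank_docs=10, start=0, rows=10)
page2 = merged_page([shard1, shard2], rerank_docs=10, start=10, rows=10)

print(all(h.rescored for h in page1))      # page 1: only re-scored docs
print(any(not h.rescored for h in page2))  # page 2: original-scored docs show up
```

Under these assumptions page 1 holds only re-scored docs, while the 40 docs merged for page 2 are dominated by the original-scored ones, which is the inconsistency I am describing. If the re-scored docs always kept higher scores than the original ones, the merge would not show this effect.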