Re: [Rerank Query] Distributed search + pagination

Alessandro Benedetti Fri, 09 Sep 2016 15:49:51 -0700

Let me explain further.
Let's assume a simple case where we have 2 shards,
reRankDocs=10 and rows=10.


Correct me if I am wrong, Joel.
What we would like:
Page 1: top 10 re-scored
Page 2: remaining 10 re-scored
From page 3 on: the docs with their original score.
This is what happens in a single Solr instance if we set reRankDocs to
20.
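
To make the single-instance case concrete, here is a minimal sketch in Python. It is a toy model of the behaviour described above, not Solr's actual reranker code: the function name rerank_top_n, the sample scores, and the toy rescore function are all assumptions made up for illustration.

```python
def rerank_top_n(docs, rerank_docs, rescore):
    """Toy model: re-score only the top `rerank_docs` docs.

    `docs` is a list of (doc_id, original_score) already sorted by
    original score, descending. The re-scored head is re-sorted by its
    new score; the tail follows in its original order and score.
    """
    head = [(doc_id, rescore(score)) for doc_id, score in docs[:rerank_docs]]
    head.sort(key=lambda d: d[1], reverse=True)
    return head + docs[rerank_docs:]


# Hypothetical data: 30 docs with original scores 200, 199, ..., 171.
docs = [(f"doc-{i}", 200.0 - i) for i in range(30)]

# Hypothetical re-scorer on a different scale (0 < x < 1) that happens
# to invert the original order inside the re-rank window.
rescore = lambda score: 1.0 - score / 1000.0

ranked = rerank_top_n(docs, rerank_docs=20, rescore=rescore)
page1, page2, page3 = ranked[0:10], ranked[10:20], ranked[20:30]

print([doc_id for doc_id, _ in page1])
print([doc_id for doc_id, _ in page3])
```

With reRankDocs=20 and rows=10, pages 1 and 2 hold only re-scored docs, and page 3 starts with the first doc that kept its original score.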

Let's see what happens with sharding:
To get the first page we get the top 10 (re-scored) from shard1 and the top 10
(re-scored) from shard2.
Then the merged top 10 (re-scored) is calculated, and that is
page 1.

But when we request page 2, we now ask each shard for 20 docs:
20 docs from shard1, 10 re-scored and 10 not.
20 docs from shard2, 10 re-scored and 10 not.
At this point we have 40 docs to merge and rank.
The docs with the original score can end up at any position (not necessarily
the last 20).
So on page 2 we can potentially find docs with the original score.
This is even more likely if the scores are on different scales (e.g. the
re-scored 0<x<1 and the original >100).
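
The two-shard scenario above can be simulated with a small Python sketch. It only models my description (each shard re-scores its own top reRankDocs, the aggregator asks every shard for start+rows docs and merges by score); it is not Solr code, and the names shard_top and merged_page, the data, and the score scales are hypothetical.

```python
def shard_top(docs, rerank_docs, rescore, n):
    """Toy shard: re-score the top `rerank_docs` docs, keep the rest with
    their original score, and return the first `n` entries.
    Each entry is (doc_id, score, was_rescored)."""
    head = [(doc_id, rescore(score), True) for doc_id, score in docs[:rerank_docs]]
    head.sort(key=lambda d: d[1], reverse=True)
    tail = [(doc_id, score, False) for doc_id, score in docs[rerank_docs:]]
    return (head + tail)[:n]


def merged_page(shards, rerank_docs, rescore, start, rows):
    """Toy aggregator: ask every shard for start+rows docs, merge them by
    score (descending) and cut out the requested page."""
    merged = []
    for docs in shards:
        merged.extend(shard_top(docs, rerank_docs, rescore, start + rows))
    merged.sort(key=lambda d: d[1], reverse=True)
    return merged[start:start + rows]


# Hypothetical data: original scores > 100, re-scored values in (0, 1).
shard1 = [(f"s1-{i}", 200.0 - i) for i in range(30)]
shard2 = [(f"s2-{i}", 199.5 - i) for i in range(30)]
rescore = lambda score: score / 1000.0

page1 = merged_page([shard1, shard2], 10, rescore, start=0, rows=10)
page2 = merged_page([shard1, shard2], 10, rescore, start=10, rows=10)

# Page 1 contains only re-scored docs.
print(all(was_rescored for _, _, was_rescored in page1))
# Count of docs on page 2 that still carry their original score.
print(sum(1 for _, _, was_rescored in page2 if not was_rescored))
```

In this toy setup page 1 is entirely re-scored, while the whole of page 2 is made of docs with the original score, because their scores (>100) outrank every re-scored value (<1) in the merge.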

Am I right?
Did I make any wrong assumptions so far?

Cheers


On Fri, Sep 9, 2016 at 7:47 PM, Joel Bernstein <joels...@gmail.com> wrote:

> I'm not understanding where the inconsistency comes into play.
>
> The re-ranking occurs on the shards. The aggregator node will be sent some
> docs that have been re-scored and others that are not. But the sorting
> should be the same as someone pages through the result set.
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Fri, Sep 9, 2016 at 9:28 AM, Alessandro Benedetti <
> abenede...@apache.org>
> wrote:
>
> > Hi guys,
> > I was just experimenting with a reranker using a really low number of
> > rerank docs (10 = pageSize).
> > Let's focus on the distributed environment and the manual sharding
> > approach.
> >
> > Currently the reranking task is performed by the shards: they rescore
> > the docs and then send them back to the aggregator node.
> >
> > If you want to rerank only a few docs (leaving the others to follow with
> > their original score), this can be done in a single Solr instance (the
> > howmany logic manages that in the reranker).
> >
> > What happens when you move to a distributed environment?
> > The aggregator will aggregate both rescored and originally scored
> > documents, making the final ranking inconsistent.
> > On the other hand, if we make the reRankDocs threshold dynamic (to adapt
> > to start+rows) we can run into the very annoying issue of a document
> > sliding through the pages (visible on the first page, then appearing
> > again on the third, etc.).
> >
> > Any thoughts?
> >
> > Cheers
> >
> > --
> > --------------------------
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
> >
>



-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
