Re: Solr relevancy score different on replicated nodes

Erick Erickson Fri, 11 Jan 2019 08:08:40 -0800

What Elizabeth said.

Really, this is an intractable problem. Even in the TLOG
and PULL replica case, the replicas of an index getting updates
will still fire their replication requests at different wall-clock
times. Even if that were coordinated, the vagaries of
networks etc. would _still_ mean the various replicas
would see slightly different "snapshots" of the index.
True, the window would be smaller....
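
To make the effect concrete, here is a rough sketch of how two replicas
holding slightly different snapshots end up with different term weights.
It uses the BM25 IDF formula as implemented in Lucene's BM25Similarity;
the document counts and frequencies are made-up numbers chosen purely for
illustration, not measurements from any real index.

```python
import math


def bm25_idf(doc_count, doc_freq):
    """IDF as computed by Lucene's BM25Similarity:
    ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical statistics for the same term on two replicas of one shard.
# Replica B has seen a few more updates (or has not yet merged away some
# deleted documents), so its counts differ slightly.
replica_a = {"doc_count": 1000, "doc_freq": 50}
replica_b = {"doc_count": 1010, "doc_freq": 53}

idf_a = bm25_idf(**replica_a)
idf_b = bm25_idf(**replica_b)

print(f"replica A idf = {idf_a:.4f}")
print(f"replica B idf = {idf_b:.4f}")
print(f"difference    = {abs(idf_a - idf_b):.4f}")
```

The difference is small, but it feeds into every score computed on that
replica, so maxScore and any value derived from it will drift with it.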


The only situations I've seen where the scores on different
replicas are always identical are when the index is optimized
(which isn't recommended unless you can do it all the time),
or when TLOG and PULL replicas are used and the index is not
undergoing continuous updates.

As for locking subsequent requests to a set of nodes, the
idea has been bandied about but usually falls down when
it's realized that this has the potential to unevenly distribute
the load.
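
For what it's worth, a client-side version of that idea might look
something like the sketch below. It is only an illustration, assuming the
client already knows the replica URLs; the URLs, the pick_replica helper
and the session/query key are all hypothetical and not part of any Solr
API.

```python
import hashlib


def pick_replica(session_id, query, replicas):
    """Map a (session, query) pair to one replica deterministically,
    so every page of the same search goes to the same node."""
    if not replicas:
        raise ValueError("no replicas available")
    key = f"{session_id}|{query}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the hash as an integer bucket selector.
    index = int.from_bytes(digest[:8], "big") % len(replicas)
    return replicas[index]


# Hypothetical replica URLs for a single-shard collection.
replicas = [
    "http://solr1:8983/solr/products_shard1_replica_n1",
    "http://solr2:8983/solr/products_shard1_replica_n2",
    "http://solr3:8983/solr/products_shard1_replica_n3",
]

page1_target = pick_replica("session-42", "laptop", replicas)
page2_target = pick_replica("session-42", "laptop", replicas)
print(page1_target)
print(page1_target == page2_target)
```

Note that this shows the problem as well as the idea: a handful of
popular queries or long-lived sessions will all hash to the same
replica, and the mapping shifts whenever the replica list changes.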

Best,
Erick

On Fri, Jan 11, 2019 at 3:13 AM Elizabeth Haubert
<ehaub...@opensourceconnections.com> wrote:
>
> Hello,
>
> To a certain extent, I agree with Erick that this isn't a problem, but
> looks like one.  The nature of TF*IDF is such that you will see different
> scores for the same query over time on the same replica, or different
> replicas for the same query with most replication schemes. This is mildly
> annoying when the score is displayed to the user, although I have found
> most end users do not pay that much attention to the floating point score.
> Testers do.  On a small index with high write/delete traffic and homogeneous
> docs, I've seen it cause document re-orderings when the same query is
> repeated and sent to different replicas, such as for paging, and that is
> noticeable to end users.
>
> How big is your index, and how different are the percentages you are
> seeing?  This is a much more pronounced problem on smaller indices; it is
> possible this is a problem with your test setup, but not production.
>
> Your solution of directing users to a consistent replica will solve the
> change in values over a session-sized window of time.   With a single
> shard, you could use a Master/Slave setup and direct queries at a given
> slave.  This has a number of operational consequences though, as it means
> you will lose the benefits of SolrCloud.
>
> Mikhail's suggestion to use ExactStatsCache would be cleaner:
> https://lucene.apache.org/solr/guide/6_6/distributed-requests.html#DistributedRequests-ConfiguringstatsCache_DistributedIDF_
>
>
> Elizabeth
>
> On Fri, Jan 11, 2019 at 3:56 AM Ashish Bisht <bishtashis...@gmail.com>
> wrote:
>
> > Hi Erick,
> >
> > Your statement "*At best, I've seen UIs where they display, say, 1 to 5
> > stars that are just showing the percentile that the particular doc had
> > _relative_ to the max score*" is something we are trying to achieve, but we
> > are dealing in percentages rather than stars (ratings).
> >
> > The change in maxScore per node is messing it up.
> >
> > I was wondering whether it is possible to make one complete request (for a
> > term) go through one replica, i.e. if we could tell the client which replica
> > served the first request, subsequent paginated requests would go through
> > that replica until the keyword changes. Do you think this is possible, or a
> > good idea? If so, is there a way in Solr to know which replica served a request?
> >
> > Regards
> > Ashish
> >
> >
> >
> >
> > --
> > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> >
