Re: Solr relevancy score different on replicated nodes

2019-02-12 Thread Aman Tandon
Thanks, Erick, for your suggestions and time.
On Tue, Feb 12, 2019, 22:32 Erick Erickson wrote:
> You really only have four
> 1> use exactstats. This won't guarantee precise matches, but they'll be closer
> 2> optimize (not particularly recommended, but if you're willing to do it periodically it'll

Re: Solr relevancy score different on replicated nodes

2019-02-12 Thread Erick Erickson
You really only have four
1> use exactstats. This won't guarantee precise matches, but they'll be closer
2> optimize (not particularly recommended, but if you're willing to do it periodically it'll have the stats match until the next updates).
3> use TLOG/PULL replicas and confine the requests to

Re: Solr relevancy score different on replicated nodes

2019-02-12 Thread Aman Tandon
Hi Erick, Any suggestions on this? Regards, Aman
On Fri, Feb 8, 2019, 17:07 Aman Tandon wrote:
> Hi Erick,
> I find this thread very relevant to the people who are facing the same problem.
> In our case, we have a signals aggregation collection which has a total of around 8 million records.

Re: Solr relevancy score different on replicated nodes

2019-02-08 Thread Aman Tandon
Hi Erick, I find this thread very relevant to the people who are facing the same problem. In our case, we have a signals aggregation collection which has a total of around 8 million records. We have a SolrCloud architecture (3 shards and 4 replicas), and the total index size is around 2.5

Re: Solr relevancy score different on replicated nodes

2019-02-07 Thread Erick Erickson
Optimization is safe. The large segment is irrelevant; you'll lose a little parallelization, but on an index with this few documents I doubt you'll notice. As of Solr 7.5, optimize will respect the max segment size, which defaults to 5G, but you're well under that limit. Best, Erick On Sun, Feb 3,

Re: Solr relevancy score different on replicated nodes

2019-02-03 Thread Ashish Bisht
Thanks Erick and everyone. We are checking on the stats cache. I noticed stats skew again and optimized the index to correct it. As per the documents https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/ and

Re: Solr relevancy score different on replicated nodes

2019-01-29 Thread Walter Underwood
Is this a sharded Solr Cloud collection? If so, you can try using global IDF. That should make the scores more similar on different nodes. https://lucene.apache.org/solr/guide/6_6/distributed-requests.html#DistributedRequests-ConfiguringstatsCache_DistributedIDF_ wunder Walter Underwood
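
A minimal sketch of the idea behind global IDF, for readers who want to see why per-shard statistics pull scores apart. It assumes the BM25-style IDF formula and uses made-up per-shard numbers (the shard names and counts are hypothetical, not taken from this thread). It is not Solr's stats cache implementation, only the arithmetic it is meant to even out.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """BM25-style inverse document frequency for one term."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical statistics for one term: shard -> (docFreq, docCount).
shard_stats = {
    "shard1": (40, 1000),
    "shard2": (5, 1200),
    "shard3": (90, 800),
}

# Local IDF: each shard scores with its own statistics only.
local_idf = {name: bm25_idf(df, n) for name, (df, n) in shard_stats.items()}

# Global IDF: statistics are summed across shards before scoring,
# so every shard uses the same value for the term.
global_df = sum(df for df, _ in shard_stats.values())
global_n = sum(n for _, n in shard_stats.values())
global_idf = bm25_idf(global_df, global_n)

for name, value in local_idf.items():
    print(f"{name}: local idf = {value:.3f}")
print(f"global idf = {global_idf:.3f}")
```

With local statistics the same term gets three different weights; with summed statistics every shard agrees on one.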

Re: Solr relevancy score different on replicated nodes

2019-01-29 Thread David Hastings
Maybe instead of using the Solr score in your metrics, find a way to use the document's position in the results? You can never trust the score to be consistent; it's constantly changing as the index changes. On Tue, Jan 29, 2019 at 1:29 PM Ashish Bisht wrote: > Hi Erick, > > Our business
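
One way the position-based suggestion could look, as a rough sketch only. The function name `position_score`, the linear mapping, and the sample scores are all assumptions made for illustration; the thread does not specify how position should be turned into a number.

```python
def position_score(rank: int, total: int) -> float:
    """Map a 0-based result position to (0, 1]; the first result gets 1.0."""
    if total <= 0 or not 0 <= rank < total:
        raise ValueError("rank must satisfy 0 <= rank < total")
    return (total - rank) / total


def scores_by_position(ranked_ids):
    """Return {doc_id: position score} for an already ranked list of ids."""
    total = len(ranked_ids)
    return {doc_id: position_score(i, total) for i, doc_id in enumerate(ranked_ids)}


# Hypothetical raw scores for the same query on two replicas:
# the numbers differ slightly, but the ordering is the same.
replica_a = {"doc1": 12.31, "doc2": 9.87, "doc3": 4.02}
replica_b = {"doc1": 12.05, "doc2": 9.91, "doc3": 4.10}


def ranked(scores):
    """Sort document ids by descending raw score."""
    return sorted(scores, key=scores.get, reverse=True)


print(scores_by_position(ranked(replica_a)))
print(scores_by_position(ranked(replica_b)))
```

As long as both replicas return the same ordering, the derived values match even though the raw scores do not.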

Re: Solr relevancy score different on replicated nodes

2019-01-29 Thread Ashish Bisht
Hi Erick, Our business wanted the score not to be based entirely on the default relevancy algorithm. Instead, it is a mix of Solr relevancy + user metrics (80% + 20%). Each result doc's score is calculated against the max score as a fraction of 80. The remaining 20 is from user metrics. Finally, the sort happens on the new score. But say we
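
The blend described above can be sketched roughly as follows. The field names (`score`, `user_metric`), the assumption that the user metric is already scaled to 0..1, and the sample documents are all hypothetical; only the 80/20 split and the normalization against the max score come from the message.

```python
def blended_scores(docs, relevancy_weight=80.0, metrics_weight=20.0):
    """Blend Solr relevancy (normalized by the max score) with a user metric.

    docs: list of dicts with 'id', 'score' (raw Solr score) and
    'user_metric' (assumed here to be pre-scaled to the range 0..1).
    Returns (id, blended score) pairs sorted by descending blended score.
    """
    if not docs:
        return []
    max_score = max(d["score"] for d in docs)
    if max_score <= 0:
        raise ValueError("max score must be positive")
    blended = [
        (
            d["id"],
            relevancy_weight * d["score"] / max_score
            + metrics_weight * d["user_metric"],
        )
        for d in docs
    ]
    return sorted(blended, key=lambda pair: pair[1], reverse=True)


# Hypothetical result page.
results = [
    {"id": "a", "score": 12.0, "user_metric": 0.1},
    {"id": "b", "score": 9.0, "user_metric": 0.9},
    {"id": "c", "score": 6.0, "user_metric": 0.5},
]

for doc_id, value in blended_scores(results):
    print(doc_id, round(value, 2))
```

Because every document is divided by the max score, a shift in the top document's raw score between replicas changes the relevancy share of every other document on the page.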

Re: Solr relevancy score different on replicated nodes

2019-01-29 Thread Erick Erickson
No, this is not a bug but a consequence of the design. ExactStats can help, but there is no guarantee that different replicas will compute the exact same score. Scores should be very close, however. You haven't explained why you need the scores to match. 99% of the time, worrying about scores at

Re: Solr relevancy score different on replicated nodes

2019-01-29 Thread Ashish Bisht
Hi Erick, To test this scenario I added the replica again, and for a few days I have been monitoring metrics like Num Docs, Max Doc, and Deleted Docs from the *Overview* section of the core. I checked the *Segments Info* section too. Everything looks in sync. http://:8983/solr/#/MyTestCollection_*shard1_replica_n7*/

Re: Solr relevancy score different on replicated nodes

2019-01-11 Thread Erick Erickson
What Elizabeth said. Really, this is an intractable problem. Even in the TLOG and PULL replica case, the replicas of an index getting updates will still fire their replication requests at different wall-clock times. Even if that were coordinated, the vagaries of networks etc. would _still_ mean the various

Re: Solr relevancy score different on replicated nodes

2019-01-11 Thread Elizabeth Haubert
Hello, To a certain extent, I agree with Erick that this isn't a problem, but it looks like one. The nature of TF*IDF is such that you will see different scores for the same query over time on the same replica, or different replicas for the same query with most replication schemes. This is mildly

Re: Solr relevancy score different on replicated nodes

2019-01-11 Thread Ashish Bisht
Hi Erick, Your statement "*At best, I've seen UIs where they display, say, 1 to 5 stars that are just showing the percentile that the particular doc had _relative to the max score*" is something we are trying to achieve, but we are dealing in percentages rather than stars (ratings). Change in MaxScore

Re: Solr relevancy score different on replicated nodes

2019-01-08 Thread Erick Erickson
bq. Shouldn't both the replica and the leader come to the same state after this long a period? No. After that long, the docs will be the same: all the docs present on one replica will be present and searchable on the other. However, they will be in different segments, so the "stats skew" will remain. But

Re: Solr relevancy score different on replicated nodes

2019-01-08 Thread Ashish Bisht
Thank you Erick for explaining. In my scenario, I stopped indexing and updates too and waited for 1 day. I restarted Solr too. Shouldn't both the replica and the leader come to the same state after this long a period? As you said, this gets corrected by segment merging; I hope it is an internal process itself and

Re: Solr relevancy score different on replicated nodes

2019-01-07 Thread Erick Erickson
You misunderstand my point. The wall clock times _will_ be different on leader and follower. It follows that the documents contained in the individual segments on the leader and follower will _not_ be identical. This leads to _deleted_ documents being in different segments on the leader and
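
A rough numeric sketch of the stats skew described here, assuming that deleted-but-not-yet-merged documents still count toward term statistics and using a BM25-style IDF. All counts are invented for illustration.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """BM25-style inverse document frequency for one term."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def term_stats(replica, include_deleted=True):
    """Return (docFreq, docCount) as seen by the scorer on one replica."""
    df = replica["live_with_term"]
    n = replica["live_docs"]
    if include_deleted:
        # Assumption: deleted docs still sitting in unmerged segments
        # keep contributing to the statistics.
        df += replica["deleted_with_term"]
        n += replica["deleted_docs"]
    return df, n


# Same live documents on both replicas, different leftover deletions.
leader = {"live_docs": 1000, "live_with_term": 50,
          "deleted_docs": 200, "deleted_with_term": 30}
follower = {"live_docs": 1000, "live_with_term": 50,
            "deleted_docs": 20, "deleted_with_term": 2}

idf_leader = bm25_idf(*term_stats(leader))
idf_follower = bm25_idf(*term_stats(follower))

# After deletions are purged (e.g. by merging), the statistics line up.
idf_leader_merged = bm25_idf(*term_stats(leader, include_deleted=False))
idf_follower_merged = bm25_idf(*term_stats(follower, include_deleted=False))

print(f"before merge: leader={idf_leader:.4f} follower={idf_follower:.4f}")
print(f"after merge:  leader={idf_leader_merged:.4f} follower={idf_follower_merged:.4f}")
```

Both replicas return the same documents throughout; only the term weight, and therefore the score, differs until the deleted documents are merged away.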

Re: Solr relevancy score different on replicated nodes

2019-01-06 Thread Ashish Bisht
Hi Erick, Thank you for the details, but it doesn't look like a time difference in autocommit caused this issue. As I said, if I run a retrieve-all query or a keyword query on both servers, they return the correct number of docs; it's just the relevancy score that takes different values. I waited for a brief period, still

Re: Solr relevancy score different on replicated nodes

2019-01-04 Thread Erick Erickson
Ashish: Deleting and re-adding a replica is not a solution. Even if you did, that would then be identical only until you started indexing again, then the stats could skew a bit. When you index to NRT replicas, the wall clock times that cause the commits to trigger will be different due to

Re: Solr relevancy score different on replicated nodes

2019-01-04 Thread Ashish Bisht
Hi Erick, I have updated that I am not facing this problem in a new collection. As per 3), I can try deleting a replica and adding it again, but the confusion is which one of the two I should delete (wondering which replica is giving the correct score for the query). Both replicas give the same number of

Re: Solr relevancy score different on replicated nodes

2019-01-04 Thread Mikhail Khludnev
Replicated segments might have different deleted documents by design. Precise numbers can be achieved via exact stats. See https://lucene.apache.org/solr/guide/6_6/distributed-requests.html#DistributedRequests-ConfiguringstatsCache_DistributedIDF_ On Fri, Jan 4, 2019 at 2:40 PM AshB wrote: >

Re: Solr relevancy score different on replicated nodes

2019-01-04 Thread Erick Erickson
See particularly point 3 here and to a lesser extent point 2. https://support.lucidworks.com/s/question/0D5803LRpijCAD/the-number-of-results-returned-is-not-constant-every-time-i-query-solr For point two (the internal Lucene doc IDs are different) you can easily correct it by adding

Solr relevancy score different on replicated nodes

2019-01-04 Thread AshB
Version: Solr 7.4.0, ZooKeeper 3.4.11. Architecture: two boxes (Machine-1, Machine-2), each holding a single instance of Solr. We have a collection which was single shard and single replica, i.e. s=1 and rf=1. A few days back we tried to add a replica to it, but the score for the same query is coming out different from