Re: Solr relevancy score different on replicated nodes

Erick Erickson Tue, 12 Feb 2019 09:03:15 -0800

You really only have four options:
1> Use exactstats (ExactStatsCache). This won't guarantee precise matches,
but the scores will be closer.
2> Optimize (not particularly recommended, but if you're willing to do
it periodically, the stats will match until the next updates).
3> Use TLOG/PULL replicas and confine the requests to the PULL
replicas. There'll _still_ be some window for mismatches;
    specifically, the default is commit_interval/2.
4> Define the problem away.
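Option 3 can be sketched as a request builder. This is a minimal sketch, assuming Solr 7.4 or later, where the `shards.preference` parameter accepts `replica.type:PULL`; the base URL, the collection name and the helper `pull_only_query_url` are hypothetical placeholders. Note that `shards.preference` is a preference rather than a hard filter: if a shard has no active PULL replica, the sub-request still goes to another replica type.

```python
from urllib.parse import urlencode


def pull_only_query_url(base_url: str, collection: str, q: str) -> str:
    """Build a /select URL that asks SolrCloud to prefer PULL replicas.

    Assumes Solr 7.4+ (shards.preference). base_url and collection are
    hypothetical placeholders, e.g. "http://localhost:8983/solr" and "signals".
    """
    params = {
        "q": q,
        # Prefer PULL replicas for every shard's sub-request; PULL replicas copy
        # the leader's segments, so they share the same term statistics once synced.
        "shards.preference": "replica.type:PULL",
        "wt": "json",
    }
    return f"{base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(pull_only_query_url("http://localhost:8983/solr", "signals", "laptop"))
```

The remaining mismatch window Erick mentions still applies: a PULL replica only sees new segments after its next poll of the leader.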


Best,
Erick

On Tue, Feb 12, 2019 at 2:42 AM Aman Tandon <amantandon...@gmail.com> wrote:
>
> Hi Erick,
>
> Any suggestions on this?
>
> Regards,
> Aman
>
> On Fri, Feb 8, 2019, 17:07 Aman Tandon <amantandon...@gmail.com> wrote:
>
> > Hi Erick,
> >
> > I find this thread very relevant for people who are facing the same
> > problem.
> >
> > In our case, we have a signals aggregation collection with a total of
> > around 8 million records. We use a SolrCloud architecture (3 shards
> > and 4 replicas), and the whole index is around 2.5 GB.
> >
> > We use this collection to fetch the most-clicked products for a query
> > and boost them in the search results. The boost score is the query's
> > score on the aggregation collection.
> >
> > But when the query goes to a different replica, we get a different boost
> > score for some of the keywords, so on page refresh the result ordering
> > keeps changing.
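Why the same query can score differently per replica can be shown with a worked example. Lucene's BM25 IDF (the Lucene 7/8 form, `log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))`) is computed from per-replica index statistics, and those statistics still count deleted documents until a merge removes them. Each NRT replica merges segments on its own schedule, so two replicas with identical live documents can report different `docCount` and `docFreq`. The numbers below are hypothetical and only illustrate the effect.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """IDF component of Lucene's BM25Similarity (Lucene 7/8 formula)."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical per-replica statistics for the same term on the same shard.
# Replica B still carries deleted documents that have not been merged away,
# so both docCount and docFreq are inflated compared with replica A.
replica_a = bm25_idf(doc_count=2_700_000, doc_freq=1_200)
replica_b = bm25_idf(doc_count=2_760_000, doc_freq=1_350)

print(f"replica A idf = {replica_a:.4f}")
print(f"replica B idf = {replica_b:.4f}")
```

With these numbers replica A's IDF comes out higher, so a document matching the term scores higher when the request lands on A, which is the kind of reordering described above.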
> >
> > To solve this, we tried the exactstats cache for distributed IDF, and at
> > debug level I can see the global stats merge in the logs, but we still get
> > different scores from the aggregation collection when refreshing the results.
> >
> > Our indexing occurs once a day, so should we optimize daily, or should
> > we reduce the merge segment count to 2/3? Currently it is -1.
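The daily-optimize route can be sketched as a single update request issued after the daily indexing job commits. This is a minimal sketch, assuming the standard `/update` handler's `optimize`, `maxSegments` and `waitSearcher` request parameters; the helper names and URL are hypothetical. `maxSegments=1` is used here because merging down to more than one segment may leave some segments untouched, together with the deleted documents they still count in their statistics.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def optimize_url(base_url: str, collection: str, max_segments: int = 1) -> str:
    """Build an optimize (forceMerge) request for a whole collection."""
    params = {
        "optimize": "true",
        "maxSegments": str(max_segments),
        # Block until the new searcher is open, so the job knows when it is done.
        "waitSearcher": "true",
    }
    return f"{base_url.rstrip('/')}/{collection}/update?{urlencode(params)}"


def optimize_after_daily_load(base_url: str, collection: str, timeout: float = 3600.0) -> int:
    """Hypothetical hook: call once the daily indexing run has committed.

    Sends the request with urllib and returns the HTTP status code.
    """
    with urlopen(optimize_url(base_url, collection), timeout=timeout) as resp:
        return resp.status
```

Since indexing happens only once a day, running this right after the load should keep the replicas' statistics matching until the next day's updates, which is Erick's option 2.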
> >
> > What are your suggestions on this?
> >
> > Regards,
> > Aman
> >
> > On Fri, Feb 8, 2019, 00:15 Erick Erickson <erickerick...@gmail.com> wrote:
> >
> >> Optimization is safe. The large segment is irrelevant; you'll
> >> lose a little parallelization, but on an index with this few
> >> documents I doubt you'll notice.
> >>
> >> As of Solr 7.5, optimize will respect the max segment size,
> >> which defaults to 5G, but you're well under that limit.
> >>
> >> Best,
> >> Erick
> >>
> >> On Sun, Feb 3, 2019 at 11:54 PM Ashish Bisht <bishtashis...@gmail.com> wrote:
> >> >
> >> > Thanks Erick and everyone. We are checking on the stats cache.
> >> >
> >> > I noticed the stats skew again and optimized the index to correct it, as
> >> > per these documents:
> >> >
> >> > https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/
> >> > and
> >> > https://lucidworks.com/2018/06/20/solr-and-optimizing-your-index-take-ii/
> >> >
> >> > I wanted to check on the points below, given that we want the stats skew
> >> > corrected.
> >> >
> >> > 1. Once the index is optimized, the single segment won't easily be
> >> > merged naturally. Since we might be optimizing manually every time,
> >> > I picture that at some point in the future we will have a single large
> >> > segment. What impact will this large segment have?
> >> > Our index is ~30k documents, i.e. files with content (segment size < 1 GB
> >> > as of now).
> >> >
> >> > 2. Do you recommend optimizing in these situations? It would probably
> >> > be done only when the stats skew. Is it safe?
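For "optimize only when the stats skew", one rough heuristic is to compare the replicas of a single shard: if they report the same number of live documents (`numDocs`) but different `maxDoc`, they are carrying different numbers of deleted documents, which Lucene still counts in its term statistics. This is a sketch under stated assumptions: the per-replica dictionaries are assumed to hold the `numDocs`/`maxDoc` values from each core's Luke handler (`/admin/luke`), and the function name, replica names and decision rule are hypothetical.

```python
from typing import Dict


def stats_skewed(replica_index_info: Dict[str, Dict[str, int]]) -> bool:
    """Heuristic skew check for the replicas of ONE shard.

    replica_index_info maps a replica name to counts such as
    {"numDocs": 30000, "maxDoc": 30400}, assumed to come from each core's
    Luke handler. Returns True when replicas agree on live documents but
    disagree on maxDoc, i.e. they hold different numbers of deleted docs.
    """
    live_counts = {info["numDocs"] for info in replica_index_info.values()}
    max_docs = {info["maxDoc"] for info in replica_index_info.values()}
    if len(live_counts) > 1:
        # Live counts differ, probably mid-replication or mid-commit;
        # re-check later rather than optimizing for it.
        return False
    return len(max_docs) > 1


if __name__ == "__main__":
    shard1 = {
        "core_node1": {"numDocs": 30_000, "maxDoc": 30_400},
        "core_node2": {"numDocs": 30_000, "maxDoc": 30_000},
    }
    print("optimize shard1:", stats_skewed(shard1))
```

Running a check like this per shard after each indexing run would limit optimizes to the times the skew actually appears.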
> >> >
> >> > Regards
> >> > Ashish
> >>
> >
