Re: Solr relevancy score different on replicated nodes

Aman Tandon Tue, 12 Feb 2019 10:33:09 -0800

Thanks, Erick, for your suggestions and time.

On Tue, Feb 12, 2019, 22:32 Erick Erickson <erickerick...@gmail.com> wrote:


> You really only have four options:
> 1> use exactStats (ExactStatsCache). This won't guarantee precise
> matches, but the scores will be closer.
> 2> optimize (not particularly recommended, but if you're willing to do
> it periodically, the stats will match until the next updates).
> 3> use TLOG/PULL replicas and confine the requests to the PULL
> replicas. There'll _still_ be some window for mismatches;
>     specifically, the default is commit_interval/2.
> 4> define the problem away.
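A minimal sketch of option 3, assuming Solr 7.4 or later (where the `shards.preference` query parameter exists), a hypothetical node at `http://localhost:8983/solr` and a hypothetical collection named `signals`. It only builds the Collections API CREATE URL (TLOG and PULL replicas, no NRT replicas) and a select URL that asks Solr to prefer PULL replicas for each shard; sending the requests with your HTTP client of choice is left out.

```python
from urllib.parse import urlencode

# Hypothetical Solr node; substitute your own.
SOLR = "http://localhost:8983/solr"


def create_collection_url(name, num_shards, tlog_replicas, pull_replicas):
    """Collections API CREATE request with no NRT replicas: TLOG replicas
    are leader-eligible, PULL replicas only copy the leader's index."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "nrtReplicas": 0,
        "tlogReplicas": tlog_replicas,
        "pullReplicas": pull_replicas,
    }
    return f"{SOLR}/admin/collections?{urlencode(params)}"


def pull_query_url(collection, q):
    """Select request asking Solr to route each shard's sub-request to a
    PULL replica when one is available."""
    params = {"q": q, "shards.preference": "replica.type:PULL"}
    return f"{SOLR}/{collection}/select?{urlencode(params)}"


print(create_collection_url("signals", 3, 2, 2))
print(pull_query_url("signals", "shoes"))
```

With 2 TLOG plus 2 PULL replicas per shard you keep four copies, two of them leader-eligible. Because PULL replicas copy segments from their shard leader, replicas of the same shard should converge on identical statistics between polls; the mismatch window Erick mentions still applies.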
>
> Best,
> Erick
>
> On Tue, Feb 12, 2019 at 2:42 AM Aman Tandon <amantandon...@gmail.com>
> wrote:
> >
> > Hi Erick,
> >
> > Any suggestions on this?
> >
> > Regards,
> > Aman
> >
> > On Fri, Feb 8, 2019, 17:07 Aman Tandon <amantandon...@gmail.com> wrote:
> >
> > > Hi Erick,
> > >
> > > I find this thread very relevant for people who are facing the same
> > > problem.
> > >
> > > In our case, we have a signals aggregation collection with around
> > > 8 million records in total. We use a SolrCloud architecture (3 shards
> > > and 4 replicas), and the whole index is around 2.5 GB.
> > >
> > > We use this collection to fetch the most-clicked products for a query
> > > and boost them in the search results. The boost score is the query
> > > score on the aggregation collection.
> > >
> > > But when the query goes to a different replica, we get a different
> > > boost score for some of the keywords, so the result ordering keeps
> > > changing on page refresh.
> > >
> > > To solve this, we tried ExactStatsCache for distributed IDF. At debug
> > > level I can see the global stats being merged in the logs, but we
> > > still get different scores from the aggregation collection when
> > > refreshing the results.
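One plausible reason ExactStatsCache alone did not stop the jitter: Lucene's term and collection statistics (`docFreq`, `docCount`) still count deleted documents until a merge purges them, and NRT replicas merge independently, so replicas of the same shard can disagree. The simplified model below assumes the distributed stats are summed from whichever replica of each shard answered, and uses Lucene's BM25 IDF, ln(1 + (N - n + 0.5) / (n + 0.5)). All counts are hypothetical, not taken from this thread.

```python
import math


def bm25_idf(doc_freq, doc_count):
    # Lucene BM25 IDF: ln(1 + (N - n + 0.5) / (n + 0.5))
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical stats for two replicas of the SAME shard. Both hold the same
# live documents, but replica_b still carries 2,000 deleted docs containing
# the term; deleted docs keep counting until a merge removes them.
replica_a = {"doc_freq": 1_000, "doc_count": 900_000}
replica_b = {"doc_freq": 3_000, "doc_count": 902_000}

# Hypothetical stats for the other two shards (same on every replica).
other_shards = [
    {"doc_freq": 1_200, "doc_count": 950_000},
    {"doc_freq": 800, "doc_count": 880_000},
]


def global_idf(shard_stats):
    # Simplified model of exact distributed stats: sum per-shard docFreq
    # and docCount, then compute a single IDF from the totals.
    df = sum(s["doc_freq"] for s in shard_stats)
    n = sum(s["doc_count"] for s in shard_stats)
    return bm25_idf(df, n)


idf_if_a = global_idf([replica_a] + other_shards)
idf_if_b = global_idf([replica_b] + other_shards)
print(f"replica_a answered: {idf_if_a:.4f}, replica_b answered: {idf_if_b:.4f}")
```

Swapping which replica of the first shard answers changes the "global" IDF even though the live documents are identical, which is consistent with both an optimize (purging deleted docs) and PULL replicas (sharing the leader's segments) narrowing the gap.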
> > >
> > > Our indexing occurs once a day, so should we run a daily optimize, or
> > > should we reduce the merge segment count to 2 or 3? Currently it is -1.
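If you go the daily-optimize route, a minimal sketch of the request to send once the nightly indexing run has finished, assuming the same hypothetical node and `signals` collection. `optimize`, `maxSegments` and `waitSearcher` are standard update-handler parameters; the function only builds the URL.

```python
from urllib.parse import urlencode

# Hypothetical Solr node; substitute your own.
SOLR = "http://localhost:8983/solr"


def optimize_url(collection, max_segments=1):
    """Update request that force-merges each core of the collection down
    to at most max_segments segments, waiting for the new searcher."""
    if max_segments < 1:
        raise ValueError("max_segments must be >= 1")
    params = {
        "optimize": "true",
        "maxSegments": max_segments,
        "waitSearcher": "true",
    }
    return f"{SOLR}/{collection}/update?{urlencode(params)}"


print(optimize_url("signals"))      # full optimize to one segment
print(optimize_url("signals", 2))   # lighter merge down to two segments
```

A merge purges deleted documents only from the segments that take part in it, so with `maxSegments` above 1 some segments may keep their deletes and replicas may still disagree slightly.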
> > >
> > > What are your suggestions on this?
> > >
> > > Regards,
> > > Aman
> > >
> > > On Fri, Feb 8, 2019, 00:15 Erick Erickson <erickerick...@gmail.com> wrote:
> > >
> > >> Optimization is safe. The large segment is irrelevant; you'll
> > >> lose a little parallelization, but on an index with this few
> > >> documents I doubt you'll notice.
> > >>
> > >> As of Solr 7.5, optimize will respect the max segment size
> > >> (maxMergedSegmentMB), which defaults to 5G, but you're well under
> > >> that limit.
> > >>
> > >> Best,
> > >> Erick
> > >>
> > >> On Sun, Feb 3, 2019 at 11:54 PM Ashish Bisht <bishtashis...@gmail.com> wrote:
> > >> >
> > >> > Thanks, Erick and everyone. We are looking into the stats cache.
> > >> >
> > >> > I noticed the stats skew again and optimized the index to correct it.
> > >> > Having read these articles:
> > >> >
> > >> > https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/
> > >> > and
> > >> > https://lucidworks.com/2018/06/20/solr-and-optimizing-your-index-take-ii/
> > >> >
> > >> > I wanted to check the points below, given that we want the stats
> > >> > skew to be corrected.
> > >> >
> > >> > 1. Once optimized, the single segment won't be merged naturally very
> > >> > easily. Since we might run a manual optimize every time, I expect
> > >> > that at some point we will end up with a single large segment. What
> > >> > impact will this large segment have?
> > >> > Our index has ~30k documents, i.e. files with content (segment size
> > >> > < 1 GB as of now).
> > >> >
> > >> > 2. Do you recommend optimizing in these situations? It would
> > >> > probably be done only when the stats skew. Is it safe?
> > >> >
> > >> > Regards
> > >> > Ashish
> > >> >
> > >>
> > >
>
