Thanks for the reply, Erick; I feared that would be the case. Interesting idea with using the fq, but I'm not sure I like the performance implications. I'll see how big a deal it is in practice. I was just thinking about this as a hypothetical scenario today, but as you said, we have a lot of automated tests, so I anticipate it causing issues. I'll give it some more thought and see if I can come up with any other workarounds.
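For what it's worth, Erick's non-cached fq workaround could be built programmatically. This is just a sketch: the `timestamp` field name and the 10s/5s soft-commit-interval and windage values are placeholder assumptions, not anything from the thread.

```java
public class NrtFilter {
    // Build an fq clause that hides documents newer than the soft commit
    // interval plus some windage, along the lines Erick suggested.
    // {!cache=false} stops Solr from caching a filter whose NOW value
    // changes on every request.
    static String recencyFq(int softCommitSecs, int windageSecs) {
        int lagSecs = softCommitSecs + windageSecs;
        return "{!cache=false}timestamp:[* TO NOW-" + lagSecs + "SECONDS]";
    }

    public static void main(String[] args) {
        // e.g. 10s soft commit interval plus 5s of windage:
        System.out.println(recencyFq(10, 5));
        // prints {!cache=false}timestamp:[* TO NOW-15SECONDS]
    }
}
```

The resulting string would be passed as an `fq` parameter on each query (e.g. via `SolrQuery.addFilterQuery` in SolrJ), at the cost of the extra visibility lag Erick describes.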
-Chris

On Tue, Aug 1, 2017 at 5:38 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> Your understanding is correct.
>
> As for how people cope? Mostly they ignore it. The actual number of
> times people notice this is usually quite small; mostly it surfaces
> when automated test suites are run.
>
> If you must lock this up, and you can stand the latency, you could add
> a timestamp to each document and auto-add an fq clause like:
> fq=timestamp:[* TO NOW-soft_commit_interval_plus_some_windage]
>
> Note, though, that this is not an fq clause that can be re-used; see:
> https://lucidworks.com/2012/02/23/date-math-now-and-filter-queries/
> So it would need to be something like:
> fq=timestamp:[* TO NOW/MINUTE-soft_commit_interval_plus_some_windage]
> or
> fq={!cache=false}timestamp:[* TO NOW-soft_commit_interval_plus_some_windage]
>
> and it would inevitably lengthen the latency between when something is
> indexed and when it is available for search.
>
> You can also reduce your soft commit interval to something short, but
> that has other problems.
>
> See SOLR-6606, but it looks like other priorities have gotten in the
> way of it being committed.
>
> Best,
> Erick
>
> On Tue, Aug 1, 2017 at 1:50 PM, Chris Troullis <cptroul...@gmail.com> wrote:
> > Hi,
> >
> > I think I know the answer to this question, but just wanted to verify/see
> > what other people do to address this concern.
> >
> > I have a Solr Cloud setup (6.6.0) with 2 nodes, 1 collection with 1 shard
> > and 2 replicas (1 replica per node). The nature of my use case requires
> > frequent updates to Solr, and documents are being added constantly
> > throughout the day. I am using CloudSolrClient via SolrJ to query my
> > collection and load balance across my 2 replicas.
> >
> > Here's my question:
> >
> > As I understand it, because of the nature of Solr Cloud (eventual
> > consistency), and the fact that the soft commit timings on the 2 replicas
> > will not necessarily be in sync, would it not be possible to run into a
> > scenario where, say, a document gets indexed on replica 1 right before a
> > soft commit, but indexed on replica 2 right after a soft commit? In this
> > scenario, using the load-balanced CloudSolrClient, wouldn't it be possible
> > for a user to do a search, see the newly added document because they got
> > sent to replica 1, and then search again, and the newly added document
> > would disappear from their results since they got sent to replica 2 and
> > the soft commit hasn't happened yet?
> >
> > If so, how do people typically handle this scenario in NRT search cases?
> > It seems like a poor user experience if things keep disappearing and
> > reappearing from their search results randomly. Currently the only thought
> > I have to prevent this is to write (or extend) my own Solr client to stick
> > a user's session to a specific replica (unless it goes down), but still
> > load balance users between the replicas. But of course then I have to
> > manage all of the things CloudSolrClient manages manually re: cluster
> > state, etc.
> >
> > Can anyone confirm/deny my understanding of how this works / offer any
> > suggestions to eliminate the scenario in question from occurring?
> >
> > Thanks,
> >
> > Chris
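Chris's sticky-session idea (pin each user session to one replica, but still spread sessions across replicas) could be sketched with simple deterministic hashing. Everything here is hypothetical: the replica URLs and session IDs are placeholders, and a real implementation would also need the failover and cluster-state handling he mentions.

```java
import java.util.List;

public class StickyRouter {
    // Pick a replica for a session by hashing the session ID, so the same
    // user keeps hitting the same replica (and sees a consistent view),
    // while different sessions spread across the replicas.
    // Math.floorMod guards against negative hashCode values.
    static String replicaFor(String sessionId, List<String> replicaUrls) {
        int idx = Math.floorMod(sessionId.hashCode(), replicaUrls.size());
        return replicaUrls.get(idx);
    }

    public static void main(String[] args) {
        List<String> replicas = List.of(
                "http://node1:8983/solr/mycollection",
                "http://node2:8983/solr/mycollection");
        // The same session always routes to the same replica:
        System.out.println(
                replicaFor("user-42", replicas).equals(replicaFor("user-42", replicas)));
        // prints true
    }
}
```

The chosen URL would then back a plain `HttpSolrClient` for that session; the trade-off, as noted in the thread, is re-implementing the failover and load-balancing logic CloudSolrClient normally provides.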