Thanks for the reply, Erick; I feared that would be the case. Interesting idea with using the fq, but I'm not sure I like the performance implications. I'll see how big a deal it is in practice. I was just thinking about this as a hypothetical scenario today, but as you said, we have a lot of automated tests, so I anticipate it causing issues. I'll give it some more thought and see if I can come up with any other workarounds.
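For what it's worth, Erick's non-cached fq workaround could be built programmatically. This is just a sketch: the `timestamp` field name and the 10s/5s soft-commit-interval and windage values are placeholder assumptions, not anything from the thread.

```java
public class NrtFilter {
    // Build an fq clause that hides documents newer than the soft commit
    // interval plus some windage, along the lines Erick suggested.
    // {!cache=false} stops Solr from caching a filter whose NOW value
    // changes on every request.
    static String recencyFq(int softCommitSecs, int windageSecs) {
        int lagSecs = softCommitSecs + windageSecs;
        return "{!cache=false}timestamp:[* TO NOW-" + lagSecs + "SECONDS]";
    }

    public static void main(String[] args) {
        // e.g. 10s soft commit interval plus 5s of windage:
        System.out.println(recencyFq(10, 5));
        // prints {!cache=false}timestamp:[* TO NOW-15SECONDS]
    }
}
```

The resulting string would be passed as an `fq` parameter on each query (e.g. via `SolrQuery.addFilterQuery` in SolrJ), at the cost of the extra visibility lag Erick describes.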
-Chris

On Tue, Aug 1, 2017 at 5:38 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> Your understanding is correct.
>
> As for how people cope? Mostly they ignore it. The actual number of
> times people notice this is usually quite small; mostly it surfaces
> when automated test suites are run.
>
> If you must lock this up, and you can stand the latency, you could add
> a timestamp to each document and auto-add an fq clause like:
> fq=timestamp:[* TO NOW-soft_commit_interval_plus_some_windage]
>
> Note, though, that this is not an fq clause that can be re-used; see:
> https://lucidworks.com/2012/02/23/date-math-now-and-filter-queries/
> So it would need to be something like:
> fq=timestamp:[* TO NOW/MINUTE-soft_commit_interval_plus_some_windage]
> or
> fq={!cache=false}timestamp:[* TO NOW-soft_commit_interval_plus_some_windage]
>
> and it would inevitably lengthen the latency between when something is
> indexed and when it is available for search.
>
> You can also reduce your soft commit interval to something short, but
> that has other problems.
>
> See SOLR-6606, but it looks like other priorities have gotten in the
> way of it being committed.
>
> Best,
> Erick
>
> On Tue, Aug 1, 2017 at 1:50 PM, Chris Troullis <cptroul...@gmail.com> wrote:
> > Hi,
> >
> > I think I know the answer to this question, but just wanted to verify/see
> > what other people do to address this concern.
> >
> > I have a Solr Cloud setup (6.6.0) with 2 nodes, 1 collection with 1 shard
> > and 2 replicas (1 replica per node). The nature of my use case requires
> > frequent updates to Solr, and documents are being added constantly
> > throughout the day. I am using CloudSolrClient via SolrJ to query my
> > collection and load balance across my 2 replicas.
> >
> > Here's my question:
> >
> > As I understand it, because of the nature of Solr Cloud (eventual
> > consistency), and the fact that the soft commit timings on the 2 replicas
> > will not necessarily be in sync, would it not be possible to run into a
> > scenario where, say, a document gets indexed on replica 1 right before a
> > soft commit, but indexed on replica 2 right after a soft commit? In this
> > scenario, using the load-balanced CloudSolrClient, wouldn't it be possible
> > for a user to do a search, see the newly added document because they got
> > sent to replica 1, and then search again, and the newly added document
> > would disappear from their results since they got sent to replica 2 and
> > the soft commit hasn't happened yet?
> >
> > If so, how do people typically handle this scenario in NRT search cases?
> > It seems like a poor user experience if things keep disappearing and
> > reappearing from their search results randomly. Currently the only thought
> > I have to prevent this is to write (or extend) my own Solr client to stick
> > a user's session to a specific replica (unless it goes down), but still
> > load balance users between the replicas. But of course then I have to
> > manage all of the things CloudSolrClient manages manually re: cluster
> > state, etc.
> >
> > Can anyone confirm/deny my understanding of how this works / offer any
> > suggestions to eliminate the scenario in question from occurring?
> >
> > Thanks,
> >
> > Chris
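Chris's sticky-session idea (pin each user session to one replica, but still spread sessions across replicas) could be sketched with simple deterministic hashing. Everything here is hypothetical: the replica URLs and session IDs are placeholders, and a real implementation would also need the failover and cluster-state handling he mentions.

```java
import java.util.List;

public class StickyRouter {
    // Pick a replica for a session by hashing the session ID, so the same
    // user keeps hitting the same replica (and sees a consistent view),
    // while different sessions spread across the replicas.
    // Math.floorMod guards against negative hashCode values.
    static String replicaFor(String sessionId, List<String> replicaUrls) {
        int idx = Math.floorMod(sessionId.hashCode(), replicaUrls.size());
        return replicaUrls.get(idx);
    }

    public static void main(String[] args) {
        List<String> replicas = List.of(
                "http://node1:8983/solr/mycollection",
                "http://node2:8983/solr/mycollection");
        // The same session always routes to the same replica:
        System.out.println(
                replicaFor("user-42", replicas).equals(replicaFor("user-42", replicas)));
        // prints true
    }
}
```

The chosen URL would then back a plain `HttpSolrClient` for that session; the trade-off, as noted in the thread, is re-implementing the failover and load-balancing logic CloudSolrClient normally provides.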