Bump.

This thread is from someone who had a similar issue:

https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201601.mbox/%3c09fdab82-7600-49e0-b639-9cb9db937...@yahoo.com%3E

It seems like this is not really fixed in 5.4/6.0?


Aleksey

From: Steve Weiss <steve.we...@wgsn.com>
Date: Tuesday, May 17, 2016 at 7:25 PM
To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
Cc: Aleksey Mezhva <aleksey.mez...@wgsn.com>, Hans Zhou <hans.z...@wgsn.com>
Subject: Re: SolrCloud replicas consistently out of sync

Gotcha - well that's nice.  Still, we seem to be permanently out of sync.

I see this thread from someone with a similar issue:

https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201601.mbox/%3c09fdab82-7600-49e0-b639-9cb9db937...@yahoo.com%3E

It seems like this is not really fixed in 5.4/6.0?  Is there any version of 
SolrCloud where this wasn't yet a problem that we could downgrade to?

--
Steve

On Tue, May 17, 2016 at 6:23 PM, Markus Jelsma <markus.jel...@openindex.io> wrote:
Hi, that's a known issue and unrelated:
https://issues.apache.org/jira/browse/SOLR-9120

M.


-----Original message-----
> From: Stephen Weiss <steve.we...@wgsn.com>
> Sent: Tuesday 17th May 2016 23:10
> To: solr-user@lucene.apache.org; Aleksey Mezhva <aleksey.mez...@wgsn.com>; 
> Hans Zhou <hans.z...@wgsn.com>
> Subject: Re: SolrCloud replicas consistently out of sync
>
> I should add - looking back through the logs, we're seeing frequent errors 
> like this now:
>
> 78819692 WARN  (qtp110456297-1145) [   ] o.a.s.h.a.LukeRequestHandler Error 
> getting file length for [segments_4o]
> java.nio.file.NoSuchFileException: 
> /var/solr/data/instock_shard5_replica1/data/index.20160516230059221/segments_4o
>
> --
> Steve
>
>
> On Tue, May 17, 2016 at 5:07 PM, Stephen Weiss <steve.we...@wgsn.com> wrote:
> OK, so we did as you suggested, read through that article, and reconfigured 
> the autocommit to:
>
> <autoCommit>
> <maxTime>${solr.autoCommit.maxTime:30000}</maxTime>
> <openSearcher>false</openSearcher>
> </autoCommit>
>
> <autoSoftCommit>
> <maxTime>${solr.autoSoftCommit.maxTime:600000}</maxTime>
> </autoSoftCommit>
>
> However, we see no change, aside from the fact that it's clearly committing 
> more frequently.  I will say on our end, we clearly misunderstood the 
> difference between soft and hard commit, but even now having it configured 
> this way, we are still totally out of sync, long after all indexing has 
> completed (it's been about 30 minutes now).  We manually pushed through a 
> commit on the whole collection as suggested; however, all we get back for 
> that is "o.a.s.u.DirectUpdateHandler2 No uncommitted changes. Skipping 
> IW.commit.", which makes sense, because everything was already committed anyway.
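>
> For anyone following along, the manual hard commit is just a plain HTTP call
> along the lines of the sketch below (not our actual tooling - host and
> collection name are placeholders; commit/openSearcher/waitSearcher are
> standard update parameters):
>
> # Sketch: force an explicit hard commit that also opens a new searcher.
> import requests
>
> resp = requests.get(
>     "http://localhost:8983/solr/instock/update",
>     params={"commit": "true", "openSearcher": "true",
>             "waitSearcher": "true", "wt": "json"},
> )
> # Status 0 only means the commit request was accepted - in our case Solr
> # just logs "No uncommitted changes" because everything was already committed.
> print(resp.json()["responseHeader"]["status"])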
>
> We still currently have all shards mismatched:
>
> instock_shard1   replica 1: 30788491 replica 2: 30778865
> instock_shard10   replica 1: 30973059 replica 2: 30971874
> instock_shard11   replica 2: 31036815 replica 1: 31034715
> instock_shard12   replica 2: 30177084 replica 1: 30170511
> instock_shard13   replica 2: 30608225 replica 1: 30603923
> instock_shard14   replica 2: 30755739 replica 1: 30753191
> instock_shard15   replica 2: 30891713 replica 1: 30891528
> instock_shard16   replica 1: 30818567 replica 2: 30817152
> instock_shard17   replica 1: 30423877 replica 2: 30422742
> instock_shard18   replica 2: 30874979 replica 1: 30872223
> instock_shard19   replica 2: 30917208 replica 1: 30909999
> instock_shard2   replica 1: 31062339 replica 2: 31060575
> instock_shard20   replica 1: 30192046 replica 2: 30190893
> instock_shard21   replica 2: 30793817 replica 1: 30791135
> instock_shard22   replica 2: 30821521 replica 1: 30818836
> instock_shard23   replica 2: 30553773 replica 1: 30547336
> instock_shard24   replica 1: 30975564 replica 2: 30971170
> instock_shard25   replica 1: 30734696 replica 2: 30731682
> instock_shard26   replica 1: 31465696 replica 2: 31464738
> instock_shard27   replica 1: 30844884 replica 2: 30842445
> instock_shard28   replica 2: 30549826 replica 1: 30547405
> instock_shard29   replica 2: 30637777 replica 1: 30634091
> instock_shard3   replica 1: 30930723 replica 2: 30926483
> instock_shard30   replica 2: 30904528 replica 1: 30902649
> instock_shard31   replica 2: 31175813 replica 1: 31174921
> instock_shard32   replica 2: 30932837 replica 1: 30926456
> instock_shard4   replica 2: 30758100 replica 1: 30754129
> instock_shard5   replica 2: 31008893 replica 1: 31002581
> instock_shard6   replica 2: 31008679 replica 1: 31005380
> instock_shard7   replica 2: 30738468 replica 1: 30737795
> instock_shard8   replica 2: 30620929 replica 1: 30616715
> instock_shard9   replica 1: 31071386 replica 2: 31066956
>
> The fact that the min_rf numbers aren't coming back as 2 seems to indicate to 
> me that documents simply aren't making it to both replicas - why would that 
> have anything to do with committing anyway?
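>
> For reference, the min_rf check amounts to something like the sketch below
> (Python, not our actual indexing code - host, collection and documents are
> placeholders, and exactly where "rf" shows up in the response may vary, so it
> checks both spots):
>
> # Sketch: send a batch with min_rf=2 and read back the achieved
> # replication factor that SolrCloud reports.
> import requests
>
> docs = [{"id": "prod-123", "type": "product"}]  # placeholder documents
> resp = requests.post(
>     "http://localhost:8983/solr/instock/update",
>     params={"min_rf": 2, "wt": "json"},
>     json=docs,
> ).json()
>
> rf = resp.get("rf", resp.get("responseHeader", {}).get("rf"))
> if rf is None or rf < 2:
>     print("achieved rf=%s, would re-send this batch" % rf)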
>
> Something else is amiss here.  Too bad, committing sounded like an easy 
> answer!
>
> --
> Steve
>
>
> On Tue, May 17, 2016 at 11:39 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> OK, these autocommit settings need revisiting.
>
> First off, I'd remove the maxDocs entirely although with the setting
> you're using it probably doesn't matter.
>
> The maxTime of 1,200,000 is 20 minutes, which means if you ever
> un-gracefully kill your shards you'll have up to 20 minutes' worth of
> data to replay from the tlog.... or resync from the leader. Make this
> much shorter (60000 or less) and be sure to gracefully kill your Solrs -
> no "kill -9", for instance....
>
> To be sure, before you bounce servers try either waiting 20 minutes
> after the indexing stops or issue a manual commit before shutting
> down your servers with
> http://..../solr/collection/update?commit=true
>
> I have a personal annoyance with the bin/solr script in that it forcefully
> (ungracefully) kills Solr after 5 seconds. I think this is much too short,
> so you might consider making it longer in prod; it's a shell script, so
> it's easy to change.
>
> <autoCommit>
> <maxTime>${solr.autoCommit.maxTime:1200000}</maxTime>
> <maxDocs>${solr.autoCommit.maxDocs:1000000000}</maxDocs>
> <openSearcher>false</openSearcher>
> </autoCommit>
>
>
> This is probably the crux of "shards being out of sync". They're _not_
> out of sync; it's just that some of them have docs visible to searches
> and some do not, since the wall-clock times at which these soft commits
> fire are _not_ the same. So you have a 10-minute window where two or more
> replicas for a single shard appear out-of-sync.
>
>
> <autoSoftCommit>
> <maxTime>${solr.autoSoftCommit.maxTime:600000}</maxTime>
> </autoSoftCommit>
>
> You can test all this in one of two ways:
> 1> if you have a timestamp recording when the docs were indexed, check
> whether all the replicas match when you do a query like
> q=*:*&fq=timestamp:[* TO NOW-15MINUTES]
> 2> or, if indexing is _not_ occurring, issue a manual commit like
> .../solr/collection/update?commit=true
> and see if all the replicas match for each shard.
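>
> Something like the sketch below (untested - hosts and core names are just
> examples following the naming in this thread) will do that comparison
> replica-by-replica, with distrib=false so each core answers only for itself:
>
> # Sketch: compare per-replica document counts after a hard commit.
> import requests
>
> replicas = {
>     "instock_shard5": [
>         "http://172.20.140.172:8983/solr/instock_shard5_replica1",
>         "http://172.20.140.173:8983/solr/instock_shard5_replica2",
>     ],
>     # ... one entry per shard
> }
>
> for shard, cores in replicas.items():
>     counts = []
>     for core in cores:
>         r = requests.get(core + "/select",
>                          params={"q": "*:*", "rows": 0,
>                                  "distrib": "false", "wt": "json"})
>         counts.append(r.json()["response"]["numFound"])
>     if len(set(counts)) > 1:
>         print(shard, "mismatch:", counts)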
>
> Here's a long blog on commits:
> https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> Best,
> Erick
>
> On Tue, May 17, 2016 at 8:18 AM, Stephen Weiss <steve.we...@wgsn.com> wrote:
> > Yes, after startup there was a recovery process, you are right.  It's just 
> > that this process doesn't seem to happen unless we do a full restart.
> >
> > These are our autocommit settings - to be honest, we did not really use 
> > autocommit until we switched to SolrCloud, so it's totally possible they 
> > are not very good settings.  We wanted to minimize the frequency of commits 
> > because the commits seem to create a performance drag during indexing.  
> > Perhaps we've gone overboard?
> >
> > <autoCommit>
> > <maxTime>${solr.autoCommit.maxTime:1200000}</maxTime>
> > <maxDocs>${solr.autoCommit.maxDocs:1000000000}</maxDocs>
> > <openSearcher>false</openSearcher>
> > </autoCommit>
> > <autoSoftCommit>
> > <maxTime>${solr.autoSoftCommit.maxTime:600000}</maxTime>
> > </autoSoftCommit>
> >
> > By nodes, I am indeed referring to machines.  There are 8 shards per 
> > machine (2 replicas of each), all in one JVM apiece.  We haven't specified 
> > any specific timestamps for the logs - they are just whatever happens by 
> > default.
> >
> > --
> > Steve
> >
> > On Mon, May 16, 2016 at 11:50 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> > OK, this is very strange. There's no _good_ reason that
> > restarting the servers should make a difference. The fact
> > that it took 1/2 hour leads me to believe, though, that your
> > shards are somehow "incomplete", especially since you
> > are indexing to the system and don't have, say,
> > your autocommit settings tuned very well. The long startup
> > implies (guessing) that you have pretty big tlogs that
> > are replayed upon startup. While these were coming up,
> > did you see any of the shards in the "recovering" state? That's
> > the only way I can imagine that Solr "healed" itself.
> >
> > I've got to point back to the Solr logs. Are they showing
> > any anomalies? Are any nodes in recovery when you restart?
> >
> > Best,
> > Erick
> >
> >
> >
> > On Mon, May 16, 2016 at 4:14 PM, Stephen Weiss <steve.we...@wgsn.com> wrote:
> >> Just one more note - while experimenting, I found that if I stopped all 
> >> nodes (full cluster shutdown) and then started all nodes up again, they do 
> >> in fact seem to repair themselves.  We have a script to monitor the 
> >> differences between replicas (just looking at numDocs) and before the full 
> >> shutdown / restart, we had:
> >>
> >> wks53104:Downloads sweiss$ php testReplication.php
> >> Found 32 mismatched shard counts.
> >> instock_shard1   replica 1: 30785553 replica 2: 30777568
> >> instock_shard10   replica 1: 30972662 replica 2: 30966215
> >> instock_shard11   replica 2: 31036718 replica 1: 31033547
> >> instock_shard12   replica 1: 30179823 replica 2: 30176067
> >> instock_shard13   replica 2: 30604638 replica 1: 30599219
> >> instock_shard14   replica 2: 30755117 replica 1: 30753469
> >> instock_shard15   replica 2: 30891325 replica 1: 30888771
> >> instock_shard16   replica 1: 30818260 replica 2: 30811728
> >> instock_shard17   replica 1: 30422080 replica 2: 30414666
> >> instock_shard18   replica 2: 30874530 replica 1: 30869977
> >> instock_shard19   replica 2: 30917008 replica 1: 30913715
> >> instock_shard2   replica 1: 31062073 replica 2: 31057583
> >> instock_shard20   replica 1: 30188774 replica 2: 30186565
> >> instock_shard21   replica 2: 30789012 replica 1: 30784160
> >> instock_shard22   replica 2: 30820473 replica 1: 30814822
> >> instock_shard23   replica 2: 30552105 replica 1: 30545802
> >> instock_shard24   replica 1: 30973906 replica 2: 30971314
> >> instock_shard25   replica 1: 30732287 replica 2: 30724988
> >> instock_shard26   replica 1: 31465543 replica 2: 31463414
> >> instock_shard27   replica 2: 30845514 replica 1: 30842665
> >> instock_shard28   replica 2: 30549151 replica 1: 30543070
> >> instock_shard29   replica 2: 30635711 replica 1: 30629240
> >> instock_shard3   replica 1: 30930400 replica 2: 30928438
> >> instock_shard30   replica 2: 30902221 replica 1: 30895176
> >> instock_shard31   replica 2: 31174246 replica 1: 31169998
> >> instock_shard32   replica 2: 30931550 replica 1: 30926256
> >> instock_shard4   replica 2: 30755525 replica 1: 30748922
> >> instock_shard5   replica 2: 31006601 replica 1: 30994316
> >> instock_shard6   replica 2: 31006531 replica 1: 31003444
> >> instock_shard7   replica 2: 30737098 replica 1: 30727509
> >> instock_shard8   replica 2: 30619869 replica 1: 30609084
> >> instock_shard9   replica 1: 31067833 replica 2: 31061238
> >>
> >>
> >> This stayed consistent for several hours.
> >>
> >> After restart:
> >>
> >> wks53104:Downloads sweiss$ php testReplication.php
> >> Found 3 mismatched shard counts.
> >> instock_shard19   replica 2: 30917008 replica 1: 30913715
> >> instock_shard22   replica 2: 30820473 replica 1: 30814822
> >> instock_shard26   replica 1: 31465543 replica 2: 31463414
> >> wks53104:Downloads sweiss$ php testReplication.php
> >> Found 2 mismatched shard counts.
> >> instock_shard19   replica 2: 30917008 replica 1: 30913715
> >> instock_shard26   replica 1: 31465543 replica 2: 31463414
> >> wks53104:Downloads sweiss$ php testReplication.php
> >> Everything looks peachy
> >>
> >> Took about a half hour to get there.
> >>
> >> Maybe the question should be - is there any way to get SolrCloud to trigger this 
> >> *without* having to shut down / restart all nodes?  Even if we had to 
> >> trigger that manually after indexing, it would be fine.  It's a very 
> >> controlled indexing workflow that only happens once a day.
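> >>
> >> If something like the CoreAdmin REQUESTRECOVERY action applies here (an
> >> assumption on our part - we haven't verified it heals this particular
> >> mismatch), we could script it per core after the nightly index, roughly:
> >>
> >> # Sketch: ask one core to re-sync from its shard leader.
> >> # Host and core name are placeholders from the layout above.
> >> import requests
> >>
> >> requests.get(
> >>     "http://172.20.140.173:8983/solr/admin/cores",
> >>     params={"action": "REQUESTRECOVERY",
> >>             "core": "instock_shard19_replica1", "wt": "json"},
> >> )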
> >>
> >> --
> >> Steve
> >>
> >> On Mon, May 16, 2016 at 6:52 PM, Stephen Weiss <steve.we...@wgsn.com> wrote:
> >> Each node has one JVM with 16GB of RAM.  Are you suggesting we would put 
> >> each shard into a separate JVM (something like 32 nodes)?
> >>
> >> We aren't encountering any OOMs.  We are testing this in a separate cloud 
> >> which no one is even using; the only activity is this very small amount of 
> >> indexing, and still we see this problem.  In the logs, there are no errors 
> >> at all.  It's almost like none of the recovery features that people say 
> >> are in Solr are actually there at all.  I can't find any evidence that 
> >> Solr is even attempting to keep the shards together.
> >>
> >> There are no real errors in the solr log.  I do see some warnings at 
> >> system startup:
> >>
> >> http://pastie.org/private/thz0fbzcxgdreeeune8w
> >>
> >> These lines in particular look interesting:
> >>
> >> 16925 INFO  
> >> (recoveryExecutor-3-thread-4-processing-n:172.20.140.173:8983_solr 
> >> x:instock_shard15_replica1 s:shard15 c:instock r:core_node31) [c:instock 
> >> s:shard15 r:core_node31 x:instock_shard15_replica1] o.a.s.u.PeerSync 
> >> PeerSync: core=instock_shard15_replica1 
> >> url=http://172.20.140.173:8983/solr  Received 0 versions from 
> >> http://172.20.140.172:8983/solr/instock_shard15_replica2/ 
> >> fingerprint:{maxVersionSpecified=9223372036854775807, 
> >> maxVersionEncountered=1534492620385943552, maxInHash=1534492620385943552, 
> >> versionsHash=-6845461210912808581, numVersions=30888332, numDocs=30888332, 
> >> maxDoc=37699007}
> >> 16925 INFO  
> >> (recoveryExecutor-3-thread-4-processing-n:172.20.140.173:8983_solr 
> >> x:instock_shard15_replica1 s:shard15 c:instock r:core_node31) [c:instock 
> >> s:shard15 r:core_node31 x:instock_shard15_replica1] o.a.s.u.PeerSync 
> >> PeerSync: core=instock_shard15_replica1 
> >> url=http://172.20.140.173:8983/solr DONE. sync failed
> >> 16925 INFO  
> >> (recoveryExecutor-3-thread-4-processing-n:172.20.140.173:8983_solr 
> >> x:instock_shard15_replica1 s:shard15 c:instock r:core_node31) [c:instock 
> >> s:shard15 r:core_node31 x:instock_shard15_replica1] 
> >> o.a.s.c.RecoveryStrategy PeerSync Recovery was not successful - trying 
> >> replication.
> >>
> >> This is the first node to start up, so most of the other shards are not 
> >> there yet.
> >>
> >> On another node (the last node to start up), it looks similar but a little 
> >> different:
> >>
> >> http://pastie.org/private/xjw0ruljcurdt4xpzqk6da
> >>
> >> 74090 INFO  
> >> (recoveryExecutor-3-thread-1-processing-n:172.20.140.177:8983_solr 
> >> x:instock_shard25_replica2 s:shard25 c:instock r:core_node60) [c:instock 
> >> s:shard25 r:core_node60 x:instock_shard25_replica2] 
> >> o.a.s.c.RecoveryStrategy Attempting to PeerSync from 
> >> [http://172.20.140.170:8983/solr/instock_shard25_replica1/] - 
> >> recoveringAfterStartup=[true]
> >> 74091 INFO  
> >> (recoveryExecutor-3-thread-1-processing-n:172.20.140.177:8983_solr 
> >> x:instock_shard25_replica2 s:shard25 c:instock r:core_node60) [c:instock 
> >> s:shard25 r:core_node60 x:instock_shard25_replica2] o.a.s.u.PeerSync 
> >> PeerSync: core=instock_shard25_replica2 
> >> url=http://172.20.140.177:8983/solr START 
> >> replicas=[http://172.20.140.170:8983/solr/instock_shard25_replica1/] 
> >> nUpdates=100
> >> 74091 WARN  
> >> (recoveryExecutor-3-thread-1-processing-n:172.20.140.177:8983_solr 
> >> x:instock_shard25_replica2 s:shard25 c:instock r:core_node60) [c:instock 
> >> s:shard25 r:core_node60 x:instock_shard25_replica2] o.a.s.u.PeerSync no 
> >> frame of reference to tell if we've missed updates
> >> 74091 INFO  
> >> (recoveryExecutor-3-thread-1-processing-n:172.20.140.177:8983_solr 
> >> x:instock_shard25_replica2 s:shard25 c:instock r:core_node60) [c:instock 
> >> s:shard25 r:core_node60 x:instock_shard25_replica2] 
> >> o.a.s.c.RecoveryStrategy PeerSync Recovery was not successful - trying 
> >> replication.
> >>
> >> Every single replica shows errors like this (either one or the other).
> >>
> >> I should add, beyond the block joins / nested children & grandchildren, 
> >> there's really nothing unusual about this cloud at all.  It's a very basic 
> >> collection (simple enough it can be created in the GUI) and a dist 
> >> installation of Solr 6.  There are 3 independent zookeeper servers (again, 
> >> vanilla from dist), and there don't appear to be any zookeeper issues.
> >>
> >> --
> >> Steve
> >>
> >> On Mon, May 16, 2016 at 12:02 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >> 8 nodes, 4 shards apiece? All in the same JVM? People have gotten by
> >> the GC pain by running in separate JVMs with less Java memory each on
> >> big beefy machines.... That's not a recommendation as much as an
> >> observation.
> >>
> >> That aside, unless you have some very strange stuff going on this is
> >> totally weird. Are you hitting OOM errors at any time you have this
> >> problem? Once you hit an OOM error, all bets are off about how Java
> >> behaves. If you are hitting those, you can't hope for stability until
> >> you fix that issue. In your writeup there's some evidence for this
> >> when you say that if you index multiple docs at a time you get
> >> failures.
> >>
> >> Do your Solr logs show any anomalies? My guess is that you'll see
> >> exceptions in your Solr logs that will shed light on the issue.
> >>
> >> Best,
> >> Erick
> >>
> >> On Mon, May 16, 2016 at 8:03 AM, Stephen Weiss <steve.we...@wgsn.com> wrote:
> >>> Hi everyone,
> >>>
> >>> I'm running into a problem with SolrCloud replicas and thought I would 
> >>> ask the list to see if anyone else has seen this / gotten past it.
> >>>
> >>> Right now, we are running with only one replica per shard.  This is 
> >>> obviously a problem because if one node goes down anywhere, the whole 
> >>> collection goes offline, and due to garbage collection issues, this 
> >>> happens about once or twice a week, causing a great deal of instability.  
> >>> If we try to increase to 2 replicas per shard, once we index new 
> >>> documents and the shards autocommit, the shards all get out of sync with 
> >>> each other, with different numbers of documents, different numbers of 
> >>> documents deleted, different facet counts - pretty much totally divergent 
> >>> indexes.  Shards always show green and available, and never go into 
> >>> recovery or any other state as to indicate there's a mismatch.  There are 
> >>> also no errors in the logs to indicate anything is going wrong.  Even 
> >>> long after indexing has finished, the replicas never come back into sync. 
> >>>  The only way to get consistency again is to delete one set of replicas 
> >>> and then add them back in.  Unfortunately, when we do this, we invariably 
> >>> discover that many documents (2-3%) are missing from the index.
> >>>
> >>> We have tried setting the min_rf parameter, and have found that when 
> >>> setting min_rf=2, we almost never get back rf=2.  We almost always get 
> >>> rf=1, resend the request, and it basically just goes into an infinite 
> >>> loop.  The only way to get rf=2 to come back is to only index one 
> >>> document at a time.  Unfortunately, we have to update millions of 
> >>> documents a day and it isn't really feasible to index this way, and even 
> >>> when indexing one document at a time, we still occasionally find 
> >>> ourselves in an infinite loop.  This doesn't appear to be related to the 
> >>> documents we are indexing - if we stop the index process and bounce Solr, 
> >>> the exact same document will go through fine the next time, until indexing 
> >>> gets hung up on another random document.
> >>>
> >>> We have 8 nodes, with 4 shards apiece, all running one collection with 
> >>> about 900M documents.  An important note is that we have a block join 
> >>> system with 3 tiers of documents (products -> skus -> sku_history).  
> >>> During indexing, we are forced to delete all documents for a product 
> >>> prior to adding the product back into the index, in order to avoid 
> >>> orphaned children / grandchildren.  All documents are consistently 
> >>> indexed with the top-level product ID so that we can delete all 
> >>> child/grandchild documents prior to updating the document.  So, for each 
> >>> updated document, we are sending through a delete call followed by an add 
> >>> call.  We have tried putting both the delete and add in the same update 
> >>> request with the same results.
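> >>>
> >>> Roughly speaking, the combined request is a delete-by-query on the product
> >>> ID plus the re-add in one body, along the lines of the sketch below
> >>> (simplified - field names, IDs and host are placeholders, and the nested
> >>> sku / sku_history children are omitted):
> >>>
> >>> # Sketch: one JSON update body carrying both the delete and the add.
> >>> import json
> >>> import requests
> >>>
> >>> body = {
> >>>     "delete": {"query": "product_id:12345"},  # drops product + children
> >>>     "add": {"doc": {"id": "12345", "product_id": "12345",
> >>>                     "type": "product"}},
> >>> }
> >>> requests.post(
> >>>     "http://localhost:8983/solr/instock/update",
> >>>     params={"wt": "json"},
> >>>     data=json.dumps(body),
> >>>     headers={"Content-Type": "application/json"},
> >>> )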
> >>>
> >>> All we see out there on Google is that none of what we're seeing should 
> >>> be happening.
> >>>
> >>> We are currently running Solr 6.0 with Zookeeper 3.4.6.  We experienced 
> >>> the same behavior on 5.4 as well.
> >>>
> >>> --
> >>> Steve
> >>>