In ZK 3.4.x, if you have configuration differences amongst your instances, you are susceptible to split brain. See this email thread, "Rolling Config Change Considered Harmful":
http://zookeeper-user.578899.n2.nabble.com/Rolling-config-change-considered-harmful-td7578761.html

In ZK 3.5.x I'm not even sure it would work.

-JZ

> On May 26, 2017, at 5:43 PM, Shawn Heisey <[email protected]> wrote:
>
> I feel fairly certain that this thread will be an annoyance. I don't
> know enough about zookeeper to answer the questions that are being
> asked, so I apologize about needing to relay questions about ZK fault
> tolerance in two datacenters.
>
> It seems that everyone wants to avoid the expense of a tie-breaker ZK VM
> in a third datacenter.
>
> The scenario, which this list has seen over and over:
>
> DC1 - three ZK servers, one or more Solr servers.
> DC2 - two ZK servers, one or more Solr servers.
>
> I've already explained that if DC2 goes down, everything's fine, but if
> DC1 goes down, Solr goes read-only, and there's no way to prevent that.
>
> The conversation went further, and I'm sure you guys have seen this
> before too: "Is there any way we can get DC2 back to operational with
> manual intervention if DC1 goes down?" I explained that any manual
> intervention would briefly take Solr down ... at which point the
> following proposal was mentioned:
>
> Add an observer node to DC2, and in the event DC1 goes down, run a
> script that reconfigures all the ZK servers to change the observer to a
> voting member and does rolling restarts.
>
> Will their proposal work? What happens when DC1 comes back online? As
> you know, DC1 will contain a partial ensemble that still has quorum,
> about to rejoin what it THINKS is a partial ensemble *without* quorum,
> which is not what it will find. I'm guessing that ZK assumes the
> question of who has the "real" quorum shouldn't ever need to be
> negotiated, because the rules prevent multiple partitions from gaining
> quorum.
>
> Solr currently ships with 3.4.6, but the next version of Solr (about to
> drop any day now) will have 3.4.10. Once 3.5 is released and Solr is
> updated to use it, does the situation I've described above change in any
> meaningful way?
>
> Thanks,
> Shawn
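
For anyone following along, the majority arithmetic behind the scenario above can be sketched roughly as below. This is only an illustration, not ZooKeeper code: the helper name has_quorum is made up, and it assumes a plain majority quorum (strictly more than half of the configured voters). The last case also assumes the script removes the DC1 servers from DC2's config, which the proposal as described does not actually say.

```python
def has_quorum(configured_voters: int, reachable_voters: int) -> bool:
    """Hypothetical helper: simple majority rule, strictly more than half."""
    return reachable_voters > configured_voters // 2


# Original layout: 5 voters, 3 in DC1 and 2 in DC2.
dc1_alone = has_quorum(5, 3)   # DC2 down: DC1 keeps quorum
dc2_alone = has_quorum(5, 2)   # DC1 down: DC2 loses quorum, Solr goes read-only

# Proposal as described: promote the DC2 observer, so every config lists
# 6 voters. DC2 can reach only 3 of them, which is not a majority.
dc2_promoted = has_quorum(6, 3)

# Assumption (not in the proposal): the script also drops the DC1 servers
# from DC2's config, leaving DC2 with 3 voters, all local.
dc2_shrunk = has_quorum(3, 3)

# When DC1 returns with its old 5-voter config, it still sees 3 of 5.
dc1_returns = has_quorum(5, 3)

# Both sides believing they have quorum is the split-brain condition.
split_brain = dc2_shrunk and dc1_returns

print(dc1_alone, dc2_alone, dc2_promoted, dc2_shrunk, split_brain)
```

Under these assumptions, promoting the observer alone does not restore quorum in DC2, and the variant that does restore it leaves two partitions that each count themselves as a majority once DC1 is back.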
