Hi Mark,

Thanks. All that is clear (I think Voldemort does a good job with hinted handoff, which I think you are referring to). The part I'm not clear about is maybe not SolrCloud-specific, and it is this: what exactly prevents the two halves of a cluster that's been split from each thinking they are *the* cluster?

Let's say you have a 10-node cluster with 10 ZK instances, one instance on each Solr node. And say 5 of these 10 servers are on switch A and the other 5 are on switch B. Something happens, and switch A and the 5 nodes on it get separated from the 5 nodes on switch B. Say that both the A and B halves happen to have complete copies of the index.
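To put numbers on that scenario - and assuming I have ZooKeeper's rule right, that a quorum is a strict majority, i.e. floor(n/2) + 1 voters - the arithmetic would be roughly:

  // Back-of-the-envelope check, assuming ZooKeeper's majority-quorum rule
  // (quorum = floor(n/2) + 1); numbers match the 10-node, 5-and-5 split above.
  public class QuorumCheck {
      public static void main(String[] args) {
          int ensembleSize = 10;               // one ZK instance per Solr node
          int quorum = ensembleSize / 2 + 1;   // 6 votes needed for a majority
          int sideA = 5, sideB = 5;            // ZK instances reachable on each switch
          System.out.println("quorum needed: " + quorum);                 // 6
          System.out.println("side A has quorum: " + (sideA >= quorum));  // false
          System.out.println("side B has quorum: " + (sideB >= quorum));  // false
      }
  }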
What in Solr (or ZK) tells either the A or the B half "no, you are not *the* cluster and thou shalt not accept updates"? I'm guessing this: https://cwiki.apache.org/confluence/display/ZOOKEEPER/FailureScenarios ?

So then the question becomes: if we have 10 ZK nodes and they split 5 & 5, does that mean neither side will have quorum, because 10 was a bad number of ZK instances to run in the first place?

Thanks,
Otis
----
Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm

----- Original Message -----
> From: Mark Miller <markrmil...@gmail.com>
> To: solr-user <solr-user@lucene.apache.org>
> Cc:
> Sent: Monday, June 18, 2012 11:05 AM
> Subject: Re: SolrCloud and split-brain
>
> On Jun 15, 2012, at 10:33 PM, Otis Gospodnetic wrote:
>
>> However, if my half brain understands what split brain is, then I think
>> that's not a completely true claim, because one can get unlucky and get a
>> SolrCloud cluster partitioned in a way that one or even all partitions
>> reject indexing (and update and deletion) requests if they do not have a
>> complete index.
>
> That's not split brain. Split brain means that multiple partitioned clusters
> think they are *the* cluster and keep accepting updates. This is a real
> problem, because when you unsplit the cluster you cannot easily reconcile
> conflicting updates. In many cases you have to ask the user to resolve the
> conflict.
>
> Yes, you must have a node serving a shard in order to index to that shard.
> You do not need the whole index - but if an update hashes to a shard that
> has no nodes hosting it, it will fail. If there is no node, the document has
> nowhere to live. Some systems do interesting things like buffering those
> updates on other nodes for a while - we don't plan on anything like that
> soon. At some point you can only survive the loss of so many nodes before
> it's time to give up accepting updates, in any system. If you need to
> survive a catastrophic loss of nodes, you have to have enough replicas to
> handle it. Whether those nodes are partitioned off from the cluster or
> simply die, it's all the same: you can only survive so many node losses, and
> replicas are your defense.
>
> The lack of split brain allows your cluster to remain consistent. If you
> allow split brain, you have to use something like vector clocks and handle
> conflict resolution when the splits rejoin, or you will just end up with a
> lot of messed-up data. You generally allow split brain when you want to
> favor write availability in the face of partitions, like Dynamo. But you
> must have a strategy for rejoining splits (like vector clocks) or you can
> never properly go back to a single, consistent cluster. We favor consistency
> in the face of partitions rather than write availability. It seemed like the
> right choice for Solr.
>
> - Mark Miller
> lucidimagination.com
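A minimal sketch of the shard-routing behavior Mark describes above - hash the doc id to a shard and fail the update if no live node hosts that shard. The class and method names here are made up for illustration; this is not SolrCloud's actual API.

  // Illustration only: route an update by hashing the doc id to a shard and
  // reject it when no live node currently hosts that shard.
  import java.util.List;
  import java.util.Map;

  class IllustrativeShardRouter {
      private final int numShards;
      private final Map<Integer, List<String>> liveNodesByShard; // shard id -> live node URLs

      IllustrativeShardRouter(int numShards, Map<Integer, List<String>> liveNodesByShard) {
          this.numShards = numShards;
          this.liveNodesByShard = liveNodesByShard;
      }

      String routeUpdate(String docId) {
          int shard = Math.floorMod(docId.hashCode(), numShards);
          List<String> nodes = liveNodesByShard.get(shard);
          if (nodes == null || nodes.isEmpty()) {
              // No node hosts this shard: the document has nowhere to live, so the update fails.
              throw new IllegalStateException("No live node for shard " + shard + "; rejecting update");
          }
          return nodes.get(0); // simplified: send the update to the first live node for the shard
      }
  }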