Thanks Mark and Tim.  My understanding has been upgraded.

-----Original Message-----
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: Tuesday, November 19, 2013 1:59 PM
To: solr-user@lucene.apache.org
Subject: Re: Zookeeper down question


On Nov 19, 2013, at 2:24 PM, Timothy Potter <thelabd...@gmail.com> wrote:

> Good questions ... From my understanding, queries will work if ZK goes 
> down but writes do not work w/o ZooKeeper. This works because the 
> clusterstate is cached on each node so ZooKeeper doesn't participate 
> directly in queries and indexing requests. Solr has to decide not to 
> allow writes if it loses its connection to ZooKeeper, which is a 
> safeguard mechanism. In other words, Solr assumes it's pretty safe to 
> allow reads if the cluster doesn't have a healthy coordinator, but 
> chooses not to allow writes, to be safe.

Right - we currently stop accepting writes when Solr cannot talk to ZooKeeper - 
this is because we can no longer count on knowing about any changes to the 
cluster and no new leaders can be elected, etc. It gets tricky fast if you 
consider allowing updates without ZooKeeper connectivity for very long.
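
A rough illustration of that behavior, as a toy model only (this is not Solr's 
code; the Node class, the zk_connected flag, and the method names are all 
made up for the sketch): reads are served from locally cached state, while 
updates are refused as soon as the ZooKeeper connection is gone.

```python
class Node:
    """Toy model of a SolrCloud node; hypothetical, not Solr's implementation."""

    def __init__(self, cluster_state):
        # Copy of the cluster state as last read from ZooKeeper, cached locally.
        self.cached_cluster_state = dict(cluster_state)
        self.zk_connected = True
        self.docs = {}

    def query(self, doc_id):
        # Reads need only the local index and cached state, so they keep
        # working while ZooKeeper is unreachable.
        return self.docs.get(doc_id)

    def add(self, doc_id, doc):
        # Without ZooKeeper the node cannot learn about cluster changes or
        # take part in leader election, so it rejects the update.
        if not self.zk_connected:
            raise RuntimeError("cannot talk to ZooKeeper; rejecting update")
        self.docs[doc_id] = doc


node = Node({"shard1": ["node1", "node2"]})
node.add("1", {"title": "hello"})

node.zk_connected = False      # simulate losing the ZooKeeper connection
print(node.query("1"))         # the read is still answered locally
try:
    node.add("2", {"title": "world"})
except RuntimeError as err:
    print("update rejected:", err)
```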

> 
> If a Solr node goes down while ZK is not available, since Solr no 
> longer accepts writes, leader / replica doesn't really matter. I'd 
> venture to guess there is some failover logic built in when executing 
> distributed queries but I'm not as familiar with that part of the 
> code (I'll brush up on it though as I'm now curious as well).

Right - query requests will fail over to other replicas - this is important in 
general because the cluster state a Solr instance has can be a bit stale - so a 
request might hit something that has gone down and another replica in the shard 
can be tried. We use the load-balancing SolrJ client for these internal 
requests. CloudSolrServer handles failover for user (i.e. non-internal) 
requests. Or you can use your own external load balancer.
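
The failover idea can be sketched roughly as follows. This is a simplified 
stand-in, not the SolrJ load-balancing client itself: query_shard, send, and 
ReplicaDown are hypothetical names, and the replica list plays the role of a 
possibly stale cached cluster state.

```python
import random


class ReplicaDown(Exception):
    """Raised by the (hypothetical) transport when a replica is unreachable."""


def query_shard(replicas, send, rng=random):
    """Try the replicas of one shard in random order until one answers."""
    candidates = list(replicas)
    rng.shuffle(candidates)  # spread load across replicas
    last_error = None
    for replica in candidates:
        try:
            return send(replica)
        except ReplicaDown as err:
            # The cached cluster state was stale for this replica; try the next.
            last_error = err
    raise ReplicaDown(f"no live replica among {replicas}") from last_error


# The cached state still lists node1, but only node2 is actually up.
live = {"node2"}


def send(replica):
    if replica not in live:
        raise ReplicaDown(replica)
    return f"results from {replica}"


result = query_shard(["node1", "node2"], send)
print(result)
```

If every replica of a shard is down, the sketch raises instead of returning, 
which corresponds to the case where a query against that shard cannot be 
served at all.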

- Mark

> 
> Cheers,
> Tim
> 
> 
> On Tue, Nov 19, 2013 at 11:58 AM, Garth Grimm < 
> garthgr...@averyranchconsulting.com> wrote:
> 
>> Given a 4-node Solr instance (i.e. 2 shards, 2 replicas per shard), 
>> and a standalone ZooKeeper.
>> 
>> Correct me if any of my understanding is incorrect on the following:
>> If ZK goes down, most normal operations will still function, since my 
>> understanding is that ZK isn't involved on a transaction by 
>> transaction basis for each of these.....
>> Document adds, updates, and deletes on an existing collection will 
>> still work as expected.
>> Queries will still get processed as expected.
>> Is the above correct?
>> 
>> But adding new collections, changing configs, etc., will all fail 
>> while ZK is down (or at least, place things in an inconsistent 
>> state?) Is that correct?
>> 
>> If, while ZK is down, one of the 4 solr nodes also goes down, will 
>> all normal operations fail?  Will they all continue to succeed?  I.e. 
>> will each of the nodes realize which node is down and route indexing 
>> and query requests around them, or is that impossible while ZK is 
>> down?  Will some queries succeed (because they were lucky enough to 
>> get routed to the one replica on the one shard that is still 
>> functional) while other queries fail (they aren't so lucky and get 
>> routed to the one replica that is down on the one shard)?
>> 
>> Thanks,
>> Garth Grimm
>> 
>> 
>> 
