Hi Sushil,

> I am trying to run a 3-node zookeeper cluster.
> It starts up good and I am able to access it.
> However, as soon as I shutdown the leader, some other node out of
> left-overs becomes a primary node which I believe is working as expected.

Are you sure about that?  Does everything look normal if you issue a
"monitor" command on one of the survivors, using either:

    echo mntr | nc example.com 2181

or by visiting:

    http://example.com:8080/commands/monitor

Or do you get a message such as "This ZooKeeper instance is not
currently serving requests"?
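
If you want to check all survivors in one go, here is a rough sketch in
Python of the same "mntr" check.  It assumes the four-letter-word
commands are reachable on the client port (newer releases require them
to be whitelisted via 4lw.commands.whitelist), and the helper names
parse_mntr, fetch_mntr and describe are mine, not part of ZooKeeper:

```python
import socket
import sys


def parse_mntr(text):
    """Turn the tab-separated 'mntr' reply into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if sep:  # lines without a tab (e.g. error messages) are skipped
            stats[key.strip()] = value.strip()
    return stats


def fetch_mntr(host, port=2181, timeout=5.0):
    """Send 'mntr' to one server and return the raw reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"mntr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", "replace")


def describe(stats):
    """Summarise one server's state from its parsed 'mntr' reply."""
    state = stats.get("zk_server_state")
    if state is None:
        # Typically the "not currently serving requests" message.
        return "not serving requests"
    if state == "leader":
        return "leader, synced followers: " + stats.get("zk_synced_followers", "?")
    return state


if __name__ == "__main__":
    # Usage: python check_zk.py host1 host2 ...
    for host in sys.argv[1:]:
        try:
            print(host, "->", describe(parse_mntr(fetch_mntr(host))))
        except OSError as exc:
            print(host, "-> unreachable:", exc)
```

A healthy ensemble should show exactly one leader, with the remaining
survivors reporting as followers.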

> However, if I try to connect using the zkCli.sh in this state, it cannot
> connect, it always remains in connecting state, and there is no way now
> that I can access my zookeeper cluster.
>
> The only way I have been able to fix is stop all nodes and start then in
> sequence.
>
> Couple of questions.
> First of all that zkCli.sh behavior with the cluster does not looks
> something a happy path to me. I doubt if my cluster is behaving good. Now
> if this cluster is not working why does my cluster status appear working
> "LEADER/FOLLOWER" for each left over node.

I have seen such problems in some configurations where the ensemble was
unable to recover because of flaky (?) host name resolution, and have
found using IP addresses in zoo.cfg to be more reliable.  Are you using
host names in zoo.cfg?
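
For illustration, the server entries for a 3-node ensemble using IP
addresses might look roughly like this.  The addresses and dataDir are
placeholders I made up, so substitute your own; the number after
"server." has to match the myid file on each node:

    # Sketch only: placeholder addresses and paths
    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=/var/lib/zookeeper
    clientPort=2181
    server.1=192.0.2.11:2888:3888
    server.2=192.0.2.12:2888:3888
    server.3=192.0.2.13:2888:3888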

> I tried this with 5-node cluster and noticed exactly the same behavior.
> So I wonder how do people generally manage a working zookeeper cluster with
> leader going down.

Best, -D
