Hello Damien,

Thanks for your reply. This is what I see when running the monitor command:

    $ echo mntr | nc nifi-investigate-zk-zk-1 2181
    zk_version 3.5.6-c11b7e26bc554b8523dc929761dd28808913f091, built on 10/08/2019 20:18 GMT
    zk_avg_latency 0
    zk_max_latency 9
    zk_min_latency 0
    zk_packets_received 607609
    zk_packets_sent 607608
    zk_num_alive_connections 2
    zk_outstanding_requests 0
    zk_server_state follower
    zk_znode_count 9
    zk_watch_count 0
    zk_ephemerals_count 0
    zk_approximate_data_size 281
    zk_open_file_descriptor_count 68
    zk_max_file_descriptor_count 4096

    $ echo mntr | nc nifi-investigate-zk-zk-2 2181
    zk_version 3.5.6-c11b7e26bc554b8523dc929761dd28808913f091, built on 10/08/2019 20:18 GMT
    zk_avg_latency 0
    zk_max_latency 17
    zk_min_latency 0
    zk_packets_received 41179
    zk_packets_sent 41178
    zk_num_alive_connections 3
    zk_outstanding_requests 0
    zk_server_state leader
    zk_znode_count 9
    zk_watch_count 0
    zk_ephemerals_count 0
    zk_approximate_data_size 281
    zk_open_file_descriptor_count 70
    zk_max_file_descriptor_count 4096
    zk_followers 1
    zk_synced_followers 1
    zk_pending_syncs 0
    zk_last_proposal_size 32
    zk_max_proposal_size 125
    zk_min_proposal_size 32

Regarding host name resolution: I am not using any zoo.cfg; the host names are resolved by DNS itself.

Thanks,
Sushil Kumar

On Wed, Nov 20, 2019 at 12:01 AM Damien Diederen <ddiede...@sinenomine.net> wrote:
>
> Hi Sushil,
>
> > I am trying to run a 3-node ZooKeeper cluster.
> > It starts up fine and I am able to access it.
> > However, as soon as I shut down the leader, one of the remaining
> > nodes becomes the leader, which I believe is working as expected.
>
> Are you sure about that? Does everything look normal if you issue a
> "monitor" command on one of the survivors, using either:
>
>     echo mntr | nc example.com 2181
>
> or by visiting:
>
>     http://example.com:8080/commands/monitor
>
> Or do you get a message such as "This ZooKeeper instance is not
> currently serving requests"?
>
> > However, if I try to connect using zkCli.sh in this state, it cannot
> > connect; it always stays in the connecting state, and there is now no
> > way for me to access my ZooKeeper cluster.
> >
> > The only fix I have found is to stop all nodes and start them again in
> > sequence.
> >
> > A couple of questions.
> > First of all, that zkCli.sh behavior does not look like a happy path
> > to me, so I doubt my cluster is behaving well. And if the cluster is not
> > working, why does each remaining node report a LEADER/FOLLOWER status?
>
> I have seen such problems in some configurations where the ensemble was
> unable to recover due to flaky (?) host name resolution, and have found
> using IP addresses in zoo.cfg to be more reliable. Are you using host
> names in zoo.cfg?
>
> > I tried this with a 5-node cluster and noticed exactly the same behavior.
> > So I wonder how people generally keep a ZooKeeper cluster working
> > when the leader goes down.
>
> Best,
> -D
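
A minimal Python sketch of the same `mntr` check, for scripting it across the ensemble instead of running `nc` by hand. It sends the four-letter word `mntr` over a plain TCP socket (what `echo mntr | nc host 2181` does), parses the key/value reply, and applies the majority rule: an ensemble of n voting members needs n // 2 + 1 of them serving, so a 3-node ensemble tolerates one failed node and a 5-node ensemble tolerates two. The helper names are hypothetical, it assumes `mntr` is enabled and reachable on port 2181 as in the output above, and it only looks at what each server reports about itself; it does not test client connectivity the way zkCli.sh does.

```python
import socket

# Reply text quoted in the thread for a server that is not serving requests.
NOT_SERVING = "This ZooKeeper instance is not currently serving requests"
# Only leaders and followers vote; observers and unreachable hosts do not count.
VOTING_STATES = {"leader", "follower"}


def four_letter_word(host: str, port: int = 2181, cmd: bytes = b"mntr",
                     timeout: float = 5.0) -> str:
    """Hypothetical helper: send a four-letter word, like `echo mntr | nc`."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd)
        chunks = []
        # The server closes the connection after writing its reply.
        while data := sock.recv(4096):
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(text: str) -> dict:
    """Turn `mntr` output (one key/value pair per line) into a dict."""
    if NOT_SERVING in text:
        return {"zk_server_state": "not serving"}
    stats = {}
    for line in text.splitlines():
        # Split on the first whitespace run only: zk_version's value has spaces.
        parts = line.split(None, 1)
        if len(parts) == 2:
            stats[parts[0]] = parts[1].strip()
    return stats


def quorum_size(ensemble_size: int) -> int:
    """Votes needed for a strict majority: 2 of 3, 3 of 5."""
    return ensemble_size // 2 + 1


def check_ensemble(states: dict, ensemble_size: int) -> tuple:
    """Return (healthy, message) from a {host: zk_server_state} mapping."""
    voting = [h for h, s in states.items() if s in VOTING_STATES]
    leaders = [h for h, s in states.items() if s == "leader"]
    needed = quorum_size(ensemble_size)
    if len(voting) < needed:
        return False, f"{len(voting)} voter(s) serving, quorum needs {needed}"
    if len(leaders) != 1:
        return False, f"expected exactly one leader, found {len(leaders)}"
    return True, f"leader {leaders[0]}, {len(voting)}/{ensemble_size} voters serving"


def check_hosts(hosts, ensemble_size: int, port: int = 2181) -> tuple:
    """Query each host with `mntr` and evaluate the ensemble as a whole."""
    states = {}
    for host in hosts:
        try:
            reply = four_letter_word(host, port)
            states[host] = parse_mntr(reply).get("zk_server_state", "unknown")
        except OSError:
            # Covers DNS failures, refused connections and timeouts.
            states[host] = "unreachable"
    return check_ensemble(states, ensemble_size)
```

Fed the two `mntr` outputs above (zk-1 follower, zk-2 leader, ensemble size 3), `check_ensemble` should report a healthy ensemble with `nifi-investigate-zk-zk-2` as leader, which matches the LEADER/FOLLOWER status described in the thread even though zkCli.sh could not connect.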