Salvatore Papa created SOLR-13899:
-------------------------------------

Summary: zkstatus page incorrectly reports zookeeper in error when Zookeeper observers are present
Key: SOLR-13899
URL: https://issues.apache.org/jira/browse/SOLR-13899
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Components: SolrCloud
Affects Versions: 8.3.0
Reporter: Salvatore Papa
Attachments: zkstatus.png

When a zookeeper ensemble has 'observers', the zkstatus page incorrectly says the Zookeeper status is in error (see attachment).

This is because the [ZookeeperStatusHandler|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/admin/ZookeeperStatusHandler.java] does not account for the '[observer|https://zookeeper.apache.org/doc/current/zookeeperObservers.html]' role at all.

This should be an easy fix. I see two options:

1. Treat observers as followers by changing [L112|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/admin/ZookeeperStatusHandler.java#L112] to
{code:java}
if ("follower".equals(state) || "observer".equals(state)) {
{code}

2. Exclude observers from the required follower count by changing [L116|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/admin/ZookeeperStatusHandler.java#L116] to
{code:java}
reportedFollowers = Integer.parseInt(String.valueOf(stat.get("zk_synced_followers")));
{code}

With option 1, the zkstatus page will show an error when an observer is down.
With option 2, the zkstatus page will not show an error when an observer is down.

*Ideally*, additional logic to account for observers should be added, showing STATUS_YELLOW when any observers are down but all followers are up, since in that case the ensemble is degraded but still functional.

Happy to create a PR; however, I don't have a lot of free time at home at the moment, so it may take a week or two.

Additional info:
See below for example mntr output for the Leader/Follower/Observer roles, noting the Leader's zk_followers and zk_synced_followers values, and the values of zk_server_state.

Leader:
{code:java}
[root@master1 ~]# echo mntr | nc master3 12181
zk_version 3.5.6-c11b7e26bc554b8523dc929761dd28808913f091, built on 10/08/2019 20:18 GMT
zk_avg_latency 0
zk_max_latency 2
zk_min_latency 0
zk_packets_received 97
zk_packets_sent 96
zk_num_alive_connections 2
zk_outstanding_requests 0
zk_server_state leader
zk_znode_count 92
zk_watch_count 7
zk_ephemerals_count 9
zk_approximate_data_size 236333
zk_open_file_descriptor_count 64
zk_max_file_descriptor_count 4096
zk_followers 4
zk_synced_followers 2
zk_pending_syncs 0
zk_last_proposal_size -1
zk_max_proposal_size -1
zk_min_proposal_size -1
{code}

Follower:
{code:java}
[root@master1 ~]# echo mntr | nc master2 12181
zk_version 3.5.6-c11b7e26bc554b8523dc929761dd28808913f091, built on 10/08/2019 20:18 GMT
zk_avg_latency 0
zk_max_latency 6
zk_min_latency 0
zk_packets_received 97
zk_packets_sent 96
zk_num_alive_connections 2
zk_outstanding_requests 0
zk_server_state follower
zk_znode_count 92
zk_watch_count 7
zk_ephemerals_count 9
zk_approximate_data_size 236333
zk_open_file_descriptor_count 60
zk_max_file_descriptor_count 4096
{code}

Observer:
{code:java}
[root@master1 ~]# echo mntr | nc slave1 12181
zk_version 3.5.6-c11b7e26bc554b8523dc929761dd28808913f091, built on 10/08/2019 20:18 GMT
zk_avg_latency 0
zk_max_latency 8
zk_min_latency 0
zk_packets_received 174
zk_packets_sent 173
zk_num_alive_connections 2
zk_outstanding_requests 0
zk_server_state observer
zk_znode_count 92
zk_watch_count 7
zk_ephemerals_count 9
zk_approximate_data_size 236333
zk_open_file_descriptor_count 59
zk_max_file_descriptor_count 4096
{code}
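
For illustration only, the "ideal" behaviour described in this issue could be sketched roughly as follows. This is not the ZookeeperStatusHandler code: the class name, the {{Status}} enum, the {{parseMntr}} and {{evaluate}} helpers, and the {{expectedObservers}} parameter are all hypothetical names made up for this sketch. It assumes the caller already knows how many observers are configured, compares the followers it found against the leader's zk_synced_followers (as in option 2), and reports yellow when only observers are missing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the actual Solr handler.
class ZkEnsembleStatusSketch {

    enum Status { GREEN, YELLOW, RED }

    /** Parses raw mntr output (one "key<whitespace>value" pair per line) into a map. */
    static Map<String, String> parseMntr(String raw) {
        Map<String, String> stat = new HashMap<>();
        for (String line : raw.split("\\R")) {
            // Split on the first run of whitespace only, so values such as
            // the zk_version string keep their embedded spaces.
            String[] parts = line.trim().split("\\s+", 2);
            if (parts.length == 2) {
                stat.put(parts[0], parts[1]);
            }
        }
        return stat;
    }

    /**
     * Evaluates ensemble health from the mntr maps of the reachable servers.
     * expectedObservers is an assumption of this sketch: the number of
     * observers the caller expects from its own configuration.
     */
    static Status evaluate(List<Map<String, String>> reachable, int expectedObservers) {
        int leaders = 0;
        int followers = 0;
        int observers = 0;
        int syncedFollowers = -1;
        for (Map<String, String> stat : reachable) {
            String state = stat.get("zk_server_state");
            if ("leader".equals(state)) {
                leaders++;
                syncedFollowers = Integer.parseInt(stat.getOrDefault("zk_synced_followers", "-1"));
            } else if ("follower".equals(state)) {
                followers++;
            } else if ("observer".equals(state)) {
                observers++;
            }
        }
        // No single leader, or followers that do not match the leader's view: error.
        if (leaders != 1 || followers != syncedFollowers) {
            return Status.RED;
        }
        // Voting members are fine but an observer is missing: degraded, still functional.
        if (observers < expectedObservers) {
            return Status.YELLOW;
        }
        return Status.GREEN;
    }

    public static void main(String[] args) {
        List<Map<String, String>> servers = new ArrayList<>();
        servers.add(parseMntr("zk_server_state leader\nzk_followers 4\nzk_synced_followers 2"));
        servers.add(parseMntr("zk_server_state follower"));
        servers.add(parseMntr("zk_server_state follower"));
        servers.add(parseMntr("zk_server_state observer"));
        servers.add(parseMntr("zk_server_state observer"));
        System.out.println(evaluate(servers, 2));
    }
}
```

The {{main}} method feeds in a trimmed version of the mntr output above (one leader, two followers, two observers). Dropping one of the observer entries while keeping {{expectedObservers}} at 2 is the case that should come out as yellow rather than as an error.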