Salvatore Papa created SOLR-13899:
-------------------------------------

             Summary: zkstatus page incorrectly reports zookeeper in error when Zookeeper observers are present
                 Key: SOLR-13899
                 URL: https://issues.apache.org/jira/browse/SOLR-13899
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: SolrCloud
    Affects Versions: 8.3.0
            Reporter: Salvatore Papa
         Attachments: zkstatus.png

When a Zookeeper ensemble has observers, the zkstatus page incorrectly reports the Zookeeper status as in error (see attachment).

This is because the [ZookeeperStatusHandler|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/admin/ZookeeperStatusHandler.java] does not account for the '[observer|https://zookeeper.apache.org/doc/current/zookeeperObservers.html]' role whatsoever.
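To illustrate the mismatch, here is a simplified, hypothetical model of the follower check (not the actual handler code). It assumes the ensemble behind the mntr samples below is one leader, two followers and two observers, which is consistent with the leader reporting zk_followers 4 but zk_synced_followers 2. Because the "observer" state is never counted, the follower count (2) cannot match zk_followers (4):

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the follower check; NOT the real
// ZookeeperStatusHandler code.
public class ObserverMismatchSketch {

  // Assumed ensemble: 1 leader, 2 followers, 2 observers.
  // The leader's values are taken from the mntr sample in this issue.
  static List<Map<String, String>> sampleEnsemble() {
    return List.of(
        Map.of("zk_server_state", "leader",
               "zk_followers", "4",
               "zk_synced_followers", "2"),
        Map.of("zk_server_state", "follower"),
        Map.of("zk_server_state", "follower"),
        Map.of("zk_server_state", "observer"),
        Map.of("zk_server_state", "observer"));
  }

  // Returns true if this simplified check would flag the ensemble as in error.
  static boolean reportsError(List<Map<String, String>> stats) {
    int followers = 0;
    int reportedFollowers = 0;
    for (Map<String, String> stat : stats) {
      String state = stat.get("zk_server_state");
      if ("follower".equals(state)) {
        followers++;
      } else if ("leader".equals(state)) {
        // On the leader, zk_followers also counts connected observers
        reportedFollowers = Integer.parseInt(stat.get("zk_followers"));
      }
      // "observer" matches neither branch, so observers are never counted
    }
    return followers > 0 && followers != reportedFollowers;
  }

  public static void main(String[] args) {
    System.out.println(reportsError(sampleEnsemble())); // prints "true" (2 != 4)
  }
}
```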

This should be an easy fix; I see two options:

1. Treat observers as followers by changing [L112|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/admin/ZookeeperStatusHandler.java#L112] to
{code:java}
if ("follower".equals(state) || "observer".equals(state)) {
{code}
 
2. Exclude observers from the required follower count by changing [L116|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/handler/admin/ZookeeperStatusHandler.java#L116] to
{code:java}
// zk_synced_followers (unlike zk_followers) does not count observers
reportedFollowers = Integer.parseInt(String.valueOf(stat.get("zk_synced_followers")));
{code}
With option 1, the zkstatus page will show an error when an observer is down.
With option 2, it will not show an error when an observer is down.
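A rough sketch of how each option changes that comparison, again on a simplified, hypothetical model rather than the real handler, and using the same assumed ensemble (one leader, two followers, two observers). Both options clear the false error for a healthy ensemble:

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the two proposed fixes on a simplified model;
// NOT the real ZookeeperStatusHandler code.
public class ObserverFixOptions {

  // Assumed ensemble: 1 leader, 2 followers, 2 observers (leader values from mntr sample).
  static List<Map<String, String>> sampleEnsemble() {
    return List.of(
        Map.of("zk_server_state", "leader",
               "zk_followers", "4",
               "zk_synced_followers", "2"),
        Map.of("zk_server_state", "follower"),
        Map.of("zk_server_state", "follower"),
        Map.of("zk_server_state", "observer"),
        Map.of("zk_server_state", "observer"));
  }

  // option1 == true:  count observers as followers, compare with zk_followers.
  // option1 == false: (option 2) count followers only, compare with zk_synced_followers.
  static boolean reportsError(List<Map<String, String>> stats, boolean option1) {
    int followers = 0;
    int reportedFollowers = 0;
    for (Map<String, String> stat : stats) {
      String state = stat.get("zk_server_state");
      boolean counted = "follower".equals(state)
          || (option1 && "observer".equals(state));
      if (counted) {
        followers++;
      } else if ("leader".equals(state)) {
        String key = option1 ? "zk_followers" : "zk_synced_followers";
        reportedFollowers = Integer.parseInt(stat.get(key));
      }
    }
    return followers > 0 && followers != reportedFollowers;
  }

  public static void main(String[] args) {
    List<Map<String, String>> stats = sampleEnsemble();
    System.out.println("option 1 error: " + reportsError(stats, true));  // false (4 == 4)
    System.out.println("option 2 error: " + reportsError(stats, false)); // false (2 == 2)
  }
}
```

Neither sketch models unreachable hosts; it only shows how the follower comparison changes.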

*Ideally*, additional logic should be added to account for observers, showing STATUS_YELLOW when any observers are down but all followers are up, since the ensemble is then degraded but still functional.
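A minimal sketch of that tri-state logic for an ensemble (not standalone mode). The enum is a hypothetical stand-in for the page's status values, and it assumes the expected follower and observer counts are known from configuration; how Solr would obtain them is left open:

```java
// Hypothetical sketch of an observer-aware status; names are illustrative only.
// Assumes expected follower/observer counts come from configuration.
public class ObserverAwareStatus {

  enum Status { GREEN, YELLOW, RED }

  static Status status(int leaders, int expectedFollowers, int liveFollowers,
                       int expectedObservers, int liveObservers) {
    // Missing leader or follower: keep reporting an error
    if (leaders != 1 || liveFollowers < expectedFollowers) {
      return Status.RED;
    }
    // All voting members up, but some observers down: degraded but functional
    if (liveObservers < expectedObservers) {
      return Status.YELLOW;
    }
    return Status.GREEN;
  }

  public static void main(String[] args) {
    System.out.println(status(1, 2, 2, 2, 2)); // GREEN
    System.out.println(status(1, 2, 2, 2, 1)); // YELLOW
    System.out.println(status(1, 2, 1, 2, 2)); // RED
  }
}
```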

I'm happy to create a PR; however, I don't have much free time at home at the moment, so it may take a week or two.

 

Additional info:

Below is example mntr output for the leader, follower and observer roles. Note the leader's zk_followers and zk_synced_followers values (4 and 2: zk_followers also counts observers, while zk_synced_followers does not), and the zk_server_state value on each server.

Leader:
{code:java}
[root@master1 ~]# echo mntr | nc master3 12181
zk_version 3.5.6-c11b7e26bc554b8523dc929761dd28808913f091, built on 10/08/2019 20:18 GMT
zk_avg_latency 0
zk_max_latency 2
zk_min_latency 0
zk_packets_received 97
zk_packets_sent 96
zk_num_alive_connections 2
zk_outstanding_requests 0
zk_server_state leader
zk_znode_count 92
zk_watch_count 7
zk_ephemerals_count 9
zk_approximate_data_size 236333
zk_open_file_descriptor_count 64
zk_max_file_descriptor_count 4096
zk_followers 4
zk_synced_followers 2
zk_pending_syncs 0
zk_last_proposal_size -1
zk_max_proposal_size -1
zk_min_proposal_size -1
{code}
Follower:
{code:java}
[root@master1 ~]# echo mntr | nc master2 12181
zk_version      3.5.6-c11b7e26bc554b8523dc929761dd28808913f091, built on 10/08/2019 20:18 GMT
zk_avg_latency  0
zk_max_latency  6
zk_min_latency  0
zk_packets_received     97
zk_packets_sent 96
zk_num_alive_connections        2
zk_outstanding_requests 0
zk_server_state follower
zk_znode_count  92
zk_watch_count  7
zk_ephemerals_count     9
zk_approximate_data_size        236333
zk_open_file_descriptor_count   60
zk_max_file_descriptor_count    4096
{code}
Observer:
{code:java}
[root@master1 ~]# echo mntr | nc slave1 12181
zk_version      3.5.6-c11b7e26bc554b8523dc929761dd28808913f091, built on 10/08/2019 20:18 GMT
zk_avg_latency  0
zk_max_latency  8
zk_min_latency  0
zk_packets_received     174
zk_packets_sent 173
zk_num_alive_connections        2
zk_outstanding_requests 0
zk_server_state observer
zk_znode_count  92
zk_watch_count  7
zk_ephemerals_count     9
zk_approximate_data_size        236333
zk_open_file_descriptor_count   59
zk_max_file_descriptor_count    4096
{code}
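For anyone reproducing this, a minimal sketch of turning mntr output like the samples above into key/value pairs. The samples mix tabs and spaces, so each line is split on the first run of whitespace, which keeps values such as zk_version intact. The class and the observer inference are illustrative assumptions, not Solr's parser; the inference assumes every follower is synced, so an unsynced follower would be miscounted as an observer:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical mntr parser for illustration; NOT Solr's implementation.
public class MntrParser {

  // Splits each non-empty line into key and value on the first run of whitespace.
  static Map<String, String> parse(String mntrOutput) {
    Map<String, String> result = new LinkedHashMap<>();
    for (String line : mntrOutput.split("\\R")) {
      String trimmed = line.trim();
      if (trimmed.isEmpty()) {
        continue;
      }
      String[] parts = trimmed.split("\\s+", 2);
      result.put(parts[0], parts.length > 1 ? parts[1] : "");
    }
    return result;
  }

  // Assumption: all followers are synced, so the difference is the observer count.
  static int inferredObservers(Map<String, String> leaderStats) {
    return Integer.parseInt(leaderStats.get("zk_followers"))
        - Integer.parseInt(leaderStats.get("zk_synced_followers"));
  }

  public static void main(String[] args) {
    String leaderSample = String.join("\n",
        "zk_server_state leader",
        "zk_followers 4",
        "zk_synced_followers 2");
    Map<String, String> stats = parse(leaderSample);
    System.out.println(stats.get("zk_server_state")); // leader
    System.out.println(inferredObservers(stats));     // 2
  }
}
```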



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
