Guanghao Zhang created HBASE-18111:
--------------------------------------

             Summary: Replication stuck when cluster connection is closed
                 Key: HBASE-18111
                 URL: https://issues.apache.org/jira/browse/HBASE-18111
             Project: HBase
          Issue Type: Bug
    Affects Versions: 1.1.10, 0.98.24, 1.2.5, 1.3.1, 2.0.0, 1.4.0
            Reporter: Guanghao Zhang
            Assignee: Guanghao Zhang


Log:
{code}
2017-05-24,03:01:25,603 ERROR [regionserver13700-SendThread(hostxxx:11000)] org.apache.zookeeper.ClientCnxn: SASL authentication with Zookeeper Quorum member failed: javax.security.sasl.SaslException: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Connection reset)]) occurred when evaluating Zookeeper Quorum Member's  received SASL token. Zookeeper Client will go to AUTH_FAILED state.
2017-05-24,03:01:25,615 FATAL [regionserver13700-EventThread] org.apache.hadoop.hbase.client.HConnectionImplementation: hconnection-0x1148dd9b-0x35b6b4d4ca999c6, quorum=10.108.37.30:11000,10.108.38.30:11000,10.108.39.30:11000,10.108.84.25:11000,10.108.84.32:11000, baseZNode=/hbase/c3prc-xiaomi98 hconnection-0x1148dd9b-0x35b6b4d4ca999c6 received auth failed from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:425)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:333)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)

2017-05-24,03:01:25,615 INFO [regionserver13700-EventThread] org.apache.hadoop.hbase.client.HConnectionImplementation: Closing zookeeper sessionid=0x35b6b4d4ca999c6

2017-05-24,03:01:25,623 WARN [regionserver13700.replicationSource,800] org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint: Replicate edites to peer cluster failed.
java.io.IOException: Call to hostxxx/10.136.22.6:24600 failed on local exception: java.io.IOException: Connection closed
{code}

jstack:
{code}
 java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.sleepForRetries(HBaseInterClusterReplicationEndpoint.java:127)
        at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.replicate(HBaseInterClusterReplicationEndpoint.java:199)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:905)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:492)
{code}

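The cluster connection was aborted when the ZooKeeperWatcher received an 
AuthFailed event. After that, the HBaseInterClusterReplicationEndpoint's 
replicate() method keeps failing with "Connection closed" and sleeping for 
retries, so it gets stuck in its while loop and replication never resumes.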
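
To make the failure mode concrete, here is a minimal, self-contained sketch. It does not use the HBase API: FakeConnection, ConnectionFactory, replicateNaive and replicateWithReconnect are hypothetical names invented for illustration, and the retry count is bounded only so the demo terminates. It contrasts a loop that keeps retrying on a closed connection with one possible remedy, re-creating the connection when it is found closed. This is an assumption about how the problem could be addressed, not a description of the actual patch.

```java
import java.io.IOException;

public class ReplicateRetrySketch {

    /** Hypothetical stand-in for the cluster connection; NOT the HBase API. */
    static class FakeConnection {
        private volatile boolean closed;

        boolean isClosed() { return closed; }

        /** Simulates the abort triggered by the AuthFailed event. */
        void close() { closed = true; }

        /** Simulates shipping edits; fails once the connection is closed. */
        void ship() throws IOException {
            if (closed) {
                throw new IOException("Connection closed");
            }
        }
    }

    /** Hypothetical factory used to obtain a fresh connection. */
    interface ConnectionFactory {
        FakeConnection create();
    }

    /**
     * Reuses the same connection on every attempt. With a closed connection
     * every attempt fails; the real loop has no attempt limit, so it spins.
     * Returns the attempt that succeeded, or -1 if all attempts failed.
     */
    static int replicateNaive(FakeConnection conn, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                conn.ship();
                return attempt;
            } catch (IOException e) {
                // The real code would sleep here before retrying.
            }
        }
        return -1;
    }

    /**
     * Same loop, but checks for a closed connection before each attempt
     * and replaces it. Returns the attempt that succeeded, or -1.
     */
    static int replicateWithReconnect(FakeConnection initial,
                                      ConnectionFactory factory,
                                      int maxAttempts) {
        FakeConnection conn = initial;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (conn.isClosed()) {
                conn = factory.create(); // assumption: reconnecting is possible
            }
            try {
                conn.ship();
                return attempt;
            } catch (IOException e) {
                // The real code would sleep here before retrying.
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        FakeConnection aborted = new FakeConnection();
        aborted.close();
        System.out.println("naive: " + replicateNaive(aborted, 5));
        System.out.println("with reconnect: "
            + replicateWithReconnect(aborted, FakeConnection::new, 5));
    }
}
```

In the sketch the naive loop never succeeds once the connection is closed, while the reconnecting variant succeeds on its first attempt. Whether a new connection can actually be created after an AuthFailed event depends on the credentials being valid again.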



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
