If it were related to maxClientCnxns, you would see sessions being torn down and recreated in HBase on that node, as well as a clear message in the ZK server log that it is denying requests because the number of outstanding connections from that host exceeds the limit.
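
If you want to rule that out quickly from the ZK side, the "cons" and "stat" four-letter words show how many connections each client host currently holds. Something along these lines should do it, assuming nc is available and four-letter words are enabled on your servers (the host and port below are placeholders):

  echo cons | nc zk1.example.com 2181
  echo stat | nc zk1.example.com 2181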

ConnectionLoss is a transient ZooKeeper state; more often than not, I see it manifest as a result of unplanned pauses in HBase itself. Most often these are JVM garbage-collection pauses; other times they come from Linux kernel/OS-level pauses. You can diagnose the former via the standard JVM GC logging mechanisms, and the latter usually via syslog or dmesg.
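
If GC logging isn't already enabled on the HBase side, something along these lines in hbase-env.sh will turn it on (these are the pre-Java-9 flags; the log path is just a placeholder for your deployment):

  export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -Xloggc:/var/log/hbase/gc.log"

Then look for stop-the-world pauses in that log which line up with the ConnectionLoss timestamps and get anywhere near your ZK session timeout.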

When looking for unexpected pauses, remember that you also need to look at what was happening in ZK. A JVM GC pause in ZK would exhibit the same kind of symptoms in HBase.
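
The same flags work for the ZK server itself; on a stock install I believe they go into SERVER_JVMFLAGS (e.g. via conf/java.env), though your packaging may put them elsewhere:

  export SERVER_JVMFLAGS="$SERVER_JVMFLAGS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/zookeeper/gc.log"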

One final suggestion is to correlate the errors against other workloads (e.g. YARN containers, Spark jobs) that may be running on the same node. It's possible that the node isn't experiencing any explicit problem, but some transient workload happens to run at the same time and slows things down.

Have fun digging!

On 8/31/18 3:19 PM, Srinidhi Muppalla wrote:
Hello all,

Our production application has recently experienced a very high spike in the following exception, along with very large read times against our HBase cluster.

org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/meta-region-server
    at org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:359)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:623)
    at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaRegionState(MetaTableLocator.java:487)
    at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaRegionLocation(MetaTableLocator.java:168)
    at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:605)
    at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:585)
    at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:564)
    at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:61)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateMeta(ConnectionManager.java:1211)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1178)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.relocateRegion(ConnectionManager.java:1152)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1357)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1181)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:305)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:156)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
    at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:326)
    at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:301)
    at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:166)
    at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:161)
    at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:794)
    at ...

This error is not happening consistently: some reads to our table are succeeding, so I have not been able to narrow the issue down to a single configuration or connectivity failure.

Things I’ve tried so far:

1. Using hbase zkcli to connect to our ZooKeeper server from the master instance (roughly the commands shown below). It connects successfully, and when running ‘ls’ the “/hbase/meta-region-server” znode is present.
2. Checking the number of connections to our ZooKeeper instance using the HBase web UI. The number of connections is currently 162, and I double-checked our HBase config: the value for ‘hbase.zookeeper.property.maxClientCnxns’ is 300.
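
For reference, the zkcli check was roughly along these lines (run from the master instance; output omitted):

  hbase zkcli
  ls /hbase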

Any insight into the cause or other steps that I could take to debug this issue 
would be greatly appreciated.

Thank you,
Srinidhi
