Hello all,

Our production application has recently seen a sharp spike in the following exception, along with greatly increased read times against our HBase cluster:
org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/meta-region-server
	at org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
	at org.apache.hadoop.hbase.shaded.org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.hadoop.hbase.shaded.org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
	at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:359)
	at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:623)
	at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaRegionState(MetaTableLocator.java:487)
	at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaRegionLocation(MetaTableLocator.java:168)
	at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:605)
	at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:585)
	at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:564)
	at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:61)
	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateMeta(ConnectionManager.java:1211)
	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1178)
	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.relocateRegion(ConnectionManager.java:1152)
	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1357)
	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1181)
	at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:305)
	at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:156)
	at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
	at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:326)
	at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:301)
	at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:166)
	at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:161)
	at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:794)
	at ...

The error is not happening consistently: some reads to our table still succeed, so I have not been able to narrow the problem down to a single configuration or connectivity failure.

Things I have tried so far:

- Using hbase zkcli to connect to our ZooKeeper server from the master instance. It connects successfully, and running 'ls' shows that the "/hbase/meta-region-server" znode is present (a programmatic version of this check is included in the P.S. below).
- Checking the number of connections to our ZooKeeper instance via the HBase web UI. There are currently 162 connections.
- Double-checking our hbase config: 'hbase.zookeeper.property.maxClientCnxns' is set to 300, so we appear to be well under the connection limit.

Any insight into the cause, or other steps I could take to debug this, would be greatly appreciated.

Thank you,
Srinidhi
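
P.S. For reference, here is roughly the programmatic equivalent of the zkcli check described above, in case it is useful for reproducing the problem. It is a minimal sketch: the quorum string and session timeout are placeholders rather than our real values, and it uses the plain ZooKeeper client jar rather than the shaded one bundled with the HBase client.

import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class MetaZnodeCheck {
    public static void main(String[] args) throws Exception {
        CountDownLatch connected = new CountDownLatch(1);
        // "zk1.example.com:2181" is a placeholder for our actual quorum.
        ZooKeeper zk = new ZooKeeper("zk1.example.com:2181", 30000, new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                if (event.getState() == Event.KeeperState.SyncConnected) {
                    connected.countDown();
                }
            }
        });
        connected.await();
        // Read the same znode the failing stack trace points at.
        byte[] data = zk.getData("/hbase/meta-region-server", false, null);
        System.out.println("/hbase/meta-region-server: " + data.length + " bytes");
        zk.close();
    }
}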

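If nothing else turns up, one mitigation I am considering is raising the client-side ZooKeeper retry settings, since RecoverableZooKeeper appears in the trace. Below is only a sketch of what I have in mind, assuming 'zookeeper.recovery.retry' and 'zookeeper.recovery.retry.intervalmill' are still the properties that drive RecoverableZooKeeper's retry loop in our version; please correct me if those are the wrong knobs.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class ZkRetryTuning {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // How many times RecoverableZooKeeper retries a ZK operation after a
        // recoverable failure such as ConnectionLoss (I believe the default is 3).
        conf.setInt("zookeeper.recovery.retry", 6);
        // Sleep between those retries, in milliseconds.
        conf.setInt("zookeeper.recovery.retry.intervalmill", 2000);
        try (Connection conn = ConnectionFactory.createConnection(conf)) {
            System.out.println("connection open: " + !conn.isClosed());
        }
    }
}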