Hari Krishna Dara created HBASE-15992:
-----------------------------------------

             Summary: Preserve original KeeperException when converted to external exceptions
                 Key: HBASE-15992
                 URL: https://issues.apache.org/jira/browse/HBASE-15992
             Project: HBase
          Issue Type: Brainstorming
          Components: hbase
    Affects Versions: 0.98.14
            Reporter: Hari Krishna Dara
            Priority: Minor


While investigating unexpected {{NoServerForRegionException}} errors, we found 
that the root cause was a {{KeeperException}} that got lost along the way, 
which left a misleading top-level error.

The underlying exception, with a partial stack trace, is:

{noformat}
org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase/meta-region-server
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1289)
        at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:359)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:684)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.blockUntilAvailable(ZKUtil.java:2032)
        at org.apache.hadoop.hbase.zookeeper.MetaRegionTracker.blockUntilAvailable(MetaRegionTracker.java:203)
        at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:58)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateMeta(HConnectionManager.java:1209)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1175)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1301)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1178)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1135)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:976)
{noformat}

The exception travels through the following code paths:
* It is first caught [here|https://github.com/apache/hbase/blob/rel/0.98.14/hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.java#L366]
* It is logged and rethrown from [here|https://github.com/apache/hbase/blob/rel/0.98.14/hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/RecoverableZooKeeper.java#L279]
* It is caught again, logged, and rethrown [here|https://github.com/apache/hbase/blob/rel/0.98.14/hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java#L693]
* It is finally caught and rethrown as {{InterruptedException}} [here|https://github.com/apache/hbase/blob/rel/0.98.14/hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java#L2037]
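
The catch-and-rethrow chain above can be illustrated with a minimal, self-contained sketch. It is not the actual HBase code: {{StubKeeperException}} and all method bodies here are hypothetical stand-ins, written only to show how converting to a new exception without attaching the original leaves the caller with no cause to inspect.

```java
public class LostCauseSketch {
    /** Hypothetical stand-in for org.apache.zookeeper.KeeperException. */
    static class StubKeeperException extends Exception {
        StubKeeperException(String message) {
            super(message);
        }
    }

    /** Stand-in for the low-level ZooKeeper read that fails authentication. */
    static byte[] getData(String znode) throws StubKeeperException {
        throw new StubKeeperException("KeeperErrorCode = AuthFailed for " + znode);
    }

    /** Catches, logs, and rethrows unchanged, like the intermediate layers. */
    static byte[] getDataAndLog(String znode) throws StubKeeperException {
        try {
            return getData(znode);
        } catch (StubKeeperException e) {
            System.err.println("Unable to get data of znode " + znode + ": " + e.getMessage());
            throw e;
        }
    }

    /** Converts to InterruptedException without attaching the cause (assumed behavior). */
    static byte[] blockUntilAvailable(String znode) throws InterruptedException {
        try {
            return getDataAndLog(znode);
        } catch (StubKeeperException e) {
            throw new InterruptedException();
        }
    }

    public static void main(String[] args) {
        try {
            blockUntilAvailable("/hbase/meta-region-server");
        } catch (InterruptedException e) {
            // getCause() is null here, so the caller cannot tell that
            // authentication failed.
            System.out.println("cause = " + e.getCause());
        }
    }
}
```

The original failure only survives in the log line; nothing in the exception that reaches the caller points back to it.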

Because the {{InterruptedException}} is thrown without the original exception 
attached, [the code catching it|https://github.com/apache/hbase/blob/rel/0.98.14/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ZooKeeperRegistry.java#L65] 
can't determine the cause (and currently doesn't try to). Perhaps the original 
exception should be preserved and passed on to [the caller|https://github.com/apache/hbase/blob/rel/0.98.14/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java#L1312] 
so that it is available when the {{NoServerForRegionException}} is finally 
thrown [here|https://github.com/apache/hbase/blob/rel/0.98.14/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java#L1281]. 
Alternatively, a more meaningful exception could be thrown instead of the 
misleading {{NoServerForRegionException}}, especially when the failure 
indicates a more permanent condition.
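
One possible shape for the first suggestion is sketched below. {{InterruptedException}} has no constructor that takes a cause, so the sketch uses {{Throwable.initCause}} to attach it, and then chains that exception into the top-level one. All class and method names here ({{StubKeeperException}}, {{StubNoServerForRegionException}}, the simplified {{locateRegion}}) are hypothetical stand-ins and do not reflect the real HBase classes or their constructors.

```java
import java.io.IOException;

public class PreservedCauseSketch {
    /** Hypothetical stand-in for org.apache.zookeeper.KeeperException. */
    static class StubKeeperException extends Exception {
        StubKeeperException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for the top-level exception; carries a cause. */
    static class StubNoServerForRegionException extends IOException {
        StubNoServerForRegionException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Stand-in for the low-level ZooKeeper read that fails authentication. */
    static byte[] getData(String znode) throws StubKeeperException {
        throw new StubKeeperException("KeeperErrorCode = AuthFailed for " + znode);
    }

    /** Converts to InterruptedException but attaches the original exception. */
    static byte[] blockUntilAvailable(String znode) throws InterruptedException {
        try {
            return getData(znode);
        } catch (StubKeeperException e) {
            InterruptedException ie = new InterruptedException(e.getMessage());
            ie.initCause(e); // InterruptedException has no (String, Throwable) constructor
            throw ie;
        }
    }

    /** Wraps the lower-level failure so the whole chain reaches the caller. */
    static byte[] locateRegion(String znode) throws StubNoServerForRegionException {
        try {
            return blockUntilAvailable(znode);
        } catch (InterruptedException e) {
            throw new StubNoServerForRegionException("Unable to locate region via " + znode, e);
        }
    }

    public static void main(String[] args) {
        try {
            locateRegion("/hbase/meta-region-server");
        } catch (StubNoServerForRegionException e) {
            // Walk the cause chain down to the original failure.
            Throwable root = e;
            while (root.getCause() != null) {
                root = root.getCause();
            }
            System.out.println("root cause = " + root);
        }
    }
}
```

With the chain intact, a caller (or just the printed stack trace) can reach the original authentication failure, which would also make it possible to choose a more specific top-level exception for permanent conditions.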



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
