[ 
https://issues.apache.org/jira/browse/TEPHRA-249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andreas Neumann resolved TEPHRA-249.
------------------------------------
    Resolution: Invalid
      Assignee: Andreas Neumann  (was: Poorna Chandra)

The problem was not in Tephra. In all three cases, it was an issue with the 
CDAP coprocessors that extend or reuse Tephra's coprocessors. 

> HBase coprocessors sometimes cannot access tables due to ZK auth failure
> ------------------------------------------------------------------------
>
>                 Key: TEPHRA-249
>                 URL: https://issues.apache.org/jira/browse/TEPHRA-249
>             Project: Tephra
>          Issue Type: Bug
>            Reporter: Andreas Neumann
>            Assignee: Andreas Neumann
>
> Sometimes, region servers have many messages in the logs of the form:
> {noformat}
> 2017-08-15 15:52:51,478 ERROR [tx-state-refresh] zookeeper.ZooKeeperWatcher: hconnection-0x234b6ae9-0x15b49966f34f9bb, quorum=<host>:2181,<host>:2181,<host>:2181, baseZNode=/hbase-secure Received unexpected KeeperException, re-throwing exception
> org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase-secure/meta-region-server
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
>         at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:359)
>         at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:622)
>         at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaRegionState(MetaTableLocator.java:491)
>         at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaRegionLocation(MetaTableLocator.java:172)
>         at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:608)
>         at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:589)
>         at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:568)
>         at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:61)
>         at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateMeta(ConnectionManager.java:1192)
>         at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1159)
>         at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:300)
>         at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:156)
>         at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
>         at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
>         at org.apache.hadoop.hbase.client.ClientSmallReversedScanner.loadCache(ClientSmallReversedScanner.java:211)
>         at org.apache.hadoop.hbase.client.ClientSmallReversedScanner.next(ClientSmallReversedScanner.java:185)
>         at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1256)
>         at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1162)
>         at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1146)
>         at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1103)
>         at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:938)
>         at org.apache.hadoop.hbase.client.HRegionLocator.getRegionLocation(HRegionLocator.java:83)
>         at org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:79)
>         at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:124)
>         at org.apache.hadoop.hbase.client.HTable.get(HTable.java:862)
>         at org.apache.hadoop.hbase.client.HTable.get(HTable.java:828)
>         at co.cask.cdap.data2.util.hbase.ConfigurationTable.read(ConfigurationTable.java:133)
>         at co.cask.cdap.data2.transaction.coprocessor.DefaultTransactionStateCache.getSnapshotConfiguration(DefaultTransactionStateCache.java:56)
>         at org.apache.tephra.coprocessor.TransactionStateCache.tryInit(TransactionStateCache.java:94)
>         at org.apache.tephra.coprocessor.TransactionStateCache.refreshState(TransactionStateCache.java:153)
>         at org.apache.tephra.coprocessor.TransactionStateCache.access$300(TransactionStateCache.java:42)
>         at org.apache.tephra.coprocessor.TransactionStateCache$1.run(TransactionStateCache.java:131)
> {noformat}
> If this happens, then it happens equally for the transaction state cache and 
> for the prune state. 
> The behavior is pretty bad: the coprocessor attempts to access a Table. For 
> that, it needs to locate the meta region, which fails with a ZK AuthFailed 
> error. Unfortunately, the HBase client does this in a blocking busy retry 
> loop for 5 minutes, so it floods the logs for 5 minutes. Then the next 
> coprocessor gets its turn and produces another 5 minutes of unthrottled 
> retries and error messages. 
> The consequence is that coprocessors cannot read the transaction state or the 
> configuration. Hence, for example, they cannot find out whether tx pruning is 
> enabled, and they never record prune info. 
> There is a way to impersonate the login user when accessing a table from a 
> coprocessor. That appears to fix the problem for all coprocessors.
> Or is there a better way to access a table from a coprocessor than using an 
> HBase client? Is it possible via the coprocessor environment? 
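
For contrast with the unthrottled five-minute retry loop described in the 
report, here is a minimal Java sketch of a bounded retry with exponential 
backoff that logs only the first failure and the final give-up. It is not 
HBase client code and not the fix that was applied in CDAP. `BoundedRetry`, 
`callWithBackoff`, and all parameters are hypothetical names, and the failing 
operation is simulated.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, for illustration only: not part of HBase, Tephra, or CDAP.
public class BoundedRetry {

    // Calls op up to maxAttempts times, sleeping between attempts.
    // The delay starts at initialDelayMs and doubles up to maxDelayMs.
    public static <T> T callWithBackoff(Callable<T> op, int maxAttempts,
                                        long initialDelayMs, long maxDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (InterruptedException e) {
                // Do not retry on interruption; restore the flag and propagate.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                last = e;
                // Log only the first failure so the log is not flooded.
                if (attempt == 1) {
                    System.err.println("operation failed, will retry: " + e);
                }
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);
                    delay = Math.min(delay * 2, maxDelayMs);
                }
            }
        }
        System.err.println("giving up after " + maxAttempts + " attempts");
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated table read that fails twice, then succeeds.
        String value = callWithBackoff(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("simulated auth failure");
            }
            return "snapshot-config";
        }, 5, 10, 100);
        System.out.println(value + " after " + calls[0] + " attempts");
    }
}
```

A background refresh thread such as the one in the stack trace could wrap its 
table read in a helper like this, so that a persistent failure costs a few 
log lines per refresh cycle instead of minutes of repeated stack traces.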



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)