[ https://issues.apache.org/jira/browse/TEPHRA-249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andreas Neumann resolved TEPHRA-249.
------------------------------------
    Resolution: Invalid
      Assignee: Andreas Neumann  (was: Poorna Chandra)

The problem was not in Tephra; in all three cases it was an issue with the CDAP coprocessors that extend/reuse Tephra's.

> HBase coprocessors sometimes cannot access tables due to ZK auth failure
> ------------------------------------------------------------------------
>
>                 Key: TEPHRA-249
>                 URL: https://issues.apache.org/jira/browse/TEPHRA-249
>             Project: Tephra
>          Issue Type: Bug
>            Reporter: Andreas Neumann
>            Assignee: Andreas Neumann
>
> Sometimes, region servers have many messages in the logs of the form:
> {noformat}
> 2017-08-15 15:52:51,478 ERROR [tx-state-refresh] zookeeper.ZooKeeperWatcher: hconnection-0x234b6ae9-0x15b49966f34f9bb, quorum=<host>:2181,<host>:2181,<host>:2181, baseZNode=/hbase-secure Received unexpected KeeperException, re-throwing exception
> org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed for /hbase-secure/meta-region-server
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:123)
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> 	at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
> 	at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:359)
> 	at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:622)
> 	at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaRegionState(MetaTableLocator.java:491)
> 	at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaRegionLocation(MetaTableLocator.java:172)
> 	at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:608)
> 	at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:589)
> 	at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:568)
> 	at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:61)
> 	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateMeta(ConnectionManager.java:1192)
> 	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1159)
> 	at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:300)
> 	at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:156)
> 	at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
> 	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
> 	at org.apache.hadoop.hbase.client.ClientSmallReversedScanner.loadCache(ClientSmallReversedScanner.java:211)
> 	at org.apache.hadoop.hbase.client.ClientSmallReversedScanner.next(ClientSmallReversedScanner.java:185)
> 	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1256)
> 	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1162)
> 	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1146)
> 	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1103)
> 	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:938)
> 	at org.apache.hadoop.hbase.client.HRegionLocator.getRegionLocation(HRegionLocator.java:83)
> 	at org.apache.hadoop.hbase.client.RegionServerCallable.prepare(RegionServerCallable.java:79)
> 	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:124)
> 	at org.apache.hadoop.hbase.client.HTable.get(HTable.java:862)
> 	at org.apache.hadoop.hbase.client.HTable.get(HTable.java:828)
> 	at co.cask.cdap.data2.util.hbase.ConfigurationTable.read(ConfigurationTable.java:133)
> 	at co.cask.cdap.data2.transaction.coprocessor.DefaultTransactionStateCache.getSnapshotConfiguration(DefaultTransactionStateCache.java:56)
> 	at org.apache.tephra.coprocessor.TransactionStateCache.tryInit(TransactionStateCache.java:94)
> 	at org.apache.tephra.coprocessor.TransactionStateCache.refreshState(TransactionStateCache.java:153)
> 	at org.apache.tephra.coprocessor.TransactionStateCache.access$300(TransactionStateCache.java:42)
> 	at org.apache.tephra.coprocessor.TransactionStateCache$1.run(TransactionStateCache.java:131)
> {noformat}
> If this happens, it happens equally for the transaction state cache and for the prune state.
> The behavior is pretty bad: the coprocessor attempts to access a Table; for that it needs to locate the meta region, which fails due to ZK authorization. Unfortunately, the HBase client does this in a blocking busy-retry loop for 5 minutes, so it floods the logs for 5 minutes. Then the next coprocessor gets its turn and produces another 5 minutes of unthrottled retries and error messages.
> The consequence is that coprocessors cannot read the transaction state or the configuration. Hence, for example, they cannot find out whether tx pruning is enabled and never record prune info.
> There is a way to impersonate the login user when accessing a table from a coprocessor. That appears to fix the problem for all coprocessors.
> Or is there an even better way to access a table from a coprocessor than using an HBase client? Is it possible via the coprocessor environment?

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
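The impersonation approach mentioned in the description can be sketched roughly as follows: wrap the table access in a doAs() for the region server's login (Kerberos) user, so the embedded HBase client presents the server's credentials to ZooKeeper rather than those of the RPC caller. This is an illustrative sketch, not Tephra's or CDAP's actual code; the class and method names below are hypothetical, and it assumes an HBase 1.x client on a Kerberos-secured cluster.

```java
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.security.UserGroupInformation;

/** Hypothetical helper for reading a table from inside a coprocessor. */
public final class CoprocessorTableReader {

  /**
   * Reads one row as the region server's login user instead of the
   * current RPC caller, avoiding the ZK AuthFailed seen in the logs above.
   */
  static Result readAsLoginUser(final Configuration conf,
                                final TableName table,
                                final byte[] row) throws Exception {
    // The login user is the principal the region server authenticated as.
    UserGroupInformation ugi = UserGroupInformation.getLoginUser();
    return ugi.doAs(new PrivilegedExceptionAction<Result>() {
      @Override
      public Result run() throws Exception {
        // Everything inside run() executes with the login user's
        // credentials, including the ZooKeeper handshake for meta lookup.
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table t = conn.getTable(table)) {
          return t.get(new Get(row));
        }
      }
    });
  }

  private CoprocessorTableReader() { }
}
```

Note that creating a Connection per call is shown only for brevity; in a real coprocessor the connection would be cached, since connection setup is expensive and this code runs on the region server's hot path.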