Hi guys,

I am using HBase 1.2.0 on a Kerberos-secured Cloudera CDH 5.8 cluster. I have a persistent application that authenticates using a keytab and creates an HBase connection. Our code also takes care of re-authentication and of recreating a broken connection. The code worked fine with previous versions of HBase. However, what we see with HBase 1.2 is that after 24 hours the HBase connection stops working and gives the following error:

org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=2, exceptions:
Tue Feb 13 12:57:51 PST 2018, RpcRetryingCaller{globalStartTime=1518555467140, pause=100, retries=2}, org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Call to pdmcdh01.xyz.com/192.168.145.62:60020 failed on local exception: org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Connection to pdmcdh01.xyz.com/192.168.145.62:60020 is closing. Call id=137, waitTime=11
Tue Feb 13 12:58:01 PST 2018, RpcRetryingCaller{globalStartTime=1518555467140, pause=100, retries=2}, org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Call to pdmcdh01.xyz.com/192.168.145.62:60020 failed on local exception: org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Connection to pdmcdh01.xyz.com/192.168.145.62:60020 is closing. Call id=139, waitTime=13
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:147)
    at org.apache.hadoop.hbase.client.HTable.get(HTable.java:935)
    at org.apache.hadoop.hbase.client.HTable.get(HTable.java:901)

Our code re-authenticates and creates the connection again, but it still keeps failing:

org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=2, exceptions:
Wed Feb 21 14:30:31 PST 2018, RpcRetryingCaller{globalStartTime=1519252219159, pause=100, retries=2}, java.io.IOException: Couldn't setup connection for p...@hadoop.xyz.com to hbase/pdmcdh01.xyz....@hadoop.xyz.com
Wed Feb 21 14:30:31 PST 2018, RpcRetryingCaller{globalStartTime=1519252219159, pause=100, retries=2}, org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed servers list: pdmcdh01.xyz.com/192.168.145.62:60020
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:147)
    at org.apache.hadoop.hbase.client.HTable.get(HTable.java:935)
    at org.apache.hadoop.hbase.client.HTable.get(HTable.java:901)

I know that the client keeps a server in the failed servers list for a few seconds in order to avoid too many connection attempts. So I waited and tried again after some time, but still got the same error. Once we restart our application, everything works fine again for the next 24 hours.

This 24-hour gap suggests that the problem could be related to the Kerberos ticket expiry time; however, there is nothing in the logs to indicate a Kerberos authentication issue. Moreover, we are handling the exception and trying to authenticate and create the connection again, but nothing works until we restart the JVM. This is very strange.

I would really appreciate any help or pointers on this issue.

Thanks a lot,
Apratim