[ https://issues.apache.org/jira/browse/HBASE-26568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Josh Elser resolved HBASE-26568.
--------------------------------
    Resolution: Workaround

Resolving as "Workaround", the workaround being to upgrade.

> hbase master got stuck after running couple of days in Azure setup
> ------------------------------------------------------------------
>
> Key: HBASE-26568
> URL: https://issues.apache.org/jira/browse/HBASE-26568
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.0.1
> Environment: Azure cloud
> Reporter: kaushik mandal
> Priority: Major
> Attachments: hbase-master-log-0.txt, hbase-master-log-1.txt
>
>
> hadoop hbase version 2.0.1
> hadoop hdfs version 2.7.7
>
> In the Azure cluster setup, the HBase master hangs or stops responding after
> running for a couple of days, and the only way to recover it is to delete
> /hbase and restart. Below is the error from the hbase-master log.
>
> Error message
> ==============
> 2021-11-18 13:06:55,396 INFO [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16000] assignment.AssignProcedure: Retry=10 of max=10; pid=320, ppid=319, state=RUNNABLE:REGION_TRANSITION_DISPATCH; AssignProcedure table=hbase:meta, region=1588230740; rit=OPENING, location=nokiainfra-altiplano-hbase-regionserver-1.nokiainfra-altiplano-hbase-regionserver.default.svc.cluster.local,16020,1637238611975
> 2021-11-18 13:06:55,396 INFO [PEWorker-16] assignment.AssignProcedure: Retry=11 of max=10; pid=320, ppid=319, state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=hbase:meta, region=1588230740; rit=OFFLINE, location=null
> 2021-11-18 13:06:55,944 ERROR [PEWorker-16] procedure2.ProcedureExecutor: CODE-BUG: Uncaught runtime exception for pid=319, state=FAILED:RECOVER_META_ASSIGN_REGIONS, exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max attempts exceeded; RecoverMetaProcedure failedMetaServer=null, splitWal=true
> java.lang.UnsupportedOperationException: unhandled state=RECOVER_META_ASSIGN_REGIONS
>     at org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.rollbackState(RecoverMetaProcedure.java:209)
>     at org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.rollbackState(RecoverMetaProcedure.java:52)
>     at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>     at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1372)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1328)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1197)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760)
> 2021-11-18 13:06:55,958 ERROR [PEWorker-16] procedure2.ProcedureExecutor: CODE-BUG: Uncaught runtime exception for pid=319, state=FAILED:RECOVER_META_ASSIGN_REGIONS, exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max attempts exceeded; RecoverMetaProcedure failedMetaServer=null, splitWal=true
> java.lang.UnsupportedOperationException: unhandled state=RECOVER_META_ASSIGN_REGIONS
>     at org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.rollbackState(RecoverMetaProcedure.java:209)
>     at org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.rollbackState(RecoverMetaProcedure.java:52)
>     at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>     at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1372)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1328)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1197)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760)
> 2021-11-18 13:06:55,969 ERROR [PEWorker-16] procedure2.ProcedureExecutor: CODE-BUG: Uncaught runtime exception for pid=319, state=FAILED:RECOVER_META_ASSIGN_REGIONS, exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max attempts exceeded; RecoverMetaProcedure failedMetaServer=null, splitWal=true
> java.lang.UnsupportedOperationException: unhandled state=RECOVER_META_ASSIGN_REGIONS
>     at org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.rollbackState(RecoverMetaProcedure.java:209)
>     at org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.rollbackState(RecoverMetaProcedure.java:52)
>     at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>     at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1372)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1328)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1197)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760)
> 2021-11-18 13:06:55,970 WARN [PEWorker-16] procedure2.ProcedureExecutor: Worker terminating UNNATURALLY null
> java.lang.ArrayIndexOutOfBoundsException: 2
>     at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405)
>     at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178)
>     at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513)
>     at org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505)
>     at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741)
>     at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691)
>     at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1406)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1328)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1197)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760)
> 2021-11-18 13:07:46,268 INFO [ReadOnlyZKClient-altiplano-zookeeper:2181@0x7e131580] zookeeper.ZooKeeper: Session: 0x200000efa5dfae6 closed
> ============================================================
>
> Error Message:
> ============================================================
> ==> /opt/hbase-2.0.1/logs/hbase--master-nokiainfra-altiplano-hbase-master-0.log <==
> 2021-12-02 12:43:51,351 INFO [RpcServer.default.FPBQ.Fifo.handler=129,queue=12,port=16000] master.ServerManager: Registering regionserver=nokiainfra-altiplano-hbase-regionserver-1.nokiainfra-altiplano-hbase-regionserver.default.svc.cluster.local,16020,1638449029563
> 2021-12-02 12:43:54,699 ERROR [RpcServer.default.FPBQ.Fifo.handler=129,queue=12,port=16000] master.MasterRpcServices: lock(Unknown Source)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:372)
>     at com.sun.proxy.$Proxy20.addBlock(Unknown Source)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper$6.addBlock(FanOutOneBlockAsyncDFSOutputHelper.java:380)
>     at org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.createOutput(FanOutOneBlockAsyncDFSOutputHelper.java:774)
>     ... 24 more
> 2021-12-02 12:43:54,746 INFO [main-EventThread] master.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [nokiainfra-altiplano-hbase-regionserver-1.nokiainfra-altiplano-hbase-regionserver.default.svc.cluster.local,16020,1638449029563]
> 2021-12-02 12:43:54,746 INFO [main-EventThread] master.ServerManager: Processing expiration of nokiainfra-altiplano-hbase-regionserver-1.nokiainfra-altiplano-hbase-regionserver.default.svc.cluster.local,16020,1638449029563 on nokiainfra-altiplano-hbase-master-0.nokiainfra-altiplano-hbase-master.default.svc.cluster.local,16000,1638448730439
> 2021-12-02 12:43:54,860 INFO [PEWorker-10] procedure.ServerCrashProcedure: Start pid=10, state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure server=nokiainfra-altiplano-hbase-regionserver-1.nokiainfra-altiplano-hbase-regionserver.default.svc.cluster.local,16020,1638449029563, splitWal=true, meta=false

--
This message was sent by Atlassian Jira
(v8.20.1#820001)