[ 
https://issues.apache.org/jira/browse/HBASE-26568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved HBASE-26568.
--------------------------------
    Resolution: Workaround

Resolving as "Workaround"; the workaround is to upgrade.

> hbase master got stuck after running couple of days in Azure setup
> ------------------------------------------------------------------
>
>                 Key: HBASE-26568
>                 URL: https://issues.apache.org/jira/browse/HBASE-26568
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.0.1
>         Environment: Azure cloud
>            Reporter: kaushik mandal
>            Priority: Major
>         Attachments: hbase-master-log-0.txt, hbase-master-log-1.txt
>
>
> hadoop hbase version 2.0.1
> hadoop hdfs version 2.7.7
>  
> In an Azure cluster setup, the HBase master hangs or stops responding after 
> running for a couple of days, and the only way to recover it is to delete 
> /hbase and restart. Below is the error logged by the HBase master.
>  
> Error message
> ==============
> 2021-11-18 13:06:55,396 INFO 
> [RpcServer.priority.FPBQ.Fifo.handler=19,queue=1,port=16000] 
> assignment.AssignProcedure: Retry=10 of max=10; pid=320, ppid=319, 
> state=RUNNABLE:REGION_TRANSITION_DISPATCH; AssignProcedure table=hbase:meta, 
> region=1588230740; rit=OPENING, 
> location=nokiainfra-altiplano-hbase-regionserver-1.nokiainfra-altiplano-hbase-regionserver.default.svc.cluster.local,16020,1637238611975
>  2021-11-18 13:06:55,396 INFO [PEWorker-16] assignment.AssignProcedure: 
> Retry=11 of max=10; pid=320, ppid=319, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE; AssignProcedure table=hbase:meta, 
> region=1588230740; rit=OFFLINE, location=null
> 2021-11-18 13:06:55,944 ERROR [PEWorker-16] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime 
> exception for pid=319, state=FAILED:RECOVER_META_ASSIGN_REGIONS, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; RecoverMetaProcedure failedMetaServer=null, splitWal=true 
> java.lang.UnsupportedOperationException: unhandled 
> state=RECOVER_META_ASSIGN_REGIONS at 
> org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.rollbackState(RecoverMetaProcedure.java:209)
>  at 
> org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.rollbackState(RecoverMetaProcedure.java:52)
>  at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>  at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864) 
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1372)
>  at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1328)
>  at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1197)
>  at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
>  at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760)
>  2021-11-18 13:06:55,958 ERROR [PEWorker-16] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime exception for pid=319, 
> state=FAILED:RECOVER_META_ASSIGN_REGIONS, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; RecoverMetaProcedure failedMetaServer=null, splitWal=true 
> java.lang.UnsupportedOperationException: unhandled 
> state=RECOVER_META_ASSIGN_REGIONS at 
> org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.rollbackState(RecoverMetaProcedure.java:209)
>  at 
> org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.rollbackState(RecoverMetaProcedure.java:52)
>  at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>  at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864) 
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1372)
>  at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1328)
>  at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1197)
>  at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
>  at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760)
>  2021-11-18 13:06:55,969 ERROR [PEWorker-16] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime exception for pid=319, 
> state=FAILED:RECOVER_META_ASSIGN_REGIONS, 
> exception=org.apache.hadoop.hbase.client.RetriesExhaustedException via 
> AssignProcedure:org.apache.hadoop.hbase.client.RetriesExhaustedException: Max 
> attempts exceeded; RecoverMetaProcedure failedMetaServer=null, splitWal=true 
> java.lang.UnsupportedOperationException: unhandled 
> state=RECOVER_META_ASSIGN_REGIONS at 
> org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.rollbackState(RecoverMetaProcedure.java:209)
>  at 
> org.apache.hadoop.hbase.master.procedure.RecoverMetaProcedure.rollbackState(RecoverMetaProcedure.java:52)
>  at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
>  at 
> org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864) 
> at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1372)
>  at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1328)
>  at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1197)
>  at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
>  at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760)
>  2021-11-18 13:06:55,970 WARN [PEWorker-16] procedure2.ProcedureExecutor: 
> Worker terminating UNNATURALLY null java.lang.ArrayIndexOutOfBoundsException: 
> 2 at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405)
>  at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178)
>  at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513)
>  at 
> org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505)
>  at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741)
>  at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691)
>  at 
> org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603)
>  at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1406)
>  at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1328)
>  at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1197)
>  at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
>  at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760)
>  2021-11-18 13:07:46,268 INFO 
> [ReadOnlyZKClient-altiplano-zookeeper:2181@0x7e131580] zookeeper.ZooKeeper: 
> Session: 0x200000efa5dfae6 closed
> ============================================================
>  
> Error Message:
> ============================================================
> ==> 
> /opt/hbase-2.0.1/logs/hbase--master-nokiainfra-altiplano-hbase-master-0.log 
> <==
> 2021-12-02 12:43:51,351 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=129,queue=12,port=16000] 
> master.ServerManager: Registering 
> regionserver=nokiainfra-altiplano-hbase-regionserver-1.nokiainfra-altiplano-hbase-regionserver.default.svc.cluster.local,16020,1638449029563
> 2021-12-02 12:43:54,699 ERROR 
> [RpcServer.default.FPBQ.Fifo.handler=129,queue=12,port=16000] 
> master.MasterRpcServices: lock(Unknown Source)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:372)
>     at com.sun.proxy.$Proxy20.addBlock(Unknown Source)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper$6.addBlock(FanOutOneBlockAsyncDFSOutputHelper.java:380)
>     at 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper.createOutput(FanOutOneBlockAsyncDFSOutputHelper.java:774)
>     ... 24 more
> 2021-12-02 12:43:54,746 INFO  [main-EventThread] master.RegionServerTracker: 
> RegionServer ephemeral node deleted, processing expiration 
> [nokiainfra-altiplano-hbase-regionserver-1.nokiainfra-altiplano-hbase-regionserver.default.svc.cluster.local,16020,1638449029563]
> 2021-12-02 12:43:54,746 INFO  [main-EventThread] master.ServerManager: 
> Processing expiration of 
> nokiainfra-altiplano-hbase-regionserver-1.nokiainfra-altiplano-hbase-regionserver.default.svc.cluster.local,16020,1638449029563
>  on 
> nokiainfra-altiplano-hbase-master-0.nokiainfra-altiplano-hbase-master.default.svc.cluster.local,16000,1638448730439
> 2021-12-02 12:43:54,860 INFO  [PEWorker-10] procedure.ServerCrashProcedure: 
> Start pid=10, state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure 
> server=nokiainfra-altiplano-hbase-regionserver-1.nokiainfra-altiplano-hbase-regionserver.default.svc.cluster.local,16020,1638449029563,
>  splitWal=true, meta=false
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
