Hello,

We have a production system down and can't get it back up. It's an older 
version, 0.96.1.1. We have 70 OFFLINE unassigned regions.

We were configuring for multiple NICs and set both 
hbase.regionserver.ipc.address and hbase.master.ipc.address to 0.0.0.0.

This caused hostname lookup problems, which (we now know) was a known issue 
years ago.

We now have 3 dead region servers listed with names: 0:0:0:0:0:0:0:0,60020
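
For reference, here is a minimal Python sketch (not part of HBase) of how such a server name breaks down. It assumes the usual host,port,startcode layout seen in the status output further down, and that the startcode is the server start time in epoch milliseconds. The helper name parse_server_name is made up for illustration.

```python
import ipaddress
from datetime import datetime, timezone

def parse_server_name(name):
    """Split 'host,port,startcode' into its three parts."""
    # Split from the right so the host part is left untouched.
    host, port, startcode = name.rsplit(",", 2)
    return host, int(port), int(startcode)

host, port, startcode = parse_server_name("0:0:0:0:0:0:0:0,60020,1713708572030")

# True when the "host" is the wildcard (unspecified) address, not a real machine.
is_wildcard = ipaddress.ip_address(host).is_unspecified

# Assumption: the startcode is the server start time in epoch milliseconds.
started = datetime.fromtimestamp(startcode // 1000, tz=timezone.utc)

print(host, port, is_wildcard, started.isoformat())
```

Read this way, the "host" is just the IPv6 spelling of the 0.0.0.0 wildcard we configured, and the startcode falls on 2024-04-21 (UTC), three days before the log entries below.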

We've undone our changes and restarted the cluster, but the invalid server 
names (0:0:0:0:0:0:0:0,60020) still show up in the dead region servers list 
and still have OFFLINE regions assigned to them.

We had no luck with hbck or with the shell assign, move, and unassign commands.


Status Pages and Logs for a specific Region:

Dead Region Servers:
0:0:0:0:0:0:0:0,60020,1713708572030


Master Status Page:
d31a033cd7810e347639e12833969754    
c,\x92I$\x92I$\x92I$\x92I$\x92I$\x90,1330810355551.d31a033cd7810e347639e12833969754.
 state=OFFLINE, ts=Wed Apr 24 08:02:53 CDT 2024 (376s ago), 
server=0:0:0:0:0:0:0:0,60020,1713708572030

Master Status Page Table Regions:
Region is flagged as "not deployed"

Results From hbck:
ERROR: Region { meta => 
c,\x92I$\x92I$\x92I$\x92I$\x92I$\x90,1330810355551.d31a033cd7810e347639e12833969754.,
 hdfs => hdfs://gs1/hbase/data/default/c/d31a033cd7810e347639e12833969754, 
deployed =>  } not deployed on any region server.
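
To relate the full region name in the hbck output to the short encoded name used with the shell assign command, here is a small Python sketch. It assumes the layout table,start key,region id.encoded name. (with a trailing dot), which matches the names shown here. The helper name parse_region_name is made up for illustration.

```python
def parse_region_name(region_name):
    """Split '<table>,<start key>,<region id>.<encoded name>.' into parts."""
    # The table name ends at the first comma. The start key may itself
    # contain commas, so the region id is taken from the last comma.
    table, rest = region_name.split(",", 1)
    start_key, tail = rest.rsplit(",", 1)
    # Drop the trailing dot, then separate the region id from the encoded name.
    region_id, encoded = tail.rstrip(".").split(".", 1)
    return table, start_key, int(region_id), encoded

# Raw string: the start key is kept in its escaped, printable form.
name = r"c,\x92I$\x92I$\x92I$\x92I$\x92I$\x90,1330810355551.d31a033cd7810e347639e12833969754."
table, start_key, region_id, encoded = parse_region_name(name)

print(table, region_id, encoded)
```

The encoded part is the value passed to assign in the shell session below.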

Master Log Entries:
2024-04-24 08:02:53,453 INFO  [master:master:60000] master.AssignmentManager: 
Processing d31a033cd7810e347639e12833969754 in state: M_ZK_REGION_OFFLINE
2024-04-24 08:02:53,453 INFO  [master:master:60000] master.RegionStates: 
Transitioned {d31a033cd7810e347639e12833969754 state=OFFLINE, ts=1713963773352, 
server=null} to {d31a033cd7810e347639e12833969754 state=OFFLINE, 
ts=1713963773453, server=0:0:0:0:0:0:0:0,60020,1713708572030}
2024-04-24 08:02:53,695 INFO  [MASTER_SERVER_OPERATIONS-master:60000-0] 
handler.ServerShutdownHandler: Skip assigning region in transition on other 
server{d31a033cd7810e347639e12833969754 state=OFFLINE, ts=1713963773453, 
server=0:0:0:0:0:0:0:0,60020,1713708572030}

From HBase Shell:
assign 'd31a033cd7810e347639e12833969754'
0 row(s) in 1.1120 seconds

Shell assign Master Log Entry:
2024-04-24 08:15:19,492 INFO  [RpcServer.handler=20,port=60000] 
master.AssignmentManager: Skip assigning 
c,\x92I$\x92I$\x92I$\x92I$\x92I$\x90,1330810355551.d31a033cd7810e347639e12833969754.,
 it's host 0:0:0:0:0:0:0:0,60020,1713708572030 is dead but not processed yet

Any ideas how to get these regions assigned?

Thanks,
Mike
