Hi, Ted Yu:

Thanks for your reply. The logs are indeed from the same regionserver, but the first log I pasted was from another region. That was careless of me, sorry.

I have no doubts about the comments on OpenRegionHandler#openRegion. When opening fails, the Master assigns the region again, but the ZK node is in an unexpected state, so the reassignment does not succeed. I am pasting the logs again:

Here are the logs from one regionserver:

2011-05-20 15:49:19,503 ERROR org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open of region=ufdr,001570,1305873689710.d7555a12586e6c788ca55017224b5a51.
java.io.IOException: Exception occured while connecting to the server
    at com.huawei.isap.ump.ha.client.RPCRetryAndSwitchInvoker.retryOperation(RPCRetryAndSwitchInvoker.java:162)
    at com.huawei.isap.ump.ha.client.RPCRetryAndSwitchInvoker.handleFailure(RPCRetryAndSwitchInvoker.java:118)
    at com.huawei.isap.ump.ha.client.RPCRetryAndSwitchInvoker.invoke(RPCRetryAndSwitchInvoker.java:95)
    at $Proxy6.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:889)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:724)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:812)
    at org.apache.hadoop.hbase.regionserver.HRegion.checkRegioninfoOnFilesystem(HRegion.java:409)
    at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:338)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2551)
    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2537)
    at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:272)
    at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:99)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:156)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
2011-05-20 15:49:19,503 ERROR org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open of region=ufdr,001570,1305873689710.d7555a12586e6c788ca55017224b5a51.
2011-05-20 16:21:27,731 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open region: ufdr,001570,1305873689710.d7555a12586e6c788ca55017224b5a51.
2011-05-20 16:21:27,731 DEBUG org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Processing open of ufdr,001570,1305873689710.d7555a12586e6c788ca55017224b5a51.
2011-05-20 16:21:27,731 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:20020-0x3300c164fe0002c Attempting to transition node d7555a12586e6c788ca55017224b5a51 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
2011-05-20 16:21:27,732 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:20020-0x3300c164fe0002c Attempt to transition the unassigned node for d7555a12586e6c788ca55017224b5a51 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING failed, the node existed but was in the state RS_ZK_REGION_OPENING set by the server 157-5-111-11,20020,1305875930161
2011-05-20 16:21:27,732 WARN org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed transition from OFFLINE to OPENING for region=d7555a12586e6c788ca55017224b5a51
2011-05-20 16:21:27,732 WARN org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Region was hijacked? It no longer exists, encodedName=d7555a12586e6c788ca55017224b5a51

Here are the Master logs. The TimeoutMonitor found the timed-out region and assigned it again, but the assignment failed each time:

2011-05-20 16:18:27,728 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: ufdr,001570,1305873689710.d7555a12586e6c788ca55017224b5a51. state=PENDING_OPEN, ts=1305879327726
2011-05-20 16:18:27,728 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN for too long, reassigning region=ufdr,001570,1305873689710.d7555a12586e6c788ca55017224b5a51.
2011-05-20 16:18:27,728 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=ufdr,001570,1305873689710.d7555a12586e6c788ca55017224b5a51. state=PENDING_OPEN, ts=1305879327726
2011-05-20 16:18:27,728 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for ufdr,001570,1305873689710.d7555a12586e6c788ca55017224b5a51. so generated a random one; hri=ufdr,001570,1305873689710.d7555a12586e6c788ca55017224b5a51., src=, dest=157-5-111-12,20020,1305877626108; 4 (online=4, exclude=null) available servers
2011-05-20 16:18:27,728 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for ufdr,001570,1305873689710.d7555a12586e6c788ca55017224b5a51. so generated a random one; hri=ufdr,001570,1305873689710.d7555a12586e6c788ca55017224b5a51., src=, dest=157-5-111-12,20020,1305877626108; 4 (online=4, exclude=null) available servers
2011-05-20 16:18:27,728 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region ufdr,001570,1305873689710.d7555a12586e6c788ca55017224b5a51. to 157-5-111-12,20020,1305877626108

Regards,
Jieshan Bean
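
P.S. To make the failure mode concrete, here is a minimal sketch of how I read the logs: the transition from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING behaves like a compare-and-set, so once a failed open has left the node in RS_ZK_REGION_OPENING, every later attempt that expects M_ZK_REGION_OFFLINE is rejected. The class and method names (TransitionSketch, Node, transition) are hypothetical stand-ins and not the real ZKAssign API; only the two state names come from the logs.

```java
import java.util.concurrent.atomic.AtomicReference;

public class TransitionSketch {
    // The two states that appear in the regionserver log above.
    enum State { M_ZK_REGION_OFFLINE, RS_ZK_REGION_OPENING }

    // Hypothetical in-memory stand-in for the unassigned ZK node.
    static final class Node {
        private final AtomicReference<State> state;

        Node(State initial) {
            this.state = new AtomicReference<>(initial);
        }

        // Succeeds only if the node is currently in the expected state.
        boolean transition(State expected, State next) {
            return state.compareAndSet(expected, next);
        }

        State current() {
            return state.get();
        }
    }

    public static void main(String[] args) {
        Node node = new Node(State.M_ZK_REGION_OFFLINE);

        // First open attempt: the node moves to OPENING, then the open
        // fails (the IOException above) and the node is not reset.
        boolean first = node.transition(
                State.M_ZK_REGION_OFFLINE, State.RS_ZK_REGION_OPENING);

        // Reassignment: the next attempt still expects OFFLINE,
        // but the node is stuck in OPENING, so it is rejected.
        boolean second = node.transition(
                State.M_ZK_REGION_OFFLINE, State.RS_ZK_REGION_OPENING);

        System.out.println("first attempt succeeded:  " + first);
        System.out.println("second attempt succeeded: " + second);
        System.out.println("node state: " + node.current());
    }
}
```

If this reading is right, the reassignment can only succeed once the node has been put back into the state that the new attempt expects.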