Hi,

I have 2 regions transitionning from servers to servers for 15 minutes now.

I have nothing in the master logs about those 2 regions but on the region
server logs I have some files notfound2013-02-23 16:02:07,347 ERROR
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open
of region=entry,theykey,1361651769136.6dd77bc9ff91e0e6d413f74e670ab435.,
starting to roll back the global memstore size.
java.io.IOException: java.io.IOException: java.io.FileNotFoundException:
File does not exist:
/hbase/entry/2ebfef593a3d715b59b85670909182c9/a/62b0aae45d59408dbcfc513954efabc7
    at
org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:597)
    at
org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:510)
    at
org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4177)
    at
org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4125)
    at
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:328)
    at
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:100)
    at
org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:169)
    at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.IOException: java.io.FileNotFoundException: File does
not exist:
/hbase/entry/2ebfef593a3d715b59b85670909182c9/a/62b0aae45d59408dbcfc513954efabc7
    at
org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:433)
    at org.apache.hadoop.hbase.regionserver.Store.<init>(Store.java:240)
    at
org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:3141)
    at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:572)
    at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:570)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    ... 3 more
Caused by: java.io.FileNotFoundException: File does not exist:
/hbase/entry/2ebfef593a3d715b59b85670909182c9/a/62b0aae45d59408dbcfc513954efabc7
    at
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1843)
    at
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1834)
    at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:578)
    at
org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154)
    at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:108)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427)
    at
org.apache.hadoop.hbase.io.hfile.HFile.createReaderWithEncoding(HFile.java:573)
    at
org.apache.hadoop.hbase.regionserver.StoreFile$Reader.<init>(StoreFile.java:1261)
    at
org.apache.hadoop.hbase.io.HalfStoreFileReader.<init>(HalfStoreFileReader.java:70)
    at
org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:508)
    at
org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:603)
    at org.apache.hadoop.hbase.regionserver.Store$1.call(Store.java:409)
    at org.apache.hadoop.hbase.regionserver.Store$1.call(Store.java:404)
    ... 8 more
2013-02-23 16:02:07,370 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign:
regionserver:60020-0x13d07ec012501fc Attempt to transition the unassigned
node for 6dd77bc9ff91e0e6d413f74e670ab435 from RS_ZK_REGION_OPENING to
RS_ZK_REGION_FAILED_OPEN failed, the node existed but was version 6586 not
the expected version 6585


If I try hbck -fix, this is bringing the master down:
2013-02-23 16:03:01,419 INFO org.apache.hadoop.hbase.master.HMaster:
BalanceSwitch=false
2013-02-23 16:03:03,067 FATAL org.apache.hadoop.hbase.master.HMaster:
Master server abort: loaded coprocessors are: []
2013-02-23 16:03:03,068 FATAL org.apache.hadoop.hbase.master.HMaster:
Unexpected state :
entry,thekey,1361651769136.6dd77bc9ff91e0e6d413f74e670ab435.
state=PENDING_OPEN, ts=1361653383067, server=node2,60020,1361653023303 ..
Cannot transit it to OFFLINE.
java.lang.IllegalStateException: Unexpected state :
entry,thekey,1361651769136.6dd77bc9ff91e0e6d413f74e670ab435.
state=PENDING_OPEN, ts=1361653383067, server=node2,60020,1361653023303 ..
Cannot transit it to OFFLINE.
    at
org.apache.hadoop.hbase.master.AssignmentManager.setOfflineInZooKeeper(AssignmentManager.java:1813)
    at
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1658)
    at
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1423)
    at
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1398)
    at
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1393)
    at
org.apache.hadoop.hbase.master.HMaster.assignRegion(HMaster.java:1740)
    at org.apache.hadoop.hbase.master.HMaster.assign(HMaster.java:1731)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
    at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
2013-02-23 16:03:03,069 INFO org.apache.hadoop.hbase.master.HMaster:
Aborting
2013-02-23 16:03:03,069 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
server on 60000
2013-02-23 16:03:03,069 INFO org.apache.hadoop.hbase.master.CatalogJanitor:
node3,60000,1361653064588-CatalogJanitor exiting
2013-02-23 16:03:03,069 INFO org.apache.hadoop.hbase.master.HMaster$2:
node3,60000,1361653064588-BalancerChore exiting
2013-02-23 16:03:03,070 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 5 on 60000: exiting
2013-02-23 16:03:03,070 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 4 on 60000: exiting
2013-02-23 16:03:03,071 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 8 on 60000: exiting
2013-02-23 16:03:03,070 INFO
org.apache.hadoop.hbase.master.cleaner.HFileCleaner:
master-node3,60000,1361653064588.archivedHFileCleaner exiting
2013-02-23 16:03:03,070 INFO
org.apache.hadoop.hbase.master.cleaner.LogCleaner:
master-node3,60000,1361653064588.oldLogCleaner exiting
2013-02-23 16:03:03,070 INFO org.apache.hadoop.hbase.master.HMaster:
Stopping infoServer
2013-02-23 16:03:03,070 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
IPC Server Responder
2013-02-23 16:03:03,070 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC
Server handler 1 on 60000: exiting
2013-02-23 16:03:03,070 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC
Server handler 2 on 60000: exiting
2013-02-23 16:03:03,071 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
Responder, call isMasterRunning(), rpc version=1, client version=29,
methodsFingerPrint=891823089 from 192.168.23.7:43381: output error
2013-02-23 16:03:03,071 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 3 on 60000 caught a ClosedChannelException, this means that the
server was processing a request but the client went away. The error message
was: null
2013-02-23 16:03:03,071 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 3 on 60000: exiting
2013-02-23 16:03:03,070 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 1 on 60000: exiting
2013-02-23 16:03:03,071 INFO org.mortbay.log: Stopped
SelectChannelConnector@0.0.0.0:60010
2013-02-23 16:03:03,071 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
IPC Server Responder
2013-02-23 16:03:03,071 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 6 on 60000: exiting
2013-02-23 16:03:03,071 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 7 on 60000: exiting
2013-02-23 16:03:03,071 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 0 on 60000: exiting
2013-02-23 16:03:03,071 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 2 on 60000: exiting
2013-02-23 16:03:03,071 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
IPC Server listener on 60000
2013-02-23 16:03:03,071 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 9 on 60000: exiting
2013-02-23 16:03:03,070 INFO org.apache.hadoop.ipc.HBaseServer: REPL IPC
Server handler 0 on 60000: exiting
2013-02-23 16:03:03,287 INFO
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
Closed zookeeper sessionid=0x33d07f1130301fe
2013-02-23 16:03:03,453 INFO
org.apache.hadoop.hbase.master.AssignmentManager$TimerUpdater:
node3,60000,1361653064588.timerUpdater exiting
2013-02-23 16:03:03,453 INFO
org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor:
node3,60000,1361653064588.timeoutMonitor exiting
2013-02-23 16:03:03,453 INFO
org.apache.hadoop.hbase.master.SplitLogManager$TimeoutMonitor:
node3,60000,1361653064588.splitLogManagerTimeoutMonitor exiting
2013-02-23 16:03:03,468 INFO org.apache.hadoop.hbase.master.HMaster:
HMaster main thread exiting
2013-02-23 16:03:03,469 ERROR
org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master
java.lang.RuntimeException: HMaster Aborted
    at
org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:160)
    at
org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:104)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at
org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
    at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:1927)

I'm running with 0.94.5 +
HBASE-7824<https://issues.apache.org/jira/browse/HBASE-7824>+
HBASE-7865 <https://issues.apache.org/jira/browse/HBASE-7865>. I don't
think the 2 patchs are related to this issue.

Hadoop fsck reports "The filesystem under path '/' is HEALTHY" without any
issue.
/hbase/entry/2ebfef593a3d715b59b85670909182c9/a/62b0aae45d59408dbcfc513954efabc7
does exist in the FS.

What I don't understand is why is the master going down? And how can I fix
that?

I will try to create the missing directory and see the results...

Thanks,

JM

Reply via email to