Hi, the HMaster died, and so did the regionservers. The HMaster's log is below. Could you please help me find what the problem is?
2012-10-12 00:14:19,444 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to bj-ecsxhm4f3I-r3-5-r810-2-hbase-stor-3/10.20.16.34:2181, initiating session
2012-10-12 00:14:19,520 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server bj-ecsxhm4f3I-r3-5-r810-2-hbase-stor-3/10.20.16.34:2181, sessionid = 0x139c539bc090002, negotiated timeout = 40000
2012-10-12 00:14:23,738 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 15046ms for sessionid 0x239c539ba630001, closing socket connection and attempting reconnect
2012-10-12 00:14:24,246 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2/10.20.16.33:2181
2012-10-12 00:14:25,173 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 15245ms for sessionid 0x139c539bc090003, closing socket connection and attempting reconnect
2012-10-12 00:14:25,328 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2/10.20.16.33:2181
2012-10-12 00:14:25,328 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2/10.20.16.33:2181, initiating session
2012-10-12 00:14:25,507 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2012-10-12 00:14:25,507 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x139c539bc090003 has expired, closing socket connection
2012-10-12 00:14:27,247 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2/10.20.16.33:2181, initiating session
2012-10-12 00:14:27,248 WARN org.apache.zookeeper.ClientCnxn: Session 0x239c539ba630001 for server bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2/10.20.16.33:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:218)
    at sun.nio.ch.IOUtil.read(IOUtil.java:186)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:359)
    at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:859)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1157)
2012-10-12 00:14:28,026 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server bj-ecsxhm4f3I-r3-5-r810-2-hbase-stor-3/10.20.16.34:2181
2012-10-12 00:14:41,359 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 14007ms for sessionid 0x239c539ba630001, closing socket connection and attempting reconnect
2012-10-12 00:14:41,592 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server bj-ecsxhm4f3I-r3-5-r810-4-hbase-stor-1/10.20.16.32:2181
2012-10-12 00:14:46,186 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 26666ms for sessionid 0x139c539bc090002, closing socket connection and attempting reconnect
2012-10-12 00:14:46,572 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2/10.20.16.33:2181
2012-10-12 00:14:46,572 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2/10.20.16.33:2181, initiating session
2012-10-12 00:14:46,726 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2/10.20.16.33:2181, sessionid = 0x139c539bc090002, negotiated timeout = 40000
2012-10-12 00:14:54,925 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 13464ms for sessionid 0x239c539ba630001, closing socket connection and attempting reconnect
2012-10-12 00:14:56,524 ERROR org.apache.hadoop.hbase.master.HMaster: Region server serverName=bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2,60020,1347901025673, load=(requests=75, regions=1, usedHeap=162, maxHeap=9725) reported a fatal error:
ABORTING region server serverName=bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2,60020,1347901025673, load=(requests=75, regions=1, usedHeap=162, maxHeap=9725): regionserver:60020-0x339c539ba640003 regionserver:60020-0x339c539ba640003 received expired from ZooKeeper, aborting
Cause:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:353)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:271)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:531)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:507)
2012-10-12 00:14:56,813 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server bj-ecsxhm4f3I-r3-5-r810-3-hbase-stor-2/10.20.16.33:2181
2012-10-12 00:15:10,147 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 15119ms for sessionid 0x239c539ba630001, closing socket connection and attempting reconnect
2012-10-12 00:15:10,625 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server bj-ecsxhm4f3I-r3-5-r810-2-hbase-stor-3/10.20.16.34:2181
2012-10-12 00:15:10,625 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to bj-ecsxhm4f3I-r3-5-r810-2-hbase-stor-3/10.20.16.34:2181, initiating session
2012-10-12 00:15:10,750 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x239c539ba630001 has expired, closing socket connection
2012-10-12 00:15:10,750 FATAL org.apache.hadoop.hbase.master.HMaster: master:60000-0x239c539ba630001 master:60000-0x239c539ba630001 received expired from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:353)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:271)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:531)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:507)
2012-10-12 00:15:10,751 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
2012-10-12 00:15:10,751 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2012-10-12 00:15:11,392 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads
2012-10-12 00:15:11,392 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60000
2012-10-12 00:15:11,392 INFO org.apache.hadoop.hbase.master.CatalogJanitor: bj-ecsxhm4f3I-r3-7-r810-3-hbase-stor-6:60000-CatalogJanitor exiting
2012-10-12 00:15:11,392 INFO org.apache.hadoop.hbase.master.HMaster$2: bj-ecsxhm4f3I-r3-7-r810-3-hbase-stor-6:60000-BalancerChore exiting
2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 60000: exiting
2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 11 on 60000: exiting
2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 60000: exiting
2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60000: exiting
2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener on 60000
2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60000: exiting
2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 60000: exiting
2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 60000: exiting
2012-10-12 00:15:11,394 INFO org.apache.hadoop.hbase.master.HMaster: Stopping infoServer
2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 20 on 60000: exiting
2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 23 on 60000: exiting
2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 19 on 60000: exiting
2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 25 on 60000: exiting
2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 29 on 60000: exiting
2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60000: exiting
2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 18 on 60000: exiting
2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 15 on 60000: exiting
2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 16 on 60000: exiting
2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 37 on 60000: exiting
2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 40 on 60000: exiting
2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 41 on 60000: exiting
2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 46 on 60000: exiting
2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 47 on 60000: exiting
2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 50 on 60000: exiting
2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 51 on 60000: exiting
2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 12 on 60000: exiting
2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 14 on 60000: exiting
2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 13 on 60000: exiting
2012-10-12 00:15:11,393 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 10 on 60000: exiting
2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 59 on 60000: exiting
2012-10-12 00:15:11,395 INFO org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor: bj-ecsxhm4f3I-r3-7-r810-3-hbase-stor-6:60000.timeoutMonitor exiting
2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 53 on 60000: exiting
2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 54 on 60000: exiting
2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 58 on 60000: exiting
2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 57 on 60000: exiting
2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 56 on 60000: exiting
2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 55 on 60000: exiting
2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 52 on 60000: exiting
2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 49 on 60000: exiting
2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 48 on 60000: exiting
2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 44 on 60000: exiting
2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 43 on 60000: exiting
2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 45 on 60000: exiting
2012-10-12 00:15:11,395 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 42 on 60000: exiting
2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 39 on 60000: exiting
2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 38 on 60000: exiting
2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 35 on 60000: exiting
2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 36 on 60000: exiting
2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 34 on 60000: exiting
2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 60000: exiting
2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 17 on 60000: exiting
2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 60000: exiting
2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 32 on 60000: exiting
2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 60000: exiting
2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 33 on 60000: exiting
2012-10-12 00:15:11,394 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:60010
2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 30 on 60000: exiting
2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 31 on 60000: exiting
2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 28 on 60000: exiting
2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 27 on 60000: exiting
2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 26 on 60000: exiting
2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 24 on 60000: exiting
2012-10-12 00:15:11,394 INFO org.apache.hadoop.hbase.master.LogCleaner: master-bj-ecsxhm4f3I-r3-7-r810-3-hbase-stor-6:60000.oldLogCleaner exiting
2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 22 on 60000: exiting
2012-10-12 00:15:11,398 INFO org.apache.hadoop.hbase.replication.master.ReplicationLogCleaner: Stopping replicationLogCleaner-0x139c539bc090003
2012-10-12 00:15:11,394 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 21 on 60000: exiting
2012-10-12 00:15:11,502 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x239c539ba630001 Unable to get data of znode /hbase/master
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAsAddress(ZKUtil.java:620)
    at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:197)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:310)
2012-10-12 00:15:11,502 ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x239c539ba630001 Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAsAddress(ZKUtil.java:620)
    at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:197)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:310)
2012-10-12 00:15:11,503 ERROR org.apache.hadoop.hbase.master.ActiveMasterManager: master:60000-0x239c539ba630001 Error deleting our own master address node
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:549)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAsAddress(ZKUtil.java:620)
    at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:197)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:310)
2012-10-12 00:15:11,503 DEBUG org.apache.hadoop.hbase.catalog.CatalogTracker: Stopping catalog tracker org.apache.hadoop.hbase.catalog.CatalogTracker@36664140
2012-10-12 00:15:11,503 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: The connection to hconnection-0x139c539bc090002-0x139c539bc090002-0x139c539bc090002 has been closed.
2012-10-12 00:15:11,503 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: The connection to hconnection-0x139c539bc090002-0x139c539bc090002-0x139c539bc090002 has been closed.
2012-10-12 00:15:11,503 INFO org.apache.hadoop.hbase.master.HMaster: HMaster main thread exiting

Best R.
beatls