Hi Colin,

The timeout messages are usually a consequence of other connectivity issues between the NameNode and the QJM. Assuming the RegionServers are configured properly for HDFS HA, pointing at an HDFS nameservice instead of a direct NameNode address, HBase should also be resilient to a failover.
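For clarity, this is roughly what that client-side HA wiring looks like. A sketch only: the nameservice ID `mycluster` and the second NameNode host `hbasenn002.comp.prod.local` are placeholders (only hbasenn001 appears in the logs below), the property names are the standard Hadoop 2 HA ones:

```xml
<!-- hdfs-site.xml as seen by the HBase nodes -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>hbasenn001.comp.prod.local:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>hbasenn002.comp.prod.local:8020</value>
</property>
<!-- lets the client retry against whichever NameNode is active -->
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

<!-- hbase-site.xml: point at the nameservice, not a NameNode host -->
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://mycluster/hbase</value>
</property>
```

With hbase.rootdir bound to the nameservice, the RetryInvocationHandler failover visible later in the log can retarget the standby NameNode instead of failing against a fixed host:port.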
Considering the ZooKeeper session timeout message in the RegionServer log below, I would look first for a network issue on the cluster, but that's just an initial guess:

> 2015-12-09 04:11:35,413 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x44e6c2f20980003 has expired, closing socket connection

...

> On 15 Dec 2015, at 01:17, Colin Kincaid Williams <disc...@uw.edu> wrote:
>
> We had a namenode go down due to timeout with the hdfs ha qjm journal:
>
> 2015-12-09 04:10:42,723 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 19016 ms (timeout=20000 ms) for a response for sendEdits
> 2015-12-09 04:10:43,708 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: flush failed for required journal (JournalAndStream(mgr=QJM to [10.42.28.221:8485, 10.42.28.222:8485, 10.42.28.223:8485], stream=QuorumOutputStream starting at txid 8781293))
> java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
> at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
> at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
> at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
> at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
> at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)
> at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)
> at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
> at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:1695)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1669)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:409)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:205)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44068)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
>
> While this is disturbing in its own right, I'm further annoyed that HBase shut down 2 region servers. Furthermore, we had to hbck -fixAssignments to repair HBase, and I'm not sure that the data from the shut-down regions was available, or whether our HBase service itself was available afterwards:
>
> 2015-12-09 04:10:44,320 ERROR org.apache.hadoop.hbase.master.HMaster: Region server hbase008r09.comp.prod.local,60020,1436412712133 reported a fatal error:
> ABORTING region server hbase008r09.comp.prod.local,60020,1436412712133: IOE in log roller
> Cause:
> java.io.IOException: cannot get log writer
> at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:716)
> at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriterInstance(HLog.java:663)
> at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:595)
> at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: java.io.IOException: java.io.IOException: Failed on local exception: java.io.IOException: Response is null.; Host Details : local host is: "hbase008r09.comp.prod.local/10.42.28.192"; destination host is: "hbasenn001.comp.prod.local":8020;
> at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:106)
> at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:713)
> ... 4 more
> Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Response is null.; Host Details : local host is: "hbase008r09.comp.prod.local/10.42.28.192"; destination host is: "hbasenn001.comp.prod.local":8020;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:759)
> at org.apache.hadoop.ipc.Client.call(Client.java:1228)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> at com.sun.proxy.$Proxy14.create(Unknown Source)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:192)
> at sun.reflect.GeneratedMethodAccessor30.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> at com.sun.proxy.$Proxy15.create(Unknown Source)
> at org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1298)
> at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1317)
> at org.apache.hadoop.hdfs.DFSClient.primitiveCreate(DFSClient.java:1264)
> at org.apache.hadoop.fs.Hdfs.createInternal(Hdfs.java:97)
> at org.apache.hadoop.fs.Hdfs.createInternal(Hdfs.java:53)
> at org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:554)
> at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:663)
> at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:660)
> at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2333)
> at org.apache.hadoop.fs.FileContext.create(FileContext.java:660)
> at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:502)
> at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:469)
> at sun.reflect.GeneratedMethodAccessor49.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:87)
> ... 5 more
> Caused by: java.io.IOException: Response is null.
> at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:940)
> at org.apache.hadoop.ipc.Client$Connection.run(Client.java:835)
>
> 2015-12-09 04:10:44,387 ERROR org.apache.hadoop.hbase.master.HMaster: Region server hbase007r08.comp.prod.local,60020,1436412674179 reported a fatal error:
> ABORTING region server hbase007r08.comp.prod.local,60020,1436412674179: IOE in log roller
> Cause:
> java.io.IOException: cannot get log writer
> at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:716)
> at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriterInstance(HLog.java:663)
> at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:595)
> at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: java.io.IOException: java.io.IOException: Failed on local exception: java.io.IOException: Response is null.; Host Details : local host is: "hbase007r08.comp.prod.local/10.42.28.191"; destination host is: "hbasenn001.comp.prod.local":8020;
> at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:106)
> at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:713)
> ... 4 more
> Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Response is null.; Host Details : local host is: "hbase007r08.comp.prod.local/10.42.28.191"; destination host is: "hbasenn001.comp.prod.local":8020;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:759)
> at org.apache.hadoop.ipc.Client.call(Client.java:1228)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
> at com.sun.proxy.$Proxy14.create(Unknown Source)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:192)
> at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
> at com.sun.proxy.$Proxy15.create(Unknown Source)
> at org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1298)
> at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1317)
> at org.apache.hadoop.hdfs.DFSClient.primitiveCreate(DFSClient.java:1264)
> at org.apache.hadoop.fs.Hdfs.createInternal(Hdfs.java:97)
> at org.apache.hadoop.fs.Hdfs.createInternal(Hdfs.java:53)
> at org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:554)
> at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:663)
> at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:660)
> at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2333)
> at org.apache.hadoop.fs.FileContext.create(FileContext.java:660)
> at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:502)
> at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:469)
> at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:87)
> ... 5 more
> Caused by: java.io.IOException: Response is null.
> at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:940)
> at org.apache.hadoop.ipc.Client$Connection.run(Client.java:835)
>
> 2015-12-09 04:11:01,444 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 26679ms for sessionid 0x44e6c2f20980003, closing socket connection and attempting reconnect
> 2015-12-09 04:11:34,636 WARN org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking getListing of class ClientNamenodeProtocolTranslatorPB. Trying to fail over immediately.
> 2015-12-09 04:11:34,687 WARN org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking getListing of class ClientNamenodeProtocolTranslatorPB after 1 fail over attempts. Trying to fail over after sleeping for 791ms.
> 2015-12-09 04:11:35,334 WARN org.apache.hadoop.ipc.HBaseServer: (responseTooSlow): {"processingtimems":50237,"call":"reportRSFatalError([B@3c97e50c, ABORTING region server hbase008r09.comp.prod.local,60020,1436412712133: IOE in log roller\nCause:\njava.io.IOException: cannot get log writer\n\tat org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:716)\n\tat org.apache.hadoop.hbase.regionserver.wal.HLog.createWriterInstance(HLog.java:663)\n\tat org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:595)\n\tat org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)\n\tat java.lang.Thread.run(Thread.java:722)\nCaused by: java.io.IOException: java.io.IOException: Failed on local exception: java.io.IOException: Response is null.; Host Details : local host is: \"hbase008r09.comp.prod.local/10.42.28.192\"; destination host is: \"hbasenn001.comp.prod.local\":8020; \n\tat org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:106)\n\tat org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:713)\n\t... 4 more\nCaused by: java.io.IOException: Failed on local exception: java.io.IOException: Response is null.; Host Details : local host is: \"hbase008r09.comp.prod.local/10.42.28.192\"; destination host is: \"hbasenn001.comp.prod.local\":8020; \n\tat org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:759)\n\tat org.apache.hadoop.ipc.Client.call(Client.java:1228)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)\n\tat com.sun.proxy.$Proxy14.create(Unknown Source)\n\tat org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:192)\n\tat sun.reflect.GeneratedMethodAccessor30.invoke(Unknown Source)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:601)\n\tat org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)\n\tat org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)\n\tat com.sun.proxy.$Proxy15.create(Unknown Source)\n\tat org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1298)\n\tat org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1317)\n\tat org.apache.hadoop.hdfs.DFSClient.primitiveCreate(DFSClient.java:1264)\n\tat org.apache.hadoop.fs.Hdfs.createInternal(Hdfs.java:97)\n\tat org.apache.hadoop.fs.Hdfs.createInternal(Hdfs.java:53)\n\tat org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:554)\n\tat org.apache.hadoop.fs.FileContext$3.next(FileContext.java:663)\n\tat org.apache.hadoop.fs.FileContext$3.next(FileContext.java:660)\n\tat org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2333)\n\tat org.apache.hadoop.fs.FileContext.create(FileContext.java:660)\n\tat org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:502)\n\tat org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:469)\n\tat sun.reflect.GeneratedMethodAccessor49.invoke(Unknown Source)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:601)\n\tat org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:87)\n\t... 5 more\nCaused by: java.io.IOException: Response is null.\n\tat org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:940)\n\tat org.apache.hadoop.ipc.Client$Connection.run(Client.java:835)\n), rpc version=1, client version=29, methodsFingerPrint=-525182806","client":"10.42.28.192:52162","starttimems":1449659444320,"queuetimems":0,"class":"HMaster","responsesize":0,"method":"reportRSFatalError"}
> 2015-12-09 04:11:35,409 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server hbase004r08.comp.prod.local/10.42.28.188:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
> 2015-12-09 04:11:35,411 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to hbase004r08.comp.prod.local/10.42.28.188:2181, initiating session
> 2015-12-09 04:11:35,413 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x44e6c2f20980003 has expired, closing socket connection
> 2015-12-09 04:11:35,413 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
> 2015-12-09 04:11:35,414 INFO org.apache.hadoop.hbase.master.HMaster: Primary Master trying to recover from ZooKeeper session expiry.
> 2015-12-09 04:11:35,416 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=hbase004r08.comp.prod.local:2181,hbase003r07.comp.prod.local:2181,hbase005r09.comp.prod.local:2181 sessionTimeout=1200000 watcher=master:60000
>
> ...
> and eventually:
>
> 2015-12-09 04:11:46,724 ERROR org.apache.zookeeper.ClientCnxn: Caught unexpected throwable
> 2015-12-09 04:11:46,724 ERROR org.apache.zookeeper.ClientCnxn: Caught unexpected throwable
> java.lang.StackOverflowError
> at java.security.AccessController.doPrivileged(Native Method)
> at java.io.PrintWriter.<init>(PrintWriter.java:78)
> at java.io.PrintWriter.<init>(PrintWriter.java:62)
> at org.apache.log4j.DefaultThrowableRenderer.render(DefaultThrowableRenderer.java:58)
> at org.apache.log4j.spi.ThrowableInformation.getThrowableStrRep(ThrowableInformation.java:87)
> at org.apache.log4j.spi.LoggingEvent.getThrowableStrRep(LoggingEvent.java:413)
> at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:313)
> at org.apache.log4j.RollingFileAppender.subAppend(RollingFileAppender.java:276)
> at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
> at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
> at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
> at org.apache.log4j.Category.callAppenders(Category.java:206)
> at org.apache.log4j.Category.forcedLog(Category.java:391)
> at org.apache.log4j.Category.log(Category.java:856)
> at org.slf4j.impl.Log4jLoggerAdapter.error(Log4jLoggerAdapter.java:576)
> at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:623)
> at org.apache.zookeeper.ClientCnxn$EventThread.queuePacket(ClientCnxn.java:477)
> at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:640)
> at org.apache.zookeeper.ClientCnxn.conLossPacket(ClientCnxn.java:658)
> at org.apache.zookeeper.ClientCnxn.queuePacket(ClientCnxn.java:1286)
> at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:975)
> at org.apache.hadoop.hbase.master.SplitLogManager.deleteNode(SplitLogManager.java:627)
> at org.apache.hadoop.hbase.master.SplitLogManager.access$1600(SplitLogManager.java:96)
> at org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback.processResult(SplitLogManager.java:1106)
> at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:619)
> at org.apache.zookeeper.ClientCnxn$EventThread.queuePacket(ClientCnxn.java:477)
> at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:640)
> at org.apache.zookeeper.ClientCnxn.conLossPacket(ClientCnxn.java:658)
> at org.apache.zookeeper.ClientCnxn.queuePacket(ClientCnxn.java:1286)
> at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:975)
> at org.apache.hadoop.hbase.master.SplitLogManager.deleteNode(SplitLogManager.java:627)
> at org.apache.hadoop.hbase.master.SplitLogManager.access$1600(SplitLogManager.java:96)
> at org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback.processResult(SplitLogManager.java:1106)
> at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:619)
> at org.apache.zookeeper.ClientCnxn$EventThread.queuePacket(ClientCnxn.java:477)
> at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:640)
> at org.apache.zookeeper.ClientCnxn.conLossPacket(ClientCnxn.java:658)
> at org.apache.zookeeper.ClientCnxn.queuePacket(ClientCnxn.java:1286)
> at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:975)
> at org.apache.hadoop.hbase.master.SplitLogManager.deleteNode(SplitLogManager.java:627)
> at org.apache.hadoop.hbase.master.SplitLogManager.access$1600(SplitLogManager.java:96)
> at org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback.processResult(SplitLogManager.java:1106)
> at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:619)
> at org.apache.zookeeper.ClientCnxn$EventThread.queuePacket(ClientCnxn.java:477)
> at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:640)
> at org.apache.zookeeper.ClientCnxn.conLossPacket(ClientCnxn.java:658)
> at org.apache.zookeeper.ClientCnxn.queuePacket(ClientCnxn.java:1286)
> at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:975)
> at org.apache.hadoop.hbase.master.SplitLogManager.deleteNode(SplitLogManager.java:627)
> at org.apache.hadoop.hbase.master.SplitLogManager.access$1600(SplitLogManager.java:96)
> at org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback.processResult(SplitLogManager.java:1106)
> at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:619)
> at org.apache.zookeeper.ClientCnxn$EventThread.queuePacket(ClientCnxn.java:477)
> at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:640)
> at org.apache.zookeeper.ClientCnxn.conLossPacket(ClientCnxn.java:658)
> at org.apache.zookeeper.ClientCnxn.queuePacket(ClientCnxn.java:1286)
> at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:975)
> at org.apache.hadoop.hbase.master.SplitLogManager.deleteNode(SplitLogManager.java:627)
> at org.apache.hadoop.hbase.master.SplitLogManager.access$1600(SplitLogManager.java:96)
>
> ...
>
> Since the namenode failover made the other namenode active, why did my region servers decide to shut down? The HDFS service seems to have stayed up. How can I make the HBase service more resilient to namenode failovers?
>
> HBase: Version 0.92.1-cdh4.1.3
> Hadoop: Hadoop 2.0.0-cdh4.1.3
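One detail worth checking in the master log above: the client requests sessionTimeout=1200000 when connecting, but ZooKeeper negotiates the actual session timeout and caps it at the server-side maxSessionTimeout, which defaults to 20 x tickTime (40000 ms with the usual tickTime=2000). That would explain a session expiring after only tens of seconds of disconnection despite the 20-minute request. A sketch of the server-side knobs involved (the values here are illustrative, not a recommendation):

```
# zoo.cfg on each ZooKeeper server
tickTime=2000
# Defaults: minSessionTimeout = 2 * tickTime, maxSessionTimeout = 20 * tickTime.
# A client asking for more than maxSessionTimeout is silently capped, so raise
# this if HBase's zookeeper.session.timeout (hbase-site.xml) should be honoured.
maxSessionTimeout=120000
```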