However, continuing with the process, my QJM eventually errored out and my active NameNode went down.
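The log below comes down to two rules in the Quorum Journal Manager. First, each JournalNode refuses a write whose epoch differs from the writer epoch it has recorded: here the NameNode sends epoch 5 while every JournalNode reports writer epoch 0. Second, the NameNode needs a majority of JournalNodes (2 of 3) to acknowledge each flush. When all three refuse, the flush of the required journal fails and the NameNode exits with status 1. What follows is a simplified, hypothetical Java model of those two rules, meant only for reasoning about the failure. `EpochQuorumSketch`, `JournalNodeModel`, `majority` and `writeToQuorum` are illustrative names and are not Hadoop's API. The real JournalNode also tracks promised epochs and segment state, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of QJM write fencing; this is NOT Hadoop code.
public class EpochQuorumSketch {

    /** A JournalNode that only accepts writes from the writer epoch it has recorded. */
    static final class JournalNodeModel {
        final String addr;
        final long writerEpoch; // epoch of the writer this JN currently accepts

        JournalNodeModel(String addr, long writerEpoch) {
            this.addr = addr;
            this.writerEpoch = writerEpoch;
        }

        /** Returns null on success, or an error message shaped like the log text. */
        String journal(long ipcEpoch) {
            if (ipcEpoch != writerEpoch) {
                return addr + ": IPC's epoch " + ipcEpoch
                        + " is not the current writer epoch " + writerEpoch;
            }
            return null;
        }
    }

    /** Majority of n journals: 2 of 3, 3 of 5, and so on. */
    static int majority(int n) {
        return n / 2 + 1;
    }

    /** Sends one write to every JN; succeeds only if a majority accepts it. */
    static boolean writeToQuorum(List<JournalNodeModel> jns, long ipcEpoch, List<String> errors) {
        int acks = 0;
        for (JournalNodeModel jn : jns) {
            String err = jn.journal(ipcEpoch);
            if (err == null) {
                acks++;
            } else {
                errors.add(err);
            }
        }
        return acks >= majority(jns.size());
    }

    public static void main(String[] args) {
        // Mirrors the log: the NameNode writes with epoch 5, all three JNs expect epoch 0.
        List<JournalNodeModel> jns = List.of(
                new JournalNodeModel("10.120.5.203:8485", 0),
                new JournalNodeModel("10.120.5.247:8485", 0),
                new JournalNodeModel("10.120.5.25:8485", 0));
        List<String> errors = new ArrayList<>();
        boolean ok = writeToQuorum(jns, 5, errors);
        System.out.println("quorum size " + majority(jns.size()) + "/" + jns.size()
                + ", write ok = " + ok + ", " + errors.size() + " exceptions");
        errors.forEach(System.out::println);
    }
}
```

Running `main` prints `quorum size 2/3, write ok = false, 3 exceptions`, followed by one line per rejecting JournalNode. This matches the shape of the `QuorumException` in the log.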
2014-07-31 20:59:33,944 WARN [Logger channel to rhel6.local/ 10.120.5.247:8485] client.QuorumJournalManager (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485 failed to write txns 9635-9635. Will try to write to this JN again after the next log roll. org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5 is not the current writer epoch 0 at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430) at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331) at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142) at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132) at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687) at org.apache.hadoop.ipc.Client.call(Client.java:1224) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) at com.sun.proxy.$Proxy9.journal(Unknown Source) at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156) at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354) at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347) at 
java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2014-07-31 20:59:33,954 WARN [Logger channel to rhel1.local/ 10.120.5.203:8485] client.QuorumJournalManager (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.203:8485 failed to write txns 9635-9635. Will try to write to this JN again after the next log roll. org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5 is not the current writer epoch 0 at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430) at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331) at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142) at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132) at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687) at org.apache.hadoop.ipc.Client.call(Client.java:1224) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) at com.sun.proxy.$Proxy9.journal(Unknown Source) at 
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156) at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354) at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2014-07-31 20:59:33,975 WARN [Logger channel to rhel2.local/ 10.120.5.25:8485] client.QuorumJournalManager (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.25:8485 failed to write txns 9635-9635. Will try to write to this JN again after the next log roll. org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 5 is not the current writer epoch 0 at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430) at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331) at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142) at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132) at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332) at 
org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687) at org.apache.hadoop.ipc.Client.call(Client.java:1224) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) at com.sun.proxy.$Proxy9.journal(Unknown Source) at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:156) at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:354) at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:347) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2014-07-31 20:59:33,976 FATAL [IPC Server handler 5 on 8020] namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(355)) - Error: flush failed for required journal (JournalAndStream(mgr=QJM to [ 10.120.5.203:8485, 10.120.5.247:8485, 10.120.5.25:8485], stream=QuorumOutputStream starting at txid 9634)) org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3. 
3 exceptions thrown: 10.120.5.25:8485: IPC's epoch 5 is not the current writer epoch 0 at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430) at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331) at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142) at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132) at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687) 10.120.5.203:8485: IPC's epoch 5 is not the current writer epoch 0 at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430) at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331) at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142) at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132) at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898) at 
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687) 10.120.5.247:8485: IPC's epoch 5 is not the current writer epoch 0 at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:430) at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:331) at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:142) at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:132) at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14018) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687) at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81) at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213) at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142) at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107) at 
org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113) at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107) at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490) at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350) at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55) at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:946) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:884) at org.apache.hadoop.hdfs.server.namenode.FSImage.rollEditLog(FSImage.java:1013) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4436) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:734) at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:129) at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8762) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687) 2014-07-31 20:59:33,976 WARN [IPC Server handler 5 on 
8020] client.QuorumJournalManager (QuorumOutputStream.java:abort(72)) - Aborting QuorumOutputStream starting at txid 9634
2014-07-31 20:59:33,978 INFO [IPC Server handler 5 on 8020] util.ExitUtil (ExitUtil.java:terminate(87)) - Exiting with status 1
2014-07-31 20:59:33,982 INFO [Thread-0] namenode.NameNode (StringUtils.java:run(615)) - SHUTDOWN_MSG:

On Thu, Jul 31, 2014 at 6:08 PM, Colin Kincaid Williams <disc...@uw.edu> wrote:
> I tried a third time and it just worked?
>
> sudo hdfs zkfc -formatZK
> 2014-07-31 18:07:51,595 INFO [main] tools.DFSZKFailoverController (DFSZKFailoverController.java:<init>(140)) - Failover controller configured for NameNode NameNode at rhel1.local/10.120.5.203:8020
> 2014-07-31 18:07:51,791 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:zookeeper.version=3.4.3-cdh4.1.3--1, built on 01/27/2013 00:13 GMT
> 2014-07-31 18:07:51,791 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:host.name=rhel1.local
> 2014-07-31 18:07:51,792 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.version=1.7.0_60
> 2014-07-31 18:07:51,792 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.vendor=Oracle Corporation
> 2014-07-31 18:07:51,792 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.home=/usr/java/jdk1.7.0_60/jre
> 2014-07-31 18:07:51,792 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client
environment:java.class.path=/etc/hadoop/conf:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/log4j-1.2.17.jar:/usr/lib/hadoop/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop/lib/junit-4.8.2.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/protobuf-ja
va-2.4.0a.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/jersey-core-1.8.jar:/usr/lib/hadoop-hdfs/lib/commons-cli-1.2.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/asm-3.2.jar:/usr/lib/hadoop-hdfs/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/xmlenc-0.52.jar:/usr/lib/hadoop-hdfs/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/jsp-api-2.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/guava-11.0.2.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/lib/servlet-api-2.5.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.17.jar:/usr/lib/hadoop-hdfs/lib/jersey-server-1.8.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/lib/commons-el-1.0.jar:/usr/lib/hadoop-hdfs/lib/commons-io-2.1.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-hdfs/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-hdfs/lib/commons-lang-2.5.jar:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/commons-codec-1.4.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.4.Final.jar:/usr/lib/hadoop-yarn/lib/jackson-core-asl-1
.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.17.jar:/usr/lib/hadoop-yarn/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-tests-2.0.0-cdh4.1.3-tests.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.1.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-0.20-mapreduce/./:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjrt-1.6.5
.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20-mapreduce/lib/asm-3.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsr305-1.3.9.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jsp-api-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/guava-11.0.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/servlet-api-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/log4j-1.2.17.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-configuration-1.6.jar:/usr/lib/hadoop-0.20-mapreduce/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-digester-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/activation-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jetty-util-6.1.26.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-beanutils-1.
7.0.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-lang-2.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/avro-compiler-1.7.1.cloudera.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/snappy-java-1.0.4.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/junit-4.8.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/stax-api-1.0.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-math-2.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/mockito-all-1.8.5.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jettison-1.1.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20-mapreduce/lib/paranamer-2.3.jar:/usr/lib/hadoop-0.20-mapreduce/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-0.20-mapreduce/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20-mapreduce/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20-mapreduce/lib/hadoop-fairscheduler-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-ant.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-core.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-test-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-tools-2.0.0-mr1-cdh4.1.3.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-examples.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-tools.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-2.0.0-mr1-cdh4.1.3-test.jar:/usr/lib/hadoop-0.20-mapreduce/.//hadoop-core.jar > 2014-07-31 18:07:51,793 INFO [main] zookeeper.ZooKeeper > (Environment.java:logEnv(100)) - Client > environment:java.library.path=//usr/lib/hadoop/lib/native > 
> 2014-07-31 18:07:51,801 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.io.tmpdir=/tmp
> 2014-07-31 18:07:51,801 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:java.compiler=<NA>
> 2014-07-31 18:07:51,801 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.name=Linux
> 2014-07-31 18:07:51,802 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.arch=amd64
> 2014-07-31 18:07:51,802 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:os.version=2.6.32-358.el6.x86_64
> 2014-07-31 18:07:51,802 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.name=root
> 2014-07-31 18:07:51,802 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.home=/root
> 2014-07-31 18:07:51,802 INFO [main] zookeeper.ZooKeeper (Environment.java:logEnv(100)) - Client environment:user.dir=/etc/hbase/conf.golden_apple
> 2014-07-31 18:07:51,813 INFO [main] zookeeper.ZooKeeper (ZooKeeper.java:<init>(433)) - Initiating client connection, connectString=rhel1.local:2181,rhel6.local:2181,rhel2.local:2181 sessionTimeout=5000 watcher=null
> 2014-07-31 18:07:51,833 INFO [main-SendThread(rhel1.local:2181)] zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(958)) - Opening socket connection to server rhel1.local/10.120.5.203:2181. Will not attempt to authenticate using SASL (unknown error)
> 2014-07-31 18:07:51,844 INFO [main-SendThread(rhel1.local:2181)] zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(850)) - Socket connection established to rhel1.local/10.120.5.203:2181, initiating session
> 2014-07-31 18:07:51,852 INFO [main-SendThread(rhel1.local:2181)] zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1187)) - Session establishment complete on server rhel1.local/10.120.5.203:2181, sessionid = 0x1478902fddc000a, negotiated timeout = 5000
> ===============================================
> The configured parent znode /hadoop-ha/golden-apple already exists.
> Are you sure you want to clear all failover information from
> ZooKeeper?
> WARNING: Before proceeding, ensure that all HDFS services and
> failover controllers are stopped!
> ===============================================
> Proceed formatting /hadoop-ha/golden-apple? (Y or N) 2014-07-31 18:07:51,858 INFO [main-EventThread] ha.ActiveStandbyElector (ActiveStandbyElector.java:processWatchEvent(538)) - Session connected.
> Y
> 2014-07-31 18:08:00,439 INFO [main] ha.ActiveStandbyElector (ActiveStandbyElector.java:clearParentZNode(314)) - Recursively deleting /hadoop-ha/golden-apple from ZK...
> 2014-07-31 18:08:00,506 INFO [main] ha.ActiveStandbyElector (ActiveStandbyElector.java:clearParentZNode(327)) - Successfully deleted /hadoop-ha/golden-apple from ZK.
> 2014-07-31 18:08:00,541 INFO [main] ha.ActiveStandbyElector (ActiveStandbyElector.java:ensureParentZNode(299)) - Successfully created /hadoop-ha/golden-apple in ZK.
> 2014-07-31 18:08:00,545 INFO [main-EventThread] zookeeper.ClientCnxn (ClientCnxn.java:run(511)) - EventThread shut down
> 2014-07-31 18:08:00,545 INFO [main] zookeeper.ZooKeeper (ZooKeeper.java:close(679)) - Session: 0x1478902fddc000a closed
>
>
>
> On Thu, Jul 31, 2014 at 2:51 PM, Alex Newman <posi...@gmail.com> wrote:
>
>> Cheers. That's rough.
>> We don't have that problem here at WanDISCO.
>>
>> On Thu, Jul 31, 2014 at 12:46 PM, Colin Kincaid Williams <disc...@uw.edu> wrote:
>> > Hi, this is drocsid / discord from #hbase. Thanks for the help earlier today.
>> > Just thought I'd forward this info regarding swapping out the NameNode in a
>> > QJM / HA configuration. See you around on #hbase. If you visit Seattle, feel
>> > free to give me a shout out.
>> >
>> > ---------- Forwarded message ----------
>> > From: Colin Kincaid Williams <disc...@uw.edu>
>> > Date: Thu, Jul 31, 2014 at 12:35 PM
>> > Subject: Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration
>> > To: user@hadoop.apache.org
>> >
>> >
>> > Hi Jing,
>> >
>> > Thanks for the response. I will try this out, and file an Apache JIRA.
>> >
>> > Best,
>> >
>> > Colin Williams
>> >
>> >
>> > On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao <j...@hortonworks.com> wrote:
>> >>
>> >> Hi Colin,
>> >>
>> >> I guess currently we may have to restart almost all the
>> >> daemons/services in order to swap out a standby NameNode (SBN):
>> >>
>> >> 1. The current active NameNode (ANN) needs to know the new SBN, since in
>> >> the current implementation the SBN periodically sends a rollEditLog RPC
>> >> request to the ANN (thus if an NN failover happens later, the original ANN
>> >> needs to send this RPC to the correct NN).
>> >> 2. It looks like the DataNode currently cannot really refresh its NN list.
>> >> Look at the code in BPOfferService:
>> >>
>> >>   void refreshNNList(ArrayList<InetSocketAddress> addrs) throws IOException {
>> >>     Set<InetSocketAddress> oldAddrs = Sets.newHashSet();
>> >>     for (BPServiceActor actor : bpServices) {
>> >>       oldAddrs.add(actor.getNNSocketAddress());
>> >>     }
>> >>     Set<InetSocketAddress> newAddrs = Sets.newHashSet(addrs);
>> >>
>> >>     if (!Sets.symmetricDifference(oldAddrs, newAddrs).isEmpty()) {
>> >>       // Keep things simple for now -- we can implement this at a later date.
>> >>       throw new IOException(
>> >>           "HA does not currently support adding a new standby to a running DN. " +
>> >>           "Please do a rolling restart of DNs to reconfigure the list of NNs.");
>> >>     }
>> >>   }
>> >>
>> >> 3. If you're using automatic failover, you also need to update the
>> >> configuration of the ZKFC on the current ANN machine, since the ZKFC does
>> >> graceful fencing by sending an RPC to the other NN.
>> >> 4. It looks like we do not need to restart the JournalNodes for the new SBN,
>> >> but I have not tried it before.
>> >>
>> >> Thus in general we may still have to restart all the services (except
>> >> JNs) and update their configurations. But I guess this can be done as a
>> >> rolling restart:
>> >> 1. Shut down the old SBN, bootstrap the new SBN, and start the new SBN.
>> >> 2. Keep the ANN and its corresponding ZKFC running, and do a rolling restart
>> >> of all the DNs to update their configurations.
>> >> 3. After restarting all the DNs, stop the ANN and its ZKFC, and update their
>> >> configuration. The new SBN should become active.
>> >>
>> >> I have not tried the above steps, so please let me know whether this
>> >> works. And I think we should also document the correct steps in
>> >> Apache. Could you please file an Apache JIRA?
>> >>
>> >> Thanks,
>> >> -Jing
>> >>
>> >>
>> >>
>> >> On Thu, Jul 31, 2014 at 9:37 AM, Colin Kincaid Williams <disc...@uw.edu> wrote:
>> >>>
>> >>> Hello,
>> >>>
>> >>> I'm trying to swap out a standby NameNode in a QJM / HA configuration. I
>> >>> believe the steps to achieve this would be something similar to:
>> >>>
>> >>> Use the Bootstrap standby command to prep the replacement standby, or
>> >>> rsync if the command fails.
>> >>>
>> >>> Somehow update the datanodes so they push the heartbeat / journal to the
>> >>> new standby.
>> >>>
>> >>> Update the XML configuration on all nodes to reflect the replacement
>> >>> standby.
>> >>>
>> >>> Start the replacement standby.
>> >>>
>> >>> Use some Hadoop command to refresh the datanodes to the new NameNode
>> >>> configuration.
>> >>>
>> >>> I am not sure how to deal with the Journal switch, or if I am going about
>> >>> this the right way. Can anybody give me some suggestions here?
>> >>>
>> >>>
>> >>> Regards,
>> >>>
>> >>> Colin Williams
>> >>>
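Point 2 in Jing's reply hinges on a set comparison. BPOfferService.refreshNNList collects the NameNode addresses the DataNode already talks to. If the new list differs from that set in any element (a non-empty symmetric difference), it throws and asks for a rolling restart. The sketch below reproduces only that check, using plain JDK collections instead of Guava's `Sets.symmetricDifference`. `NNListCheckSketch`, `canRefreshInPlace` and the host names are hypothetical, and nothing here talks to a real DataNode.

```java
import java.net.InetSocketAddress;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the refreshNNList guard; not the DataNode's real code path.
public class NNListCheckSketch {

    /** Elements present in exactly one of the two sets (same result as Guava's Sets.symmetricDifference). */
    static <T> Set<T> symmetricDifference(Set<T> a, Set<T> b) {
        Set<T> result = new HashSet<>(a);
        result.addAll(b);           // union of a and b
        Set<T> common = new HashSet<>(a);
        common.retainAll(b);        // intersection of a and b
        result.removeAll(common);   // union minus intersection
        return result;
    }

    /** Mirrors the quoted guard: any change to the NN address set is rejected. */
    static boolean canRefreshInPlace(List<InetSocketAddress> current, List<InetSocketAddress> proposed) {
        return symmetricDifference(new HashSet<>(current), new HashSet<>(proposed)).isEmpty();
    }

    public static void main(String[] args) {
        // createUnresolved avoids DNS lookups for these hypothetical hosts.
        InetSocketAddress nn1 = InetSocketAddress.createUnresolved("nn1.example", 8020);
        InetSocketAddress oldSbn = InetSocketAddress.createUnresolved("old-sbn.example", 8020);
        InetSocketAddress newSbn = InetSocketAddress.createUnresolved("new-sbn.example", 8020);

        List<InetSocketAddress> current = List.of(nn1, oldSbn);
        System.out.println(canRefreshInPlace(current, List.of(oldSbn, nn1))); // true: same set, order ignored
        System.out.println(canRefreshInPlace(current, List.of(nn1, newSbn))); // false: standby swapped
    }
}
```

Any difference, including a one-for-one swap of the standby, makes the set non-empty. That is why Jing's procedure relies on a rolling restart of the DataNodes with the updated configuration.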