I use hadoop-2.0.5 with QJM for HA. When the Standby NameNode does a checkpoint, I see the following exceptions in the Standby NameNode log:

2013-08-01 13:43:07,965 INFO org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Triggering checkpoint because there have been 763426 txns since the last checkpoint, which exceeds the configured threshold 40000
2013-08-01 13:43:07,966 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file /home/musa.ll/hadoop2/cluster-data/name/current/fsimage.ckpt_0000000000048708235 using no compression
2013-08-01 13:43:37,405 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Image file of size 1504089705 saved in 29 seconds.
2013-08-01 13:43:37,410 INFO org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Going to retain 2 images with txid >= 47944809
2013-08-01 13:43:37,410 INFO org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Purging old image FSImageFile(file=/home/musa.ll/hadoop2/cluster-data/name/current/fsimage_0000000000047222679, cpktTxId=0000000000047222679)
2013-08-01 13:43:37,723 WARN org.apache.hadoop.hdfs.server.namenode.FSEditLog: Unable to determine input streams from QJM to [10.232.98.61:20022, 10.232.98.62:20022, 10.232.98.63:20022, 10.232.98.64:20022, 10.232.98.65:20022]. Skipping.
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 3/5. 4 exceptions thrown:
10.232.98.62:20022: Asked for firstTxId 46944810 which is in the middle of file /home/musa.ll/hadoop2/journal/mycluster/current/edits_0000000000046630461-0000000000047222679
	at org.apache.hadoop.hdfs.server.namenode.FileJournalManager.getRemoteEditLogs(FileJournalManager.java:183)
	at org.apache.hadoop.hdfs.qjournal.server.Journal.getEditLogManifest(Journal.java:628)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.getEditLogManifest(JournalNodeRpcServer.java:180)
	at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.getEditLogManifest(QJournalProtocolServerSideTranslatorPB.java:203)
	at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14028)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)

2013-08-01 14:28:07,051 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took 26.08s at 0.00 KB/s
2013-08-01 14:28:07,051 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 60835762 to namenode at 10.232.98.77:20021
2013-08-01 14:29:05,203 INFO org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Triggering log roll on remote NameNode /10.232.98.77:20020
2013-08-01 14:29:06,242 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 137678/567332 transactions completed. (24%)
2013-08-01 14:29:07,243 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 275618/567332 transactions completed. (49%)
2013-08-01 14:29:08,244 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 407627/567332 transactions completed. (72%)
2013-08-01 14:29:09,245 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log: 545153/567332 transactions completed. (96%)
2013-08-01 14:29:20,146 INFO org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Loaded 567332 edits starting from txid 60835762
2013-08-01 14:30:44,411 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Image file of size 1950604672 saved in 37 seconds.
2013-08-01 14:30:44,416 INFO org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Going to retain 2 images with txid >= 60835762
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 3/5. 4 exceptions thrown:
10.232.98.62:20022: Asked for firstTxId 59835763 which is in the middle of file /home/musa.ll/hadoop2/journal/mycluster/current/edits_0000000000059678382-0000000000060264590
	at org.apache.hadoop.hdfs.server.namenode.FileJournalManager.getRemoteEditLogs(FileJournalManager.java:183)
	at org.apache.hadoop.hdfs.qjournal.server.Journal.getEditLogManifest(Journal.java:628)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.getEditLogManifest(JournalNodeRpcServer.java:180)
	at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.getEditLogManifest(QJournalProtocolServerSideTranslatorPB.java:203)
	at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14028)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735)
10.232.98.63:20022: Asked for firstTxId 59835763 which is in the middle of file /home/musa.ll/hadoop2/journal/mycluster/current/edits_0000000000059678382-0000000000060264590
	at org.apache.hadoop.hdfs.server.namenode.FileJournalManager.getRemoteEditLogs(FileJournalManager.java:183)
	at org.apache.hadoop.hdfs.qjournal.server.Journal.getEditLogManifest(Journal.java:628)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.getEditLogManifest(JournalNodeRpcServer.java:180)
	at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.getEditLogManifest(QJournalProtocolServerSideTranslatorPB.java:203)
	at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14028)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735)
10.232.98.64:20022: Asked for firstTxId 59835763 which is in the middle of file /home/musa.ll/hadoop2/journal/mycluster/current/edits_0000000000059678382-0000000000060264590
	at org.apache.hadoop.hdfs.server.namenode.FileJournalManager.getRemoteEditLogs(FileJournalManager.java:183)
	at org.apache.hadoop.hdfs.qjournal.server.Journal.getEditLogManifest(Journal.java:628)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.getEditLogManifest(JournalNodeRpcServer.java:180)
	at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.getEditLogManifest(QJournalProtocolServerSideTranslatorPB.java:203)
	at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14028)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735)
	at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
	at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
	at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
	at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:455)
	at org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:249)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1130)
	at org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager.purgeOldStorage(NNStorageRetentionManager.java:111)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.purgeOldStorage(FSImage.java:946)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImageInAllDirs(FSImage.java:931)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.saveNamespace(FSImage.java:868)
	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.doCheckpoint(StandbyCheckpointer.java:165)
	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.access$1100(StandbyCheckpointer.java:53)
	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.doWork(StandbyCheckpointer.java:297)
	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.access$300(StandbyCheckpointer.java:210)
	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread$1.run(StandbyCheckpointer.java:230)
	at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:456)
	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.run(StandbyCheckpointer.java:226)
2013-08-01 14:30:44,799 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection to http://10.232.98.77:20021/getimage?putimage=1&txid=61403094&port=20021&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
2013-08-01 14:31:15,974 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took 31.18s at 0.00 KB/s
2013-08-01 14:31:15,974 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with txid 61403094 to namenode at 10.232.98.77:20021

How can I handle the exception?

Thanks,
LiuLei
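Both failures follow the same arithmetic. The Java sketch below assumes that, while purging old storage, the retention manager asks the JournalNodes for edits starting at (oldest retained image txid + 1 - N), where N is dfs.namenode.num.extra.edits.retained at an assumed default of 1000000. Under that assumption 47944809 + 1 - 1000000 = 46944810 and 60835762 + 1 - 1000000 = 59835763, which are exactly the firstTxId values the JournalNodes reject, and each one falls strictly inside a finalized edits segment rather than at its start. MidSegmentCheck and its methods are hypothetical helpers, not Hadoop APIs, and the mid-segment test is modeled on the error message rather than copied from FileJournalManager.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not a Hadoop API) that replays the txid arithmetic seen
 * in the Standby NameNode log. Assumption: old storage is purged by asking the
 * journals for edits from (oldest retained image txid + 1 - extraEditsToRetain).
 */
final class MidSegmentCheck {

    // Finalized segments are named edits_<firstTxId>-<lastTxId>, zero padded.
    private static final Pattern FINALIZED_EDITS = Pattern.compile("edits_(\\d+)-(\\d+)");

    /** First txid requested from the journals under the assumption above, never below 0. */
    static long purgeFromTxId(long oldestRetainedImageTxId, long extraEditsToRetain) {
        long minimumRequiredTxId = oldestRetainedImageTxId + 1;
        return Math.max(0L, minimumRequiredTxId - extraEditsToRetain);
    }

    /** True when txId lies after the segment's first txid and no later than its last txid. */
    static boolean isInMiddleOf(String segmentFileName, long txId) {
        Matcher m = FINALIZED_EDITS.matcher(segmentFileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a finalized edits file: " + segmentFileName);
        }
        long first = Long.parseLong(m.group(1)); // leading zeros are accepted by parseLong
        long last = Long.parseLong(m.group(2));
        return first < txId && txId <= last;
    }

    public static void main(String[] args) {
        // Assumed default of dfs.namenode.num.extra.edits.retained.
        long extraEditsToRetain = 1_000_000L;

        // 13:43 checkpoint: "Going to retain 2 images with txid >= 47944809"
        long firstRequest = purgeFromTxId(47944809L, extraEditsToRetain);
        System.out.println(firstRequest + " in middle: "
                + isInMiddleOf("edits_0000000000046630461-0000000000047222679", firstRequest));

        // 14:30 checkpoint: "Going to retain 2 images with txid >= 60835762"
        long secondRequest = purgeFromTxId(60835762L, extraEditsToRetain);
        System.out.println(secondRequest + " in middle: "
                + isInMiddleOf("edits_0000000000059678382-0000000000060264590", secondRequest));
    }
}
```

Running main prints "46944810 in middle: true" and then "59835763 in middle: true", matching the firstTxId values and segment files named in the two sets of JournalNode errors.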