hadoop version 2.4.1
Namenode shutdown to become epoch number mismatch.
Why suddenly epoch numbers mismatch ?
Why suddenly namenode shutdown ?
*** namenode log
2014-08-26 12:17:48,625 INFO
org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Rescanning after 30000 milliseconds
2014-08-26 12:17:48,646 INFO
org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned
0 directive(s) and 0 block(s) in 21 millisecond(s).
2014-08-26 12:18:03,599 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from
192.10.1.209
2014-08-26 12:18:03,599 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog:
Rolling edit logs
2014-08-26 12:18:03,599 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog:
Ending log segment 22795096
2014-08-26 12:18:03,633 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog:
Number of transactions: 81 Total time for transactions(ms): 10 Number of
transactions batched in Syncs: 0 Number of syncs: 14 SyncTimes(ms): 159 197
2014-08-26 12:18:03,675 INFO
org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits
file /data/dfs/name/current/edits_inprogress_0000000000022795096 ->
/data/dfs/name/current/edits_0000000000022795096-0000000000022795176
2014-08-26 12:18:03,675 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog:
Starting log segment at 22795177
2014-08-26 12:18:05,419 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap
updated: 192.10.1.211:40010 is added to blk_1076515119_2774480 size 46226115
2014-08-26 12:18:05,420 INFO
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: BLOCK*
processOverReplicatedBlock: Postponing processing of over-replicated
blk_1076515119_2774480 since storage +
[DISK]DS-405679644-192.10.1.201-40010-1401163823536:NORMALdatanode
192.10.1.201:40010 does not yet have up-to-date block information.
2014-08-26 12:18:18,625 INFO
org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Rescanning after 30000 milliseconds
2014-08-26 12:18:18,642 INFO
org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned
0 directive(s) and 0 block(s) in 17 millisecond(s).
2014-08-26 12:18:48,625 INFO
org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Rescanning after 30000 milliseconds
2014-08-26 12:18:48,642 INFO
org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned
0 directive(s) and 0 block(s) in 17 millisecond(s).
2014-08-26 12:19:18,625 INFO
org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Rescanning after 30000 milliseconds
2014-08-26 12:19:18,642 INFO
org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned
0 directive(s) and 0 block(s) in 17 millisecond(s).
2014-08-26 12:19:48,625 INFO
org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Rescanning after 30000 milliseconds
2014-08-26 12:19:48,642 INFO
org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned
0 directive(s) and 0 block(s) in 17 millisecond(s).
2014-08-26 12:20:04,506 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from
192.10.1.209
2014-08-26 12:20:04,506 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog:
Rolling edit logs
2014-08-26 12:20:04,506 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog:
Ending log segment 22795177
2014-08-26 12:20:04,507 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog:
Number of transactions: 53 Total time for transactions(ms): 6 Number of
transactions batched in Syncs: 0 Number of syncs: 1 SyncTimes(ms): 20 67
2014-08-26 12:20:04,544 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog:
Number of transactions: 53 Total time for transactions(ms): 6 Number of
transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 48 76
2014-08-26 12:20:04,587 INFO
org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits
file /data/dfs/name/current/edits_inprogress_0000000000022795177 ->
/data/dfs/name/current/edits_0000000000022795177-0000000000022795229
2014-08-26 12:20:04,587 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog:
Starting log segment at 22795230
2014-08-26 12:20:18,625 INFO
org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
Rescanning after 30000 milliseconds
2014-08-26 12:20:21,771 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services started
for active state
2014-08-26 12:20:59,008 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog:
Ending log segment 22795230
2014-08-26 12:20:59,009 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: NameNodeEditLogRoller was
interrupted, exiting
2014-08-26 12:20:59,014 WARN
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Remote journal
192.10.1.209:8485 failed to write txns 22795231-22795235. Will try to write to
this JN again after the next log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 115 is
less than the last promised epoch 116
at
org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:414)
at
org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:442)
at
org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:342)
at
org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:148)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:158)
at
org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25421)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
at org.apache.hadoop.ipc.Client.call(Client.java:1410)
at org.apache.hadoop.ipc.Client.call(Client.java:1363)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at $Proxy12.journal(Unknown Source)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:167)
at
org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:357)
at
org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:350)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
2014-08-26 12:20:59,017 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog:
Error: flush failed for required journal (JournalAndStream(mgr=QJM to
[10.1.1.107:8485, 192.10.1.208:8485, 192.10.1.209:8485],
stream=QuorumOutputStream starting at txid 22795230))
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions
to achieve quorum size 2/3. 3 exceptions thrown:
192.10.1.208:8485: IPC's epoch 115 is less than the last promised epoch 116
at
org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:414)
at
org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:442)
at
org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:342)
at
org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:148)
at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:158)
at
org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25421)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
2014-08-26 12:20:59,018 WARN
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Aborting
QuorumOutputStream starting at txid 22795230
2014-08-26 12:20:59,020 INFO org.apache.hadoop.util.ExitUtil: Exiting with
status 1
2014-08-26 12:20:59,023 INFO org.apache.hadoop.hdfs.server.namenode.NameNode:
SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at namenode194/192.10.1.208
************************************************************/
2014-08-26 12:25:43,591 INFO org.apache.hadoop.hdfs.server.namenode.NameNode:
STARTUP_MSG:
/************************************************************