Hi All,

I am running Hadoop 2.4.0. I am trying to restart my HA cluster, but since there isn't a way to gracefully shut down the NN (AFAIK), I am running into a (sort of) race condition. A client issues a delete command and the NN successfully deletes the requested file (the in-progress edit logs across the NN and JNs are updated and the DNs physically delete the blocks). But before the current in-progress edit log segment can be closed, the NN is stopped. Now, when the NN is started again, it reads all finalized edit logs from the JNs but does not consider the in-progress edit log segment from the last run. Because of this, the NN expects more blocks to be reported than the DNs actually have. Unfortunately, this difference can sometimes be large enough (relative to dfs.namenode.safemode.threshold-pct) to leave the NN in safemode forever.
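To make the "stuck in safemode" condition concrete, here is a rough sketch of the threshold arithmetic. This is a simplified illustration, not the actual FSNamesystem safemode code (the real check has more conditions, e.g. datanode count and an extension period); the 0.999 default for dfs.namenode.safemode.threshold-pct is from hdfs-default.xml:

```java
public class SafeModeSketch {
    // Simplified safemode condition: the NN can leave safemode once the
    // number of reported blocks reaches threshold-pct * expected blocks.
    static boolean canLeaveSafeMode(long blocksReported, long blocksExpected,
                                    double thresholdPct) {
        if (blocksExpected == 0) {
            return true;
        }
        long blockThreshold = (long) (thresholdPct * blocksExpected);
        return blocksReported >= blockThreshold;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: the NN expects 10,000 blocks based on the
        // finalized edit logs, but 20 blocks were deleted in the unclosed
        // in-progress segment, so the DNs only ever report 9,980.
        System.out.println(canLeaveSafeMode(9980, 10000, 0.999)); // false: stuck
        System.out.println(canLeaveSafeMode(9990, 10000, 0.999)); // true
    }
}
```

With the default threshold of 0.999, even a deficit of 20 blocks out of 10,000 is enough to keep the NN in safemode indefinitely, since the missing blocks will never be reported.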
This problem looks generic to me. Can someone please confirm whether this is indeed a bug, or point out where I may be wrong (either in my process or my understanding)?

I modified the NN code to also read the in-progress edit log from the JNs, and my problem was resolved. But I am not sure what implications this might have. Here is the code change I made:

diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
index e78153f..b864ec1 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java
@@ -623,7 +623,7 @@ private boolean loadFSImage(FSNamesystem target, StartupOption startOpt,
       }
       editStreams = editLog.selectInputStreams(
           imageFiles.get(0).getCheckpointTxId() + 1,
-          toAtLeastTxId, recovery, false);
+          toAtLeastTxId, recovery, true);
     } else {
       editStreams = FSImagePreTransactionalStorageInspector
           .getEditLogStreams(storage);

--
Regards
Nitin Goyal