Hello. My environment is: HDFS 0.21, NameNode + BackupNode. After some time the BackupNode crashed with an exception (stack trace below). Problem #1: the process did not exit. I then tried to run a SecondaryNameNode to perform a checkpoint and got a similar crash, but that process did exit. I backed up my data and restarted the NameNode. Same crash:

2011-06-17 10:12:39,985 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1765)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1753)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadEditRecords(FSEditLog.java:708)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:411)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:378)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1209)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:1019)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:483)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:110)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:291)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:270)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:271)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:303)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:433)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:421)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1359)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1368)
What helped was cleaning the edits file with the shell printf someone sent to the list a few days ago. But it would be great to have an option either to stop on the first replay error or to skip such errors (looking at the code, it tried to apply a modification time to a nonexistent file). Either way, that would recover a later state than simply cleaning the edits file. I have a backup of the name node data that crashes, if anyone is interested. As far as I'm concerned, this is a bug. I even thought of filing it in Jira, but then found https://issues.apache.org/jira/browse/HDFS-1864 (see the last comment). Is it true that bug reports should not go to Jira?

--
Best regards,
Vitalii Tymchyshyn
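P.S. For reference, here is a sketch of the edits-file cleanup I mean, reconstructed from memory. The exact bytes are an assumption: the edits log of this era starts with a 4-byte big-endian layout version followed by opcodes, and 0xFF (OP_INVALID) marks end-of-log, so the version bytes shown here (0xFFFFFFEE, i.e. -18) may not match HDFS 0.21. Check the VERSION file in your name directory and back everything up before trying anything like this:

```shell
# Hypothetical "empty edits file" workaround sketch, NOT an official procedure.
# Writes a 4-byte layout version header followed by a single OP_INVALID (0xFF)
# byte, so replay stops immediately. The version bytes below are an assumption
# and must match the layout version recorded in the name directory's VERSION
# file for your release.
printf '\xff\xff\xff\xee\xff' > /tmp/edits.empty

# Inspect the result before ever copying it over the real edits file:
od -An -tx1 /tmp/edits.empty
```

Note this discards every unreplayed edit, which is exactly why a "skip bad records" option would recover strictly more state.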