Hello.

My environment is: HDFS 0.21, NameNode + BackupNode.
After some time the BackupNode crashed with an exception (stack trace below).
Problem #1 - the process did not exit.
I then tried running a SecondaryNameNode to perform a checkpoint. It got a
similar crash, but it did exit.
I backed up my data and restarted the NameNode. Same crash:
2011-06-17 10:12:39,985 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1765)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1753)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadEditRecords(FSEditLog.java:708)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:411)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:378)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1209)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:1019)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:483)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:110)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:291)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:270)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:271)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:303)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:433)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:421)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1359)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1368)

The thing that helped was cleaning the edits file with the shell printf
snippet someone sent to the list a few days ago.
But it would be great to have an option to either stop at the first replay
error or to skip such errors (from looking into the code, it tried to apply
a modification time to a file that no longer exists). Either way this would
recover a later state than simply cleaning the edits file.
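To make the skip-on-error idea concrete, here is a rough standalone sketch. This is not actual HDFS code; the class, method, and op names are invented for illustration. It only shows the behavior I mean: when a setTimes edit record references a path that is no longer in the namespace, count and skip it instead of letting a null inode turn into a NullPointerException that aborts the whole replay:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of edit-log replay; names are illustrative, not HDFS internals.
public class ReplaySketch {
    static class Inode { long mtime; }

    private final Map<String, Inode> namespace = new HashMap<>();
    private int skipped = 0;

    void create(String path) {
        namespace.put(path, new Inode());
    }

    // Replays one SET_TIMES record. Instead of dereferencing a null inode
    // (which is what produces the NullPointerException during replay),
    // skip records that point at files no longer in the namespace.
    void replaySetTimes(String path, long mtime) {
        Inode inode = namespace.get(path);
        if (inode == null) {
            // Option 1: skip the bad record and keep replaying,
            // counting it for diagnostics.
            skipped++;
            return;
            // Option 2 would be to throw here, stopping cleanly at the
            // first unreplayable record instead of crashing mid-load.
        }
        inode.mtime = mtime;
    }

    int skippedCount() { return skipped; }
    long mtimeOf(String path) { return namespace.get(path).mtime; }

    public static void main(String[] args) {
        ReplaySketch fs = new ReplaySketch();
        fs.create("/a");
        fs.replaySetTimes("/a", 1000L);       // applied normally
        fs.replaySetTimes("/deleted", 2000L); // skipped, no NPE
        System.out.println(fs.mtimeOf("/a"));
        System.out.println(fs.skippedCount());
    }
}
```

With either option, the administrator would keep everything replayed up to (or past) the bad record, instead of discarding the whole edits file.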
I have a backup of the NameNode data that triggers the crash, if anyone is
interested.
In my opinion this is a bug. I even thought about filing it in Jira, but
found https://issues.apache.org/jira/browse/HDFS-1864 (see the last comment).
Is it true that bug reports should not go to Jira?
-- 
Best regards,
 Vitalii Tymchyshyn
