We are running hadoop-0.20.1. I did not set this cluster up, and the person who did is unavailable, so I apologize if any of the following is unclear.
We would like to (re)start a secondary namenode, and I am looking for guidance on how to do so.

We have a secondary namenode, but it has apparently never been able to contact the namenode. Or so it seems. The secondary namenode was never properly configured, and that includes logging, so we are unable to see any log output from it. The same unfortunate log configuration issue exists on the namenode, and there is nothing to see there either.

On the secondary namenode, there are some files in the checkpoint directory, but they don't seem to have any relationship to the files in the namenode's name dir. That all leads us to believe that a checkpoint has never been taken or even attempted.

But the namenode's name dir *does* contain both edits and edits.new files. There are, in fact, five files in there: fsimage, fstime, VERSION, edits, and edits.new. The edits file is only 4 bytes. edits.new is very large, as the cluster has been running for quite a while and has been at least somewhat active.

So now the questions.

Was there somehow a secondary namenode that tried to make a checkpoint and failed, and is that why both edits and edits.new exist?

If we restart the namenode, it will properly merge both edits and edits.new, correct? From reading the Jira and browsing the source code a little, this is how I think it will happen.

Of course, the real question is how to get a secondary namenode going with as little risk as possible. Should we just start up the secondary namenode? Or should we restart the namenode first? Or is there some other way for us to get right with our cluster?

Thanks,
Charlie