RE: How to restart an HDFS standby namenode dead for a very long time

Brahma Reddy Battula Fri, 15 Jul 2016 00:21:45 -0700

It seems you are hitting the following JIRA. Please refer to:

https://issues.apache.org/jira/browse/HDFS-9917

--Brahma Reddy Battula

From: Zach Cox [mailto:zcox...@gmail.com]
Sent: 14 July 2016 03:34
To: user@hadoop.apache.org
Subject: How to restart an HDFS standby namenode dead for a very long time

Hi - we have an HDFS (version 2.0.0-cdh4.4.0) cluster set up in HA with 2 
namenodes and 5 journal nodes. This cluster has been somewhat neglected (long 
story) and the standby namenode process has been dead for several months.

Recently we tried to just start the standby namenode process again, but several 
hours later the entire HDFS cluster (and HBase on top of it) was unavailable 
for several hours. As soon as we stopped the standby namenode process, HDFS 
(and HBase) started working fine again. I don't know for sure, but I'm guessing 
the standby namenode was trying to catch up on several months of edits from 
being down for so long, and just couldn't do it.

We really need to get this standby namenode process started again, so I'm 
trying to find the right way to do it. I've tried starting it with the 
-bootstrapStandby option, but that appears broken in our HDFS version. Instead, 
we can manually rsync the files in the dfs.name.dir from the active namenode.
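
For illustration only, the manual copy described above might be sketched as the dry-run planner below. It only builds and prints the commands; it does not run them. The hostname, the directory path, the `hdfs` login user, and the helper name `build_resync_plan` are all hypothetical. The checkpoint step (safe mode plus `-saveNamespace`) is an assumption about how one would get a current fsimage before copying, and it makes HDFS read-only while it runs. Whether this procedure avoids the outage described above is not established here.

```python
import shlex


def build_resync_plan(active_host, name_dir, run_as="hdfs"):
    """Return, in order, the commands for a manual standby resync (dry run).

    active_host, name_dir and run_as are placeholders for this sketch;
    name_dir is assumed to be the same path on both namenodes.
    """
    base = name_dir.rstrip("/")
    src = f"{run_as}@{active_host}:{base}/"
    dst = f"{base}/"
    return [
        # Assumed step, on the active namenode: write a fresh checkpoint.
        # saveNamespace requires safe mode, so HDFS is read-only meanwhile.
        ["hdfs", "dfsadmin", "-safemode", "enter"],
        ["hdfs", "dfsadmin", "-saveNamespace"],
        ["hdfs", "dfsadmin", "-safemode", "leave"],
        # On the standby host, with the standby namenode process stopped:
        # mirror the metadata directory, skipping the active node's lock file.
        ["rsync", "-a", "--delete", "--exclude=in_use.lock", src, dst],
    ]


if __name__ == "__main__":
    # Print the plan for review instead of executing anything.
    for cmd in build_resync_plan("nn1.example.com", "/data/dfs/nn"):
        print(" ".join(shlex.quote(part) for part in cmd))
```

Printing the plan rather than executing it keeps the sketch safe to run anywhere; each command would still need to be checked against the actual cluster layout before use.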

I guess my question is: is there a recommended way to get this standby namenode 
resurrected successfully? And would we need to do anything other than rsync 
dfs.name.dir from the active namenode before starting the standby namenode 
again?

Thanks,
Zach
