RE: NameNode failure and recovery!

Vijay Thakorlal Wed, 03 Apr 2013 07:57:04 -0700

Hi Rahul,


The SNN does not act as a backup / standby NameNode in the event of failure. 

 

The sole purpose of the Secondary NameNode (or, as it is more accurately 
described, the checkpoint node) is to checkpoint the current state of HDFS:

 

1) The NN rolls its edits file, so that new changes are written to a fresh 
edits file

2) The SNN retrieves the fsimage and the rolled edits file from the NN

3) The SNN loads the fsimage into memory

4) The SNN replays the edits log on top of it to merge the two

5) The SNN transfers the merged checkpoint back to the NN

6) The NN uses the checkpoint as its new fsimage file
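
The checkpoint cycle described above can be sketched as a toy model. This is 
not Hadoop code: the dict-based "NameNode", the tuple format of the edits, and 
the function names replay_edits and checkpoint are all assumptions made up for 
illustration. It only shows the idea of rolling the edits log, replaying it 
over the image, and installing the merged result.

```python
# Toy model of a checkpoint; NOT Hadoop code, all names are hypothetical.
# The namespace is a dict of path -> metadata, and the edits log is a list
# of (operation, path, metadata) tuples.

def replay_edits(fsimage, edits):
    """Return a new image with every logged edit applied in order."""
    merged = dict(fsimage)  # stands in for "load the fsimage into memory"
    for op, path, meta in edits:
        if op == "create":
            merged[path] = meta
        elif op == "delete":
            merged.pop(path, None)
        else:
            raise ValueError(f"unknown op: {op}")
    return merged


def checkpoint(namenode):
    """Simulate one checkpoint cycle against a NameNode modelled as a dict."""
    # The NN rolls its edits log so new changes go to a fresh, empty log.
    rolled_edits = namenode["edits"]
    namenode["edits"] = []
    # The SNN fetches the fsimage and the rolled edits, then merges them.
    new_image = replay_edits(namenode["fsimage"], rolled_edits)
    # The merged checkpoint goes back to the NN and becomes the new fsimage.
    namenode["fsimage"] = new_image
    return new_image


nn = {
    "fsimage": {"/a": {"replication": 3}},
    "edits": [("create", "/b", {"replication": 2}), ("delete", "/a", None)],
}
checkpoint(nn)
print(nn["fsimage"])  # {'/b': {'replication': 2}}
print(nn["edits"])    # []
```

In this sketch the NN's edits log is empty after the checkpoint because 
everything in it has been folded into the new image, which is the point of 
checkpointing: the log does not grow without bound.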

 

It’s true that technically you could use the fsimage from the SNN if you 
completely lost the NN, and yes, as you said, you would “lose” any changes to 
HDFS that occurred between the last checkpoint and the NN dying. But, as 
mentioned, the SNN is not a backup for the NN.
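
To make the "lost changes" point concrete, here is a small illustrative 
sketch. The paths, the set-based namespace, and the apply_edits helper are 
hypothetical and only model the idea: recovering from the last checkpoint 
drops every edit made after it, so newly created files vanish and deleted 
ones reappear in the namespace.

```python
# Toy illustration only; data structures and names are hypothetical.

def apply_edits(image, edits):
    """Apply (operation, path) edits to a set of paths and return the result."""
    state = set(image)
    for op, path in edits:
        if op == "create":
            state.add(path)
        elif op == "delete":
            state.discard(path)
    return state


# Last checkpoint held by the SNN.
checkpoint_image = {"/data/a", "/data/b"}
# Edits that only ever existed on the failed NN.
edits_since_checkpoint = [("create", "/data/c"), ("delete", "/data/a")]

state_at_failure = apply_edits(checkpoint_image, edits_since_checkpoint)
state_after_recovery = set(checkpoint_image)  # later edits are gone

missing = state_at_failure - state_after_recovery      # created, then forgotten
resurrected = state_after_recovery - state_at_failure  # deleted, but listed again
print(sorted(missing))      # ['/data/c']
print(sorted(resurrected))  # ['/data/a']
```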

 

Regards,

Vijay

 

From: Rahul Bhattacharjee [mailto:rahul.rec....@gmail.com] 
Sent: 03 April 2013 15:40
To: user@hadoop.apache.org
Subject: NameNode failure and recovery!

 

Hi all,

I was reading about Hadoop and learned that there are two ways to protect 
against NameNode failures.

1) Write to an NFS mount along with the usual local disk.

 -or-

2) Use a secondary name node. In case of failure of the NN, the SNN can take 
charge.

My questions:

1) The SNN is always lagging, so when the SNN becomes primary in the event of 
an NN failure, the edits which have not been merged into the image file would 
be lost, so the state on the SNN would not be consistent with the NN before 
its failure.

2) I have also read that the other purpose of the SNN is to periodically merge 
the edit logs with the image file. If a setup goes with option #1 (writing to 
NFS, no SNN), then who does this merging?

 

Thanks,
Rahul
