I am a bit confused about the different options for namenode high availability (or something along those lines) in CDH4 (hadoop-2.0.0).

I understand that the secondary namenode is deprecated, and that there are two options to replace it: checkpoint or backup namenodes. Both are well explained in the documentation, but the confusion begins when reading about "HDFS High Availability", for example here: http://hadoop.apache.org/common/docs/r2.0.0-alpha/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailability.html

Is "HDFS High Availability" as described there (using shared storage) related to checkpoint/backup nodes? If so, in what way?

From what I read about backup nodes, they also seem to be aimed at high availability. As I understand it, the current implementation does not provide (warm) failover yet, but this is planned. So would it be a future-proof idea to start replacing secondary namenodes with backup namenodes now?

Thanks,
Jan
