Questions on rollback/upgrade HDFS with QJM HA enabled

sam liu Sat, 24 Jan 2015 05:32:53 -0800

Hi Experts,

I have questions about rolling back/upgrading HDFS with QJM HA enabled.


On the website
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#HDFS_UpgradeFinalizationRollback_with_HA_Enabled,
it says:
'To perform a rollback of an upgrade, both NNs should first be shut down.
The operator should run the roll back command on the NN where they
initiated the upgrade procedure, which will perform the rollback on the
local dirs there, as well as on the shared log, either NFS or on the JNs.
Afterward, this NN should be started and the operator should run
`-bootstrapStandby' on the other NN to bring the two NNs in sync with this
rolled-back file system state.'

Currently I expect the steps to be as follows (please correct me if I am wrong):
NN1 -> hadoop namenode -rollback
NN1 -> hadoop namenode // In our environment, the rolled-back NameNode shuts
down right after it finishes -rollback, so it needs to be started again.
NN2 -> hadoop namenode -bootstrapStandby
hadoop datanode -rollback // on all datanodes
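
To make the intended order explicit, here is a dry-run sketch of the steps above. It only prints the commands in sequence and runs nothing. The host names (nn1, nn2, dn1, dn2) and the variable names are placeholders I made up for illustration, and the JournalNodes are deliberately left out because their handling is exactly what Question 1 asks about.

```shell
#!/bin/sh
# Dry-run sketch of the rollback order listed above; it only prints commands.
# Host names (nn1, nn2, dn1, dn2) are hypothetical placeholders.
NN1="${NN1:-nn1}"
NN2="${NN2:-nn2}"
DATANODES="${DATANODES:-dn1 dn2}"

rollback_plan() {
    # Assumes both NameNodes have already been shut down.
    echo "$NN1: hadoop namenode -rollback"          # roll back on the NN that initiated the upgrade
    echo "$NN1: hadoop namenode"                    # start it again after -rollback exits
    echo "$NN2: hadoop namenode -bootstrapStandby"  # resync the other NN
    for dn in $DATANODES; do
        echo "$dn: hadoop datanode -rollback"       # on every DataNode
    done
}

rollback_plan
```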

[Question 1]:
One thing I don't know is when the JournalNodes should be started and/or
stopped. It seems they need to be running for hadoop namenode -rollback.
Do they need to be restarted at some point?

[Question 2]:
Another issue occurs after the upgrade and before the rollback starts: the
standby NN process uses a lot of CPU and somehow eats up disk space
(without that disk space actually being used). This caused "No space left
on device" errors during the rollback process. As soon as I killed the
namenode process, the free disk space immediately returned to a reasonable
amount.
What might cause the NN process to hold so much disk space in this hidden
way?
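
One possible explanation, which I have not confirmed, is that the process keeps files open that have already been deleted; such space is only released when the process exits. A minimal sketch for checking this on Linux follows. It relies on /proc, and the function name deleted_open_files is my own, not a Hadoop tool.

```shell
#!/bin/sh
# List file descriptors of a process that still point at deleted files.
# Linux only: relies on /proc/<pid>/fd; run as the process owner or root.
deleted_open_files() {
    pid="$1"
    for fd in /proc/"$pid"/fd/*; do
        # readlink fails for entries that vanish or are unreadable; skip those.
        target="$(readlink "$fd" 2>/dev/null)" || continue
        case "$target" in
            *" (deleted)") echo "$fd -> $target" ;;
        esac
    done
}

# Pass the NameNode PID as the first argument; defaults to this shell's PID.
deleted_open_files "${1:-$$}"
```

If this prints large files for the standby NN's PID, that would be consistent with the space reappearing once the process is killed.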

Thanks!
