Hi everybody,

Most of the bugs/issues/etc. that I found while upgrading from CDH 5
to BigTop 1.4 are now fixed. I am currently testing (as also suggested
here) the upgrade/rollback procedures for HDFS (everything is tracked
in https://phabricator.wikimedia.org/T244499, and I promise to add
documentation about this at the end).

I initially followed [1][2] in my Test cluster, choosing the Rolling
upgrade, but when I tried to roll back (several days after the initial
upgrade) I ended up in an inconsistent state and I wasn't able to
recover the previous HDFS state. I didn't save the exact error
messages, but the situation was more or less the following:

FS-Image-rollback (created at the time of the upgrade) - up to transaction X
FS-Image-current - up to transaction Y, with Y = X + 10000 (number
totally made up for the example)
QJM cluster: first available transaction Z = X + 10000 + 1

When I tried the rolling rollback, the Namenode complained about a hole
in the transaction log, namely at X + 1, so it refused to start. I
then tried to force a regular rollback, but the Namenode refused again,
saying that there was no FS Image available to roll back to. I checked
the Hadoop code, and indeed the Namenode saves the FS Image with a
different name/path depending on whether the upgrade is rolling or
regular. Both refusals make sense, especially the first one, since
there was indeed a hole between the last transaction of the
FS-Image-rollback and the first transaction available to replay on the
QJM cluster. I chose the rolling upgrade initially because it was
appealing: it promises to bring the Namenodes back to their previous
version while keeping the data modified between upgrade and rollback.
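
To make the hole concrete, here is a minimal Python sketch of the kind
of continuity check described above. It is not the Namenode's actual
code: the function name, the (first_txid, last_txid) segment tuples and
the numbers are all made up for illustration, in the same spirit as the
X/Y/Z example.

```python
def find_edit_log_gap(image_last_txid, segments):
    """Return the first transaction id that is needed to replay edits on
    top of an FS image but is missing from the edit log, or None if the
    log is contiguous from the image onwards.

    image_last_txid: last transaction contained in the FS image.
    segments: iterable of (first_txid, last_txid) edit log segments
              (hypothetical representation, not a Hadoop data structure).
    """
    next_needed = image_last_txid + 1
    for first, last in sorted(segments):
        if last < next_needed:
            continue  # segment is already covered by the image
        if first > next_needed:
            return next_needed  # hole: this transaction is not available
        next_needed = last + 1
    return None


# Made-up numbers matching the scenario: X, Y = X + 10000, Z = Y + 1.
X = 5000
Y = X + 10000
Z = Y + 1
qjm_segments = [(Z, Z + 499)]  # oldest edits available on the QJM cluster

gap_from_rollback_image = find_edit_log_gap(X, qjm_segments)
gap_from_current_image = find_edit_log_gap(Y, qjm_segments)
```

With these values the rollback image (up to X) cannot be replayed
forward because X + 1 is missing, while the current image (up to Y)
lines up with the first transaction available on the QJM cluster.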

I then found [3], which says that with QJM everything is more
complicated, and that a regular rollback is the only option available.
What I think this means is that, because the edit log is spread among
multiple nodes, a rollback that keeps the data written between upgrade
and rollback is not available, so in the worst case the data modified
during that timeframe is lost. Not a big deal in my case, but I want to
triple check with you whether this is the correct interpretation, or
whether there is another tutorial/guide/etc. that I haven't read with a
different procedure :)

Is my interpretation correct? If not, is there anybody with experience
in HDFS upgrades that could shed some light on the subject?

Thanks in advance!

Luca



[1] https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Upgrade_and_Rollback
[2] https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html
[3] https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#HDFS_UpgradeFinalizationRollback_with_HA_Enabled
