Hi Evans,

I am late in answering as well :)
I thought about it, and I think that with the right premises (for example: this is tailored to Wikimedia's environment, it assumes that a cluster downtime is acceptable, etc.) the storytelling style might be easier to digest than a list of steps to follow. I think that in any use case different from Wikimedia's there will be adjustments to make, and things that work/don't work/etc. One thing that might be good to add at the end is a "summary of known pitfalls/bugs/etc." found during the procedure, which in my case were the most time-consuming part. I'll add it during the next few days and people can comment :)

The blog post would be a good idea, maybe something that we can share between Wikimedia and Apache? I am planning to move to Bigtop during the upcoming quarter (October -> December); that will also show whether my procedure works on a cluster of 60+ nodes (rather than on a small one of 8 nodes) :D. As soon as I have done it I'll follow up with this list to organize a blog post, does that sound ok?

Thanks a lot for all the support!

Luca

On Tue, Sep 15, 2020 at 6:06 PM Evans Ye <evan...@apache.org> wrote:
>
> Hey Luca,
>
> Sorry for the late reply, I was busy with a conference; it's just over now. Anyway, I think the writing is pretty informative, but it's more of a storytelling style, and several parts are Wikimedia-specific. That's why I think it's more suitable for a blog post.
>
> Anyhow, either way I think it's great content. If we keep it as is, I think we can make it available on Bigtop's wiki & blog, or even Success at Apache, with a title like "Wikimedia's story of migrating from CDH to Bigtop". If you want to make it more of an official guide, the title would be "CDH to Bigtop Migration Guide". We can state the limitations and the environment so that people take it with the caution that it might not suit their own environment.
>
> Which way to go depends on how much effort you'd like to put in. Let me know what you think so that we can move forward.
>
> - Evans
>
> On Mon, Sep 7, 2020 at 3:39 PM Luca Toscano <toscano.l...@gmail.com> wrote:
>>
>> Hi Evans,
>>
>> thanks for the review! What are the things that you'd like to see to make it more consumable for users? I can re-shape the writing; I tried to come up with something to kick off a conversation with the community, and it would be interesting to know if anybody else has a similar use case and how/if they are working on a solution.
>>
>> For the blog post, maybe we can coordinate something shared between Apache and Wikimedia when the migration is done, I am sure it would be a nice example of the two Foundations collaborating :)
>>
>> Luca
>>
>> On Wed, Sep 2, 2020 at 8:21 PM Evans Ye <evan...@apache.org> wrote:
>> >
>> > Hi Luca,
>> >
>> > I read through the doc briefly. I think it works very well as a blog post telling the success story of Wikimedia's migration from CDH to Bigtop. However, the current writing doesn't seem to be easily consumable for users who are just looking for the solutions/steps to do a similar migration. May I know what title you would prefer if we put the doc in Bigtop's wiki?
>> >
>> > What I was thinking of is a cookbook for the migration, but we can discuss this. IMHO a Success at Apache[1] blog post is also possible, but I need to figure out who to talk to. Let me know what you think.
>> >
>> > [1] https://blogs.apache.org/foundation/category/SuccessAtApache
>> >
>> > Evans
>> >
>> > On Sun, Aug 30, 2020 at 3:18 AM Evans Ye <evan...@apache.org> wrote:
>> >>
>> >> Hi Luca,
>> >>
>> >> I'm on vacation and hence don't have time for a review right now. I'll get back to you next week.
>> >>
>> >> The doc is definitely valuable. Once you have your production migrated successfully, we can prove to other users that this is a battle-proven solution. Even more, we can give a talk at ApacheCon or somewhere else to further amplify the impact of the work. This is definitely an open source winning case, so I think it deserves a talk.
>> >>
>> >> Evans
>> >>
>> >> On Thu, Aug 27, 2020 at 9:11 PM Luca Toscano <toscano.l...@gmail.com> wrote:
>> >>>
>> >>> Hi Evans,
>> >>>
>> >>> it took a while, I know, but I have the first version of the gdoc for the upgrade:
>> >>>
>> >>> https://docs.google.com/document/d/1fI1mvbR1mFLV6ohU5cIEnU5hFvEE7EWnKYWOkF55jtE/edit?usp=sharing
>> >>>
>> >>> I tried to list all the steps involved in migrating from CDH 5 to Bigtop 1.4; anybody interested should be able to comment. The idea I have is to discuss this for a few days and then possibly make it permanent somewhere in the Bigtop wiki (of course only if the document is considered useful for others etc.).
>> >>>
>> >>> During these days I tested the procedure multiple times, and I have also tested the HDFS finalize step; everything works as expected. I hope to be able to move to Bigtop during the next couple of months.
>> >>>
>> >>> Luca
>> >>>
>> >>> On Tue, Jul 21, 2020 at 4:04 PM Evans Ye <evan...@apache.org> wrote:
>> >>> >
>> >>> > Yes, I think a shared gdoc is preferred, and you can open up a JIRA ticket to track it.
>> >>> >
>> >>> > On Mon, Jul 20, 2020 at 21:10 Luca Toscano <toscano.l...@gmail.com> wrote:
>> >>> >>
>> >>> >> Hi Evans!
>> >>> >>
>> >>> >> What is the best medium to use for the documentation/comments? A shared gdoc or something similar?
>> >>> >>
>> >>> >> Luca
>> >>> >>
>> >>> >> On Thu, Jul 16, 2020 at 5:11 PM Evans Ye <evan...@apache.org> wrote:
>> >>> >> >
>> >>> >> > One thing I think would be great to have is a doc version of the upgrade and rollback steps. The benefits:
>> >>> >> > 1. If anything unexpected happens during the automation, you have folks who can quickly understand what's going on and start investigating.
>> >>> >> > 2. Sharing the doc with us helps other OSS users do the migration. The env-specific parts are fine, we can leave comments on them. At the very least, other users can get a high-level view of a proven solution and then go and figure out the rest of the pieces by themselves.
>> >>> >> >
>> >>> >> > For the automation, I suggest splitting it up into several stages, and applying some validation steps (manual is ok) before kicking off the next stage.
>> >>> >> >
>> >>> >> > Best,
>> >>> >> > Evans
>> >>> >> >
>> >>> >> > On Wed, Jul 15, 2020 at 9:07 PM Luca Toscano <toscano.l...@gmail.com> wrote:
>> >>> >> >>
>> >>> >> >> Hi everybody,
>> >>> >> >>
>> >>> >> >> I didn't get time to work on this until recently, but I finally managed to put together a reliable procedure to upgrade from CDH to Bigtop 1.4 and roll back if needed.
>> >>> >> >>
>> >>> >> >> The assumptions are:
>> >>> >> >>
>> >>> >> >> 1) It is ok to have (limited) cluster downtime.
>> >>> >> >> 2) Rolling upgrade is not needed.
>> >>> >> >> 3) QJM is used.
>> >>> >> >>
>> >>> >> >> The procedure is listed in these two scripts:
>> >>> >> >>
>> >>> >> >> https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/hadoop/stop-cluster.py
>> >>> >> >> https://github.com/wikimedia/operations-cookbooks/blob/master/cookbooks/sre/hadoop/change-distro-from-cdh.py
>> >>> >> >>
>> >>> >> >> The code is highly dependent on my working environment, but it should be easy to follow when writing a tutorial about how to migrate from CDH to Bigtop. All the suggestions given on this mailing list were really useful in reaching a solution!
>> >>> >> >>
>> >>> >> >> My next steps will be:
>> >>> >> >>
>> >>> >> >> 1) Keep testing Bigtop 1.4 (finalize the HDFS upgrade, run more Hadoop jobs, test Hive 2, etc.).
>> >>> >> >> 2) Upgrade the production Hadoop cluster to Bigtop 1.4 on Debian 9 (HDFS 2.6.0-cdh -> 2.8.5).
>> >>> >> >> 3) Upgrade to Bigtop 1.5 on Debian 9 (HDFS 2.8.5 -> 2.10).
>> >>> >> >> 4) Upgrade to Debian 10.
>> >>> >> >>
>> >>> >> >> With automation it shouldn't be very difficult; I'll report progress once I've made some.
>> >>> >> >>
>> >>> >> >> Thanks a lot!
>> >>> >> >>
>> >>> >> >> Luca
>> >>> >> >>
>> >>> >> >> On Mon, Apr 13, 2020 at 9:25 AM Luca Toscano <toscano.l...@gmail.com> wrote:
>> >>> >> >> >
>> >>> >> >> > Hi Evans,
>> >>> >> >> >
>> >>> >> >> > thanks a lot for the feedback, it was exactly what I needed. "The simpler the better" is definitely good advice in this use case; I'll try another rollout/rollback this week and report back :)
>> >>> >> >> >
>> >>> >> >> > Luca
>> >>> >> >> >
>> >>> >> >> > On Thu, Apr 9, 2020 at 8:09 PM Evans Ye <evan...@apache.org> wrote:
>> >>> >> >> > >
>> >>> >> >> > > Hi Luca,
>> >>> >> >> > >
>> >>> >> >> > > Thanks for reporting back and letting us know how it goes. I don't have experience with exactly this HDFS-with-QJM-HA upgrade. The experience I had was a 0.20 non-HA upgrade to 2.0 non-HA, and then enabling QJM HA, back in 2014.
>> >>> >> >> > >
>> >>> >> >> > > Regarding rollback, I think you're right:
>> >>> >> >> > >
>> >>> >> >> > > it is possible to roll back to HDFS’ state before the upgrade in case of unexpected problems.
>> >>> >> >> > >
>> >>> >> >> > > My previous experience is the same: the rollback is merely a snapshot taken before the upgrade. If you've gone far, then a rollback costs you more lost data... Our runbook was: if our sanity check failed during the upgrade downtime, we performed the rollback immediately.
>> >>> >> >> > >
>> >>> >> >> > > Regarding that FSImage hole issue, I've experienced it as well. I managed to fix it by manually editing the FSImage with the offline image viewer[1] and deleting that missing editLog entry in the FSImage. That actually brought my cluster back with only a small number of missing blocks.
>> >>> >> >> > >
>> >>> >> >> > > Our experience says that the more steps there are, the higher the chance you fail the upgrade.
>> >>> >> >> > > We did fine in a dozen rounds of testing, on the DEV cluster and the STAGING cluster, but still got missing blocks when upgrading Production...
>> >>> >> >> > >
>> >>> >> >> > > The suggestion is to get your production cluster in good shape first (the fewer decommissioned/offline DNs and disk failures, the better). Also, maybe you can switch to non-HA mode and do the upgrade, to simplify things?
>> >>> >> >> > >
>> >>> >> >> > > Not much help, but please let us know about any progress. Last one: have you reached out to the Hadoop community? The authors should know the most :)
>> >>> >> >> > >
>> >>> >> >> > > - Evans
>> >>> >> >> > >
>> >>> >> >> > > [1] https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HdfsImageViewer.html
>> >>> >> >> > >
>> >>> >> >> > > On Wed, Apr 8, 2020 at 21:03 Luca Toscano <toscano.l...@gmail.com> wrote:
>> >>> >> >> > >>
>> >>> >> >> > >> Hi everybody,
>> >>> >> >> > >>
>> >>> >> >> > >> most of the bugs/issues/etc. that I found while upgrading from CDH 5 to Bigtop 1.4 are fixed, and I am now testing (as also suggested here) the upgrade/rollback procedures for HDFS (all written up in https://phabricator.wikimedia.org/T244499; I will add documentation about this at the end, I promise).
>> >>> >> >> > >>
>> >>> >> >> > >> I initially followed [1][2] in my test cluster, choosing the rolling upgrade, but when I tried to roll back (days after the initial upgrade) I ended up in an inconsistent state and wasn't able to recover the previous HDFS state. I didn't save the exact error messages, but the situation was more or less the following:
>> >>> >> >> > >>
>> >>> >> >> > >> FS-Image-rollback (created at the time of the upgrade) - up to transaction X
>> >>> >> >> > >> FS-Image-current - up to transaction Y, with Y = X + 10000 (number totally made up for the example)
>> >>> >> >> > >> QJM cluster: first available transaction Z = X + 10000 + 1
>> >>> >> >> > >>
>> >>> >> >> > >> When I tried the rolling rollback, the Namenode complained about a hole in the transaction log, namely at X + 1, so it refused to start. I tried to force a regular rollback, but the Namenode refused again, saying that there was no FS Image available to roll back to. I checked in the Hadoop code, and indeed the Namenode saves the fs image with a different name/path for a rolling upgrade versus a regular upgrade. Both cases make sense, especially the first one, since there was indeed a hole between the last transaction of the FS-Image-rollback and the first transaction available to replay on the QJM cluster.
>> >>> >> >> > >>
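(To make the gap described above concrete, here is a small Python illustration with made-up transaction ids; it only mimics the continuity check and is not actual Namenode code:)

    # Made-up transaction ids mirroring the example above (illustration only).
    rollback_image_last_txid = 100                 # "X": last txid covered by the rollback fsimage
    current_image_last_txid = 100 + 10000          # "Y": last txid covered by the current fsimage
    first_txid_on_journalnodes = 100 + 10000 + 1   # "Z": oldest edit still held by the JournalNodes

    # Edits X+1 .. Y were folded into the current fsimage and then purged from the
    # JournalNodes, which is exactly the range a rollback to the old image must replay.
    first_needed_txid = rollback_image_last_txid + 1

    if first_txid_on_journalnodes > first_needed_txid:
        print("hole in the transaction log: edits %d..%d are no longer available, refusing to start"
              % (first_needed_txid, current_image_last_txid))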
>> >>> >> >> > >> I chose the rolling upgrade initially since it was appealing: it promises to bring the Namenodes back to their previous versions while keeping the data modified between the upgrade and the rollback.
>> >>> >> >> > >>
>> >>> >> >> > >> I then found [3], which says that with QJM everything is more complicated and a regular rollback is the only option available. What I think this means is that, because the edit log is spread among multiple nodes, a rollback that keeps the data written between the upgrade and the rollback is not available, so in the worst case the data modified during that timeframe is lost. Not a big deal in my case, but I want to triple-check with you whether this is the correct interpretation, or whether there is another tutorial/guide/etc. with a different procedure that I haven't read :)
>> >>> >> >> > >>
>> >>> >> >> > >> Is my interpretation correct? If not, is there anybody with experience in HDFS upgrades who could shed some light on the subject?
>> >>> >> >> > >>
>> >>> >> >> > >> Thanks in advance!
>> >>> >> >> > >>
>> >>> >> >> > >> Luca
>> >>> >> >> > >>
>> >>> >> >> > >> [1] https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Upgrade_and_Rollback
>> >>> >> >> > >> [2] https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html
>> >>> >> >> > >> [3] https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#HDFS_UpgradeFinalizationRollback_with_HA_Enabled
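As a compact illustration of the path the thread converges on (acceptable downtime, QJM in use, a regular non-rolling rollback that discards data written after the upgrade), a rough Python sketch is below. It only follows the ordering of hdfs commands in the Hadoop 2.8.5 docs referenced in [1] and [3]; it is not the Wikimedia cookbooks, and the stop_hdfs_and_swap_packages() helper and the way startup flags reach the daemons are placeholders.

    #!/usr/bin/env python3
    """Illustrative sketch (not the Wikimedia cookbooks) of the non-rolling
    HDFS upgrade/rollback flow with QJM discussed in this thread. Only the
    hdfs sub-commands and their ordering come from the Hadoop 2.8.5 docs;
    everything else (package swap, service handling) is a placeholder."""

    import subprocess


    def run(cmd):
        # In the real cookbooks these are remote executions on specific hosts;
        # here we just run them locally to show the ordering.
        print("+ " + " ".join(cmd))
        subprocess.run(cmd, check=True)


    def stop_hdfs_and_swap_packages(target):
        # Placeholder: stop NameNodes/DataNodes/JournalNodes and install the
        # CDH or Bigtop packages. Entirely environment specific.
        print("stopping HDFS daemons and installing %s packages (placeholder)" % target)


    def upgrade():
        # Optional precaution before the downtime window: checkpoint the namespace.
        run(["hdfs", "dfsadmin", "-safemode", "enter"])
        run(["hdfs", "dfsadmin", "-saveNamespace"])

        stop_hdfs_and_swap_packages(target="bigtop")

        # Start ONE NameNode with the -upgrade startup option so it writes the
        # pre-upgrade (rollback) fsimage; DataNodes upgrade their storage layout
        # when they reconnect. In practice the flag is passed via the service
        # unit / daemon scripts rather than by running the command like this.
        run(["hdfs", "namenode", "-upgrade"])

        # Run on the other NameNode host: re-sync it against the upgraded namespace.
        run(["hdfs", "namenode", "-bootstrapStandby"])


    def finalize():
        # Only after validating the new version: this drops the pre-upgrade
        # state, and rolling back is no longer possible afterwards.
        run(["hdfs", "dfsadmin", "-finalizeUpgrade"])


    def rollback():
        # Regular rollback: stop everything, reinstall the old packages, then
        # restore the pre-upgrade fsimage on the NameNode that did the upgrade.
        # With QJM this also rolls back the shared edit log; anything written
        # after the upgrade is lost, as discussed in [3] and above.
        stop_hdfs_and_swap_packages(target="cdh")
        run(["hdfs", "namenode", "-rollback"])
        run(["hdfs", "namenode", "-bootstrapStandby"])  # again, on the other NN host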