On Wed, Jan 15, 2020 at 5:55 PM Evans Ye <[email protected]>
wrote:
>
> Let me answer some parts with my best effort and let the others add
> the things I'm missing.

Thanks a lot :)

> On Wed, Jan 15, 2020 at 9:25 PM Luca Toscano <[email protected]> wrote:
>>
>> Hi everybody,
>>
>> I am part of the Analytics team of the Wikimedia Foundation. We are
>> currently managing a CDH 5.16.1 Hadoop cluster (on Debian 9 Stretch
>> hosts), and for various reasons we'd love to explore the possibility
>> of moving to BigTop :)
>>
>> The long term plan that we have in mind is something like the following:
>>
>> - Move from CDH 5.16.1 to BigTop 1.4 (which IIUC is the last Hadoop
>> 2.x release)
>> - Upgrade to BigTop 1.5 (very delicate since IIUC it upgrades Hadoop to 3.x)
>
>
> 1.5's BOM is not finalized yet. It can be Hadoop 2.x or Hadoop 3,
> depending on whether Hadoop 3 packaging is solved. You can refer to the
> discussion thread [1].
> Basically, what I recall is that because of Hadoop 3's shell script
> rewrite, the packaging is really a challenge. Some of our folks have
> done a POC at [2], but the problem is not fully solved.

Makes sense, yes, but the long-term plan is to eventually ship Hadoop 3,
right? I know it seems an obvious question, but better to double-check.

>> 1) Has anybody attempted something similar in the past? If so, is
>> there some documentation and/or advice about how to do the migration?
>> From what I gathered, CDH is based upon BigTop, so the main difference
>> would be the Hadoop version (2.6 vs 2.8.5, though CDH's is heavily
>> patched, so I'm not sure what version it really compares to). Hive
>> also changes between the distros (1.1 vs 2.x), but we are looking
>> forward to upgrading!
>
> 5 years ago my company was on CDH 4 (pkgs w/ our own puppet, no CM)
> and we decided to move to Bigtop 0.8.
> You can refer to [4][5]. Basically the idea is:
> 1. do parallel migration.
> 2. categorize data into hot-cold data.
> 3. [cold data] establish kerberos federation and move the cold data via cross 
> cluster distcp under the hood.
> 4. [hot data] distcp with user's cooperation.
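For readers unfamiliar with the approach, steps 3 and 4 above could look
roughly like this on the command line. This is only a sketch under
assumptions: the Kerberos principal, realm, namenode hostnames, and HDFS
paths below are made up for illustration, and it presumes cross-realm (or
shared-realm) trust is already configured between the two clusters.

```shell
# Sketch only: principal, realm, hostnames and paths are hypothetical.
# Authenticate against the trusted Kerberos realm first.
kinit migration@EXAMPLE.ORG

# [cold data] copy from the old CDH cluster to the new Bigtop cluster;
# -update skips files that are already identical at the destination,
# -p preserves attributes such as permissions and ownership.
hadoop distcp -update -p \
    hdfs://cdh-namenode.example.org:8020/data/cold \
    hdfs://bigtop-namenode.example.org:8020/data/cold

# [hot data] same command, but run in a window agreed with the data's
# owners, so nothing writes to the source path during the copy.
hadoop distcp -update -p \
    hdfs://cdh-namenode.example.org:8020/data/hot \
    hdfs://bigtop-namenode.example.org:8020/data/hot
```

The hot/cold split matters because distcp gives no consistency guarantees
for files being written while the copy runs; cold data can be moved at any
time, hot data only with the users' cooperation.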

Really nice links. We use Kerberos (single realm) as well, so it was an
interesting read. We don't have super strict availability concerns for
Hadoop: we usually take the cluster offline for one or two hours if
needed for important migrations (CDH upgrades, Java upgrades, Kerberos,
etc., just to name a few). Would it be possible, in your opinion, to
upgrade in place, swapping the CDH packages with BigTop ones? We have a
testing/staging cluster to use as a playground, and today I tried to
replace the CDH 5.16.2 packages with BigTop 1.4's on one Hadoop test
worker (very brutal and not elegant, I know, but it was a test :). The
HDFS Datanode and Journalnode daemons came up fine, but the Yarn Node
Manager did not, due to protocol buffer mismatch issues (I think because
of https://issues.apache.org/jira/browse/YARN-8310). It was a good
result in my opinion; my next step would be to stop the whole (test)
cluster and upgrade the other nodes too, to see what works and what
doesn't. The HDFS Namenode's consistency is my first thought of course,
but in theory it should be like a 2.6 -> 2.8 upgrade. What do you think?
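For reference, the brutal single-worker swap described above could be
sketched like this on a Debian host. This is an illustration, not a
recipe: the repository file name, mirror URL, and exact package list are
assumptions, and a full-cluster run would need the whole cluster stopped
and the Namenode metadata upgrade handled first.

```shell
# Sketch only: repo filename and repo URL are hypothetical.
# 1. Stop the Hadoop daemons on the worker.
sudo systemctl stop hadoop-yarn-nodemanager hadoop-hdfs-datanode

# 2. Swap the CDH apt repository for a Bigtop 1.4 one.
sudo rm /etc/apt/sources.list.d/cloudera.list
echo 'deb http://repos.bigtop.apache.org/releases/1.4.0/debian/9/amd64 bigtop contrib' \
    | sudo tee /etc/apt/sources.list.d/bigtop.list
sudo apt-get update

# 3. Install the same daemons from the Bigtop packages and restart them
# (apt may need --allow-downgrades if Bigtop's versions compare lower).
sudo apt-get install hadoop-hdfs-datanode hadoop-yarn-nodemanager
sudo systemctl start hadoop-hdfs-datanode hadoop-yarn-nodemanager
```

For the full-cluster test, the Namenode side would presumably follow the
standard HDFS upgrade flow (start the Namenode with the -upgrade option,
verify, then `hdfs dfsadmin -finalizeUpgrade`), as in any 2.6 -> 2.8 move.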

On Wed, Jan 15, 2020 at 6:10 PM Jean-Marc Spaggiari
<[email protected]> wrote:
>
> Hi Lucas,
>
> Might be nice if you document your steps and make them available to the 
> community. I think it might interest many other users.

I will take care of it, yes! One thing that I am looking forward to is
being part of this community and contributing back :)

Luca
