Hi,

Thanks for your request; I have been anticipating requests from Wikimedia for years!

Let me add some bits, since I did the initial Debian port, the Hue maintenance, 
and the Hadoop 3 PoC. My answer might be too pessimistic, but there are a whole 
lot of things to be done to bring it back in alignment with upstream packages.

> On 15.01.2020 at 17:55, Evans Ye <[email protected]> wrote:
> 
> 
> Let me answer some parts as best I can and let the others add the things 
> I'm missing.
> 
> Luca Toscano <[email protected]> wrote on Wed, Jan 15, 2020 at 9:25 PM:
>> Hi everybody,
>> 
>> I am part of the Analytics team of the Wikimedia Foundation. We are
>> currently managing a CDH 5.16.1 Hadoop cluster (on Debian 9 Stretch
>> hosts), and for various reasons we'd love to explore the possibility
>> of moving to BigTop :)
>> 
>> The long term plan that we have in mind is something like the following:
>> 
>> - Move from CDH 5.16.1 to BigTop 1.4 (which, IIUC, is the last release 
>> shipping Hadoop 2.x)
>> - Upgrade to BigTop 1.5 (very delicate since, IIUC, it upgrades Hadoop to 3.x)
> 
> The 1.5 BOM is not finalized yet. It could be Hadoop 2.x or Hadoop 3, 
> depending on whether the Hadoop 3 packaging is solved. You can refer to the 
> discussion thread [1].
> From what I recall, Hadoop 3's shell script rewrite makes packaging a real 
> challenge. Some of our folks did a PoC at [2], but the problem is not fully 
> solved.

The packaging problem in [2] is not solved at all, unfortunately.
> 
>> - Upgrade the OS to Debian 10 Buster

Debian 10 may not be supported by Bigtop. The problem is upstream Debian 
packages that collide with Bigtop's, such as (IIRC) ZooKeeper. I am not sure 
whether patches for these issues made it upstream.

>> 
>> All the BigTop packages seem to be enough for our use cases (we
>> already have our own Puppet automation); the only thing left would be
>> Hue, but it is easy to package (or we could reuse the CDH version as an
>> interim solution). I have a couple of questions for you:
> 
> Hue was included up to and including 1.2.1, but has been dropped since the 
> 1.3 release. This was done in [3].
> I can't recall the reason for dropping Hue, though...

Hue was dropped because dependency management was a nightmare. Hue decided to 
fork several packages without clean diffs, so aligning and updating them with 
the distro packages was no longer possible (for me). IIRC it also depended on 
Oozie, whose upstream support had stalled at that time and which didn't work 
with Kerberos (at least not for me).
> 
>> 
>> 1) Has anybody attempted something similar in the past? If so, is
>> there any documentation and/or advice on how to do the migration?
>> From what I gathered, CDH is based on BigTop, so the only difference
>> would be the Hadoop version (2.6 vs 2.8.5, but CDH's is heavily
>> patched, so I'm not sure which version it is comparable to). Hive also
>> differs between the distros (1.1 vs 2.x), but we are looking forward to
>> upgrading!
> 
> Five years ago my company was on CDH4 (packages with our own Puppet, no 
> Cloudera Manager) and we decided to move to Bigtop 0.8.
> You can refer to [4][5]. Basically the idea is:
> 1. Do a parallel migration (build the new cluster alongside the old one).
> 2. Categorize the data into hot and cold data.
> 3. [Cold data] Establish Kerberos federation between the clusters and move 
> the cold data under the hood via cross-cluster distcp.
> 4. [Hot data] Copy it with distcp in cooperation with its users.
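A rough sketch of steps 2 and 3, assuming a listing of HDFS paths with their 
modification times has already been collected (e.g. from `hdfs dfs -ls` or an 
fsimage dump, not shown): paths untouched for a hypothetical number of days 
count as cold and get a cross-cluster `hadoop distcp` command, while hot paths 
are left for the coordinated copy in step 4. The NameNode URIs, the 90-day 
cutoff, and the helper names are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical NameNode URIs for the old (CDH) and new (Bigtop) clusters.
OLD_NN = "hdfs://old-nn.example.org:8020"
NEW_NN = "hdfs://new-nn.example.org:8020"

DAY_SECONDS = 24 * 60 * 60


@dataclass
class PathInfo:
    path: str           # absolute HDFS path, e.g. "/data/archive/2018"
    last_modified: int  # modification time in seconds since the epoch


def split_hot_cold(paths: List[PathInfo], now: int,
                   cold_after_days: int) -> Tuple[List[PathInfo], List[PathInfo]]:
    """Return (hot, cold): cold paths were not modified for cold_after_days or more."""
    cutoff = now - cold_after_days * DAY_SECONDS
    hot = [p for p in paths if p.last_modified > cutoff]
    cold = [p for p in paths if p.last_modified <= cutoff]
    return hot, cold


def distcp_command(p: PathInfo) -> List[str]:
    """Build a distcp invocation copying one path to the same path on the new cluster."""
    # -update copies the source directory's contents into the target path and
    # skips files that already match; a bare -p preserves file status
    # (replication, block size, owner, group, permissions, timestamps, ...).
    return ["hadoop", "distcp", "-update", "-p", OLD_NN + p.path, NEW_NN + p.path]


if __name__ == "__main__":
    now = 1_579_100_000
    listing = [
        PathInfo("/data/archive/2018", now - 400 * DAY_SECONDS),
        PathInfo("/data/events/current", now - 3600),
    ]
    hot, cold = split_hot_cold(listing, now, cold_after_days=90)
    # Print the commands instead of running them, so they can be reviewed
    # and scheduled (e.g. off-peak) before anything is copied.
    for p in cold:
        print(" ".join(distcp_command(p)))
```

Printing rather than executing keeps the sketch side-effect free; re-running it 
later with `-update` only transfers files that changed in the meantime.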
>  
>> 
>> 2) Is there any documentation about how to move from Hadoop 2 to
>> Hadoop 3 using BigTop? As far as I know the procedure is very delicate
>> and needs to be done with precise steps (I am mostly concerned of HDFS
>> consistency).
> 
> No. My suggestion is to refer to the Hadoop or Cloudera upgrade guides.
> I've previously done an upgrade from Hadoop 1.x to 2.x; basically you just 
> follow the guide. If you have a staging cluster, try it there first.
>  
>> 
>> 3) As far as I know, Debian 10 (Buster) ships only with openjdk-11, but
>> we were planning to keep using openjdk-8 for the near/medium term.
>> From 
>> https://github.com/apache/bigtop/blob/master/bigtop_toolchain/manifests/jdk.pp#L25-L41
>> it seems that BigTop is aligned with this goal, but it's better to
>> double-check.
> 
> AFAIK this is aligned; however, things are subject to change, so there's no 
> guarantee until the release has been made ;)

Ah, so the changes made it upstream, good. But Bigtop has no regular CI running 
for Debian 10, so there will be other problems lurking.
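Without CI coverage, one cheap pre-flight check per host is to confirm which JDK 
the default `java` actually is before installing packages. A minimal sketch, 
assuming the check only inspects whatever `java` is first on the PATH (on 
Debian that is governed by the alternatives system); the function names are 
hypothetical. Java 8 reports its version as "1.8.0_NNN", while Java 9 and 
later report e.g. "11.0.5", so both schemes are handled.

```python
import re
import subprocess
from typing import Optional


def java_major_version(version_output: str) -> Optional[int]:
    """Extract the Java major version from `java -version` output, or None."""
    match = re.search(r'version "([^"]+)"', version_output)
    if match is None:
        return None
    parts = match.group(1).split(".")
    if parts[0] == "1" and len(parts) > 1:
        # Legacy "1.x" scheme (Java 8 and older): the major version is the second field.
        return int(parts[1])
    # Java 9+ scheme: the major version is the leading number, e.g. "11" or "17-ea".
    leading = re.match(r"\d+", parts[0])
    return int(leading.group(0)) if leading else None


def local_java_major_version() -> Optional[int]:
    """Run `java -version` on this host; note that it prints to stderr, not stdout."""
    result = subprocess.run(["java", "-version"],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True)
    return java_major_version(result.stderr)
```

For example, `local_java_major_version() == 8` could gate the rollout on each 
Buster host.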
>  
>> 
>> Thanks in advance!
>> 
>> Luca  
> 
> Let me know if you have more questions :)
> 
> [1] https://lists.apache.org/thread.html/2f80388a1f87bed20de2bb61882e734d76623896812cd0ae168b8ff5%40%3Cdev.bigtop.apache.org%3E
> [2] https://github.com/apache/bigtop/tree/bigtop-alpha
> [3] https://issues.apache.org/jira/browse/BIGTOP-3021
> [4] https://www.slideshare.net/takeshi_miao/zerodowntime-hadoophbase-crossdatacenter-migration?qid=be058c6e-a799-4a8c-bfa4-35f599074482&v=&b=&from_search=1
> [5] https://www.slideshare.net/YafangChang/hadoopcon2015-multicluster-live-synchronization-with-kerberos-federated-hadoop?qid=e5c77fd9-5038-4fe1-b233-5b025767c763&v=&b=&from_search=1
