Hi everybody,

I am part of the Analytics team of the Wikimedia Foundation. We are
currently managing a CDH 5.16.1 Hadoop cluster (on Debian 9 Stretch
hosts), and for various reasons we'd love to explore the possibility
of moving to BigTop :)

The long term plan that we have in mind is something like the following:

- Move from CDH 5.16.1 to BigTop 1.4 (which IIUC is the last Hadoop 2.x release)
- Upgrade to BigTop 1.5 (very delicate since IIUC it upgrades Hadoop to 3.x)
- Upgrade the OS to Debian 10 Buster

All the BigTop packages seem to be enough for our use cases (we
already have our own puppet automation); the only thing left would be
Hue, but it is easy to package (or to re-use the CDH version as an
interim solution). I have a couple of questions for you:

1) Has anybody attempted something similar in the past? If so, is
there any documentation and/or advice about how to do the migration?
From what I gathered, CDH is based upon BigTop, so the main difference
would be the Hadoop version (2.6 vs 2.8.5, but CDH's one is heavily
patched, so I am not sure what version it could be compared to). Hive
also changes between the distros (1.1 vs 2.x), but we are looking
forward to upgrading!

2) Is there any documentation about how to move from Hadoop 2 to
Hadoop 3 using BigTop? As far as I know the procedure is very delicate
and needs to be done with precise steps (I am mostly concerned about HDFS
consistency).

3) As far as I know Debian 10 (Buster) ships only with openjdk-11, but
we were planning to keep using openjdk-8 for the near/medium-term.
From
https://github.com/apache/bigtop/blob/master/bigtop_toolchain/manifests/jdk.pp#L25-L41
it seems that BigTop is aligned with this goal, but better to double
check.

Thanks in advance!

Luca
