Hi everybody, I am part of the Analytics team of the Wikimedia Foundation. We are currently managing a CDH 5.16.1 Hadoop cluster (on Debian 9 Stretch hosts), and for various reasons we'd love to explore the possibility of moving to BigTop :)
The long-term plan that we have in mind is something like the following:

- Move from CDH 5.16.1 to BigTop 1.4 (which IIUC is the last Hadoop 2.x release)
- Upgrade to BigTop 1.5 (very delicate, since IIUC it upgrades Hadoop to 3.x)
- Upgrade the OS to Debian 10 Buster

The BigTop packages seem to be enough for our use cases (we already have our own puppet automation). The only thing missing would be Hue, but it should be easy to package (or we could re-use the CDH version as an interim solution).

I have a couple of questions for you:

1) Has anybody attempted something similar in the past? If so, is there any documentation and/or advice about how to do the migration? From what I gathered, CDH is based upon BigTop, so the only difference would be the Hadoop version (2.6 vs 2.8.5, but CDH's one is heavily patched, so I am not sure what version it could be compared to). Hive also changes between the distros (1.1 vs 2.x), but we are looking forward to the upgrade!

2) Is there any documentation about how to move from Hadoop 2 to Hadoop 3 using BigTop? As far as I know the procedure is very delicate and needs to be done with precise steps (I am mostly concerned about HDFS consistency).

3) As far as I know Debian 10 (Buster) ships only with openjdk-11, but we were planning to keep using openjdk-8 for the near/medium term. From https://github.com/apache/bigtop/blob/master/bigtop_toolchain/manifests/jdk.pp#L25-L41 it seems that BigTop is aligned with this goal, but it is better to double check.

Thanks in advance!

Luca
