Hello friends, I have packaged up Hadoop a number of ways over the years.
Lately, since everyone loves Docker, I find my 80GB hard disk constantly filled by bulky, bloated images, so I force those images to "hit the gym": https://hub.docker.com/u/ecapriolo I already have Spark, Zeppelin, and Livy running on Alpine with not much more than the JRE, and I wanted to tackle Hadoop core next. https://issues.apache.org/jira/browse/HADOOP-19756

A few funny fake blockers:
1) musl, and the code in the ticket above
2) the old 2.5.0 protobuf. So many OSS problems; nobody has even bothered packaging that protoc version for six years.
3) the RHEL reliance on the NIS libraries

Next, I realize a Hadoop "lean" package can't accommodate every case. But the lean build is like 500MB of docs and 500MB of jars :) The timeline server and its libs are another 150MB. Test jars, maybe 100MB more. The native libs outside libhadoop are 180MB. (If you are on Alpine they are useless anyway.) See the rm -rfs here: https://github.com/edwardcapriolo/edgy-ansible/tree/main/imaging/hadoop

Anyway, my goal is to have nice lean Alpine-based packages, and more advanced Helm charts mirroring things I have done in Ansible: 2 namenodes, 3 journal nodes, 2 resource managers, 3 ZooKeepers, etc.
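To make the trimming concrete, here is a minimal sketch in the spirit of those rm -rfs. It assumes the standard Hadoop binary-tarball layout (share/doc, share/hadoop/yarn/timelineservice, lib/native); the function name and exact paths are my illustration, not the actual scripts in the repo above — check your own unpacked tree before deleting.

```shell
# Sketch: trim the heavy, optional parts of an unpacked Hadoop distribution.
# trim_hadoop is a hypothetical helper; paths assume the standard tarball layout.
trim_hadoop() {
    hh="$1"  # the HADOOP_HOME to slim down

    # Documentation: roughly 500MB
    rm -rf "$hh/share/doc"

    # Timeline server and its libs: roughly 150MB
    rm -rf "$hh/share/hadoop/yarn/timelineservice"

    # Test jars scattered through share/: maybe 100MB more
    find "$hh/share" -type f -name '*-tests.jar' -delete 2>/dev/null

    # Native libs other than libhadoop: roughly 180MB
    # (glibc-linked builds won't load on Alpine/musl anyway)
    find "$hh/lib/native" -type f ! -name 'libhadoop*' -delete 2>/dev/null
}
```

Running this against a fresh unpack, plus skipping the docs half of the tarball entirely, is where most of the size win comes from.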

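For the Helm side, the HA topology could be sketched in a chart's values file roughly like this. Every key name here is hypothetical — it just shows the replica counts from the Ansible setup, not the schema of any published chart.

```yaml
# Hypothetical values.yaml sketch -- key names are illustrative only.
hdfs:
  namenode:
    replicas: 2        # active/standby HA pair
  journalnode:
    replicas: 3        # quorum journal manager needs an odd-sized quorum
yarn:
  resourcemanager:
    replicas: 2        # RM HA pair
zookeeper:
  replicas: 3          # quorum for NN failover and RM leader election
```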