Hello friends,

I have packaged up Hadoop a number of ways over the years.

Lately, since everyone loves Docker, I find my 80GB hard disk constantly
filled by bulky, bloated images.

I have to force these bloated images to "hit the gym".

https://hub.docker.com/u/ecapriolo
I have Spark, Zeppelin, and Livy running on Alpine with not much more than
the JRE.
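For flavor, the images follow roughly this pattern (a sketch only: the base image, Spark version, and tarball URL below are my assumptions, not the published Dockerfiles, which are linked above):

```dockerfile
# Illustrative slim Spark-on-Alpine image: just a JRE plus the tarball.
# Version and download URL are assumptions for the sketch.
FROM eclipse-temurin:17-jre-alpine

ARG SPARK_VERSION=3.5.1
RUN wget -qO- "https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop3.tgz" \
    | tar -xz -C /opt \
 && mv "/opt/spark-${SPARK_VERSION}-bin-hadoop3" /opt/spark

ENV SPARK_HOME=/opt/spark \
    PATH=/opt/spark/bin:$PATH
ENTRYPOINT ["/opt/spark/bin/spark-shell"]
```

The point is what is *not* there: no full JDK, no distro userland beyond busybox, nothing outside the tarball itself.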

I wanted to tackle hadoop core next.

https://issues.apache.org/jira/browse/HADOOP-19756

A few funny fake blockers:
1) musl and the code in the ticket above
2) the old 2.5.0 protobuf. So many OSS problems that no one has even
bothered packaging that protoc version for 6 years.
3) the RHEL reliance on the NIS libraries
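For blocker 2, since no distro ships protoc 2.5.0 anymore, the usual workaround is to build it from source in a throwaway stage. Something like this (a sketch; the release URL and prefix are assumptions, and on musl / modern toolchains the 2.5.0 source may still need patching):

```dockerfile
# Illustrative builder stage for the ancient protoc 2.5.0 that
# older Hadoop protos require; no distro has packaged it in years.
FROM alpine:3.20 AS protoc-build
RUN apk add --no-cache build-base autoconf automake libtool curl
RUN curl -fsSL https://github.com/protocolbuffers/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz \
    | tar -xz \
 && cd protobuf-2.5.0 \
 && ./configure --prefix=/usr/local \
 && make -j"$(nproc)" \
 && make install
```

Later stages can then COPY /usr/local/bin/protoc out of this stage instead of hoping a package exists.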

Now, I realize the Hadoop "lean" package can't accommodate every case. But
the lean build is like 500MB of docs and 500MB of jars :)
The timeline server and its libs are 150MB. Test jars, maybe 100 more. The
native libs outside libhadoop are 180MB. (If you are on Alpine they are
negligible anyway.)

See the rm -rfs here.

https://github.com/edwardcapriolo/edgy-ansible/tree/main/imaging/hadoop
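The gist of those rm -rfs, in spirit (this runs against a throwaway fake layout with illustrative paths; the real list is in the repo linked above):

```shell
#!/bin/sh
# Demo of the trimming idea against a fake Hadoop layout in /tmp.
# Directory names mirror a real distribution; the files are empty stand-ins.
H=/tmp/hadoop-trim-demo
rm -rf "$H"
mkdir -p "$H/share/doc/hadoop" \
         "$H/share/hadoop/yarn/timelineservice" \
         "$H/share/hadoop/common/lib" \
         "$H/lib/native"
touch "$H/share/doc/hadoop/index.html" \
      "$H/share/hadoop/yarn/timelineservice/timeline.jar" \
      "$H/share/hadoop/common/lib/hadoop-common-tests.jar" \
      "$H/lib/native/libhdfspp.so" \
      "$H/lib/native/libhadoop.so"

# The actual trimming: docs, timeline server, test jars, natives
# other than libhadoop.
rm -rf "$H/share/doc/hadoop" "$H/share/hadoop/yarn/timelineservice"
find "$H" -name '*-tests.jar' -delete
find "$H/lib/native" -type f ! -name 'libhadoop*' -delete

ls -R "$H"
```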

Anyway, my goal is to have nice lean Alpine-based packages and more
advanced Helm charts mirroring things I have done in Ansible: a 2 NameNode
/ 3 JournalNode setup, 2 ResourceManagers, 3 ZooKeepers, etc.
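That HA topology translates naturally into a values file for a chart. A hypothetical sketch (this chart layout and its keys are made up for illustration; only the replica counts come from the setup described above):

```yaml
# Hypothetical values.yaml for an HA Hadoop chart.
# Replica counts mirror the Ansible setup: 2 NN, 3 JN, 2 RM, 3 ZK.
namenode:
  replicas: 2
journalnode:
  replicas: 3
resourcemanager:
  replicas: 2
zookeeper:
  replicas: 3
```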
