Hi! I'd suggest starting from https://cwiki.apache.org/confluence/display/BIGTOP/Bigtop+Provisioner+User+Guide
The guide is not up to date, but I tested the 1.4 branch docker compose setup and it works nicely:

git checkout branch-1.4
./gradlew -Pconfig=`pwd`/provisioner/docker/config_debian-9.yaml -Pnum_instances=1 docker-provisioner
docker container ls
docker exec -it $id-of-the-container bash

Keep in mind a few things:

1) Hadoop is essentially composed of two parts: YARN and HDFS. The former manages resources and jobs (vcores and RAM available across multiple nodes, together with the processes/containers running on them), while HDFS is the distributed file system. Each of the two is made up of multiple daemons, but both follow a master/worker architecture. You can find endless literature on the subject, and I'd suggest reading a little before starting any experiment, as a good way to figure out how things work. For example, a great starter is the usual distributed map-reduce grep among text files (you can find a lot of examples on the web).

2) The format of the data that Hadoop handles is a really important factor to consider; it is a real mindset shift coming from relational databases. Check for example Sequence Files (https://cwiki.apache.org/confluence/display/HADOOP2/SequenceFile), http://avro.apache.org/ or https://parquet.apache.org/ for some examples.

3) There is a large number of tools that map a SQL-like language to the map-reduce world, like Apache Hive or Apache Spark; those are surely interesting to check out. These are not included in the docker provisioner above afaics, but shouldn't be too difficult to install if needed.

4) Security is also a big part of the Hadoop world, with its own complexity (Kerberos etc.). Keep it in mind while evaluating :)

Hope this is good to start your tests!

Luca

ps: I am a member of the community, not a dev of the project, so more authoritative answers may follow after mine; consider this as my2c.
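
As a rough illustration of the map-reduce grep mentioned in point 1, here is a minimal single-machine sketch. It does not use Hadoop at all: it only imitates the three phases (map, shuffle, reduce) with standard Unix tools, so the shape of the computation is visible before trying it on a cluster. The function name mr_grep, the sample files and their contents are made up for this example.

```shell
#!/usr/bin/env bash
# Single-machine imitation of a map-reduce grep (no Hadoop involved).
# mr_grep and the sample files below are hypothetical, for illustration only.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Two small made-up "input splits".
printf 'dfs.replication=3\nno match on this line\n' > "$workdir/a.txt"
printf 'dfs.blocksize=128m\ndfs.replication=2\n' > "$workdir/b.txt"

mr_grep() {
  local pattern=$1
  shift
  grep -ohE "$pattern" "$@" |   # map: emit every match found in every file
    sort |                      # shuffle: bring identical matches together
    uniq -c |                   # reduce: count each distinct match
    sort -k1,1nr -k2 |          # order by descending count
    awk '{ printf "%s\t%s\n", $1, $2 }'
}

mr_grep 'dfs[a-z.]+' "$workdir"/*.txt
```

With the sample files above this prints "2<TAB>dfs.replication" followed by "1<TAB>dfs.blocksize". On a real cluster the map step would run on many nodes in parallel and the shuffle would move data over the network, but the logical steps are the same.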

On Fri, Oct 30, 2020 at 4:33 PM Joba1 <[email protected]> wrote:
>
> Hi list,
>
> Long story short: I tried to follow the bigtop quickstart guide from the wiki
> but failed. Can you help me please?
>
>
> I would like to understand the differences between relational databases that
> I know well and hadoop where I know next to nothing about. To achieve that, I
> want to play around with the tools and products the hadoop ecosystem has to
> offer.
>
> My goal is to understand how things work together from feeding some
> unstructured data to querying it with some end user tool and everything in
> between. It is not about volume or high availability on multiple servers or
> just establishing a connection to an existing hadoop installation.
> For now, as the first step, I just want to make the pipeline work in general.
>
> So far beginner friendly documentation that provides a full overview is
> sparse, or I haven't looked in the right direction yet. But now I found this
> project, which - judging from the 1.4.0 release notes - at first seemed ideal
> for my purpose:
>
>> Deploying Bigtop is easy: grab the repo/list file for your favorite Linux
>> distribution:
>> https://www.apache.org/dyn/closer.lua/bigtop/bigtop-1.4.0/repos/
>> and you'll be running your very own big data cluster in no time!
>
>
> But still I got stuck. It is not very clear to me what is actually needed to
> get hadoop up and running with bigtop. There seem to be big differences
> depending on versions. What are the rpm's good for? They seem to be just a
> tomcat server? Why is that needed?
>
> My understanding of bigtop from the Quickstart Guide is that I need to unpack
> the bigtop project tar and run gradlew on a system that runs docker. This
> should get all tools and sources and build all software, connect it together,
> and do some integration tests (maybe with the help of the tomcat server).
> Right?
>
> I prepared a script and the logs it produces, so you see exactly what I'm
> trying to do.
> Probably you can spot very easily what the problem is?
>
> Best Regards,
> Joachim
>
>
>
> P.S.: I tried these -POS values with and without the bigtop repo rpms
> installed. With more or less the same result:
>
> bigtop@a956005e4481:~> docker image ls
> REPOSITORY      TAG                   IMAGE ID       CREATED        SIZE
> bigtop/slaves   trunk-centos-8        ba284fdb2b47   8 hours ago    3.51GB
> bigtop/slaves   trunk-ubuntu-16.04    43c6ff57c166   8 hours ago    3.76GB
> bigtop/slaves   trunk-centos-7        8809f6c53ef8   8 hours ago    2.72GB
> bigtop/slaves   trunk-opensuse-42.3   e7246a926a13   7 months ago   2.77GB
>
>
>
>
> P.P.S.: Yesterday I had one testrun that went quite a bit farther, but failed
> with a java exception during building an rpm in trunk-opensuse-42.3. But I
> cannot reproduce that anymore. It fails much earlier now.
>
>
>
> My test runs on a bare metal opensuse 15.1 with its docker package
> (19.03.11), but the test itself runs in an opensuse 42.3 docker image to make
> it easy to reproduce my environment:
>
> docker run --name bigtop-prep -v /var/run/docker.sock:/var/run/docker.sock -v /usr/bin/docker:/usr/bin/docker -ti opensuse/leap:42.3 /bin/bash
> zypper in -y vim less
> zypper in -y tar curl unzip rpm-build java-1_8_0-openjdk-devel
> sed -i 's/en_US.UTF-8//' /etc/sysconfig/language
> groupadd -g `ls -l /var/run/docker.sock | while read x x x g x; do echo $g; done` docker
> useradd -m -G docker bigtop
> zypper ar -G https://artfiles.org/apache.org/bigtop/bigtop-1.4.0/repos/opensuse42.3/bigtop.repo
> zypper in -y which bigtop-utils bigtop-jsvc bigtop-tomcat bigtop-groovy
> cp -av /usr/lib/bigtop-tomcat/conf.template /usr/lib/bigtop-tomcat/conf
> sed -i '/<\/tomcat-users>/i \ \ <role rolename="manager"/>' /usr/lib/bigtop-tomcat/conf/tomcat-users.xml
> sed -i '/<\/tomcat-users>/i \ \ <role rolename="manager-gui"/>' /usr/lib/bigtop-tomcat/conf/tomcat-users.xml
> sed -i '/<\/tomcat-users>/i \ \ <user username="tomcat" password="my-PW" roles="tomcat,manager,manager-gui"/>' /usr/lib/bigtop-tomcat/conf/tomcat-users.xml
> sed -i '/<\/tomcat-users>/i \ \ <user username="joachim" password="my-PW" roles="tomcat,manager,manager-gui"/>' /usr/lib/bigtop-tomcat/conf/tomcat-users.xml
> sed -i '/<\/tomcat-users>/i \ \ <user username="julian" password="his-PW" roles="tomcat,manager,manager-gui"/>' /usr/lib/bigtop-tomcat/conf/tomcat-users.xml
> /usr/lib/bigtop-tomcat/bin/startup.sh
> su - bigtop
> curl https://apache.lauf-forum.at/bigtop/bigtop-1.4.0/bigtop-1.4.0-project.tar.gz -o bigtop-1.4.0-project.tar.gz
> tar xf bigtop-1.4.0-project.tar.gz
> ln -sf bigtop-1.4.0 bigtop
> cd bigtop
> ./gradlew --console=plain hadoop-pkg-ind -POS=opensuse-42.3
>
>
>
>
> This is the complete output of the gradle run (other steps seem fine, tomcat
> catalina is running: ports are open but makes no difference):
>
> bigtop@a956005e4481:~/bigtop> ./gradlew --console=plain hadoop-pkg-ind -POS=opensuse-42.3
> % Total % Received % Xferd Average Speed Time Time Time Current
> Dload Upload Total Spent Left Speed
> 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
> 100 74.7M 100 74.7M 0 0 5839k 0 0:00:13 0:00:13 --:--:-- 5990k
> ~/.gradle/wrapper/dists/gradle-4.10.3-bin/016e637c7ef47db2d3e632a3936ca351 ~/bigtop
> ~/bigtop
> Welcome to Gradle 4.10.3!
> Here are the highlights of this release:
> - Incremental Java compilation by default
> - Periodic Gradle caches cleanup
> - Gradle Kotlin DSL 1.0-RC6
> - Nested included builds
> - SNAPSHOT plugin versions in the `plugins {}` block
> For more details see https://docs.gradle.org/4.10.3/release-notes.html
> Starting a Gradle Daemon (subsequent builds will be faster)
> > Task :hadoop-pkg-ind FAILED
> Building trunk hadoop-pkg on opensuse-42.3 in Docker...
> +++ dirname ./bigtop-ci/build.sh
> ++ cd ./bigtop-ci/..
> ++ pwd
> + BIGTOP_HOME=/home/bigtop/bigtop
> + '[' 6 -eq 0 ']'
> + [[ 6 -gt 0 ]]
> + key=--prefix
> + case $key in
> + PREFIX=trunk
> + shift
> + shift
> + [[ 4 -gt 0 ]]
> + key=--os
> + case $key in
> + OS=opensuse-42.3
> + shift
> + shift
> + [[ 2 -gt 0 ]]
> + key=--target
> + case $key in
> + TARGET=hadoop-pkg
> + shift
> + shift
> + [[ 0 -gt 0 ]]
> + '[' -z x ']'
> + '[' -z x ']'
> + '[' '' == true ']'
> + IMAGE_NAME=bigtop/slaves:trunk-opensuse-42.3
> ++ uname -m
> + ARCH=x86_64
> + '[' x86_64 '!=' x86_64 ']'
> ++ docker run -d bigtop/slaves:trunk-opensuse-42.3 /sbin/init
> + CONTAINER_ID=b21178150cd1ebb8640eebe9e24e27cd9e1d65da4bcf0e22e6c4e725514231be
> + trap 'docker rm -f b21178150cd1ebb8640eebe9e24e27cd9e1d65da4bcf0e22e6c4e725514231be' EXIT
> + docker cp /home/bigtop/bigtop b21178150cd1ebb8640eebe9e24e27cd9e1d65da4bcf0e22e6c4e725514231be:/bigtop
> + docker cp /home/bigtop/bigtop/bigtop-ci/entrypoint.sh b21178150cd1ebb8640eebe9e24e27cd9e1d65da4bcf0e22e6c4e725514231be:/bigtop/entrypoint.sh
> Error: No such container:path: b21178150cd1ebb8640eebe9e24e27cd9e1d65da4bcf0e22e6c4e725514231be:/bigtop
> + docker exec b21178150cd1ebb8640eebe9e24e27cd9e1d65da4bcf0e22e6c4e725514231be bash -c 'chown -R jenkins:jenkins /bigtop'
> + docker exec --user jenkins b21178150cd1ebb8640eebe9e24e27cd9e1d65da4bcf0e22e6c4e725514231be bash -c 'cd /bigtop && ./entrypoint.sh hadoop-pkg --info'
> bash: line 0: cd: /bigtop: No such file or directory
> + RESULT=1
> + mkdir -p output
> + docker cp b21178150cd1ebb8640eebe9e24e27cd9e1d65da4bcf0e22e6c4e725514231be:/bigtop/build .
> Error: No such container:path: b21178150cd1ebb8640eebe9e24e27cd9e1d65da4bcf0e22e6c4e725514231be:/bigtop/build
> + docker cp b21178150cd1ebb8640eebe9e24e27cd9e1d65da4bcf0e22e6c4e725514231be:/bigtop/output .
> Error: No such container:path: b21178150cd1ebb8640eebe9e24e27cd9e1d65da4bcf0e22e6c4e725514231be:/bigtop/output
> + docker rm -f b21178150cd1ebb8640eebe9e24e27cd9e1d65da4bcf0e22e6c4e725514231be
> b21178150cd1ebb8640eebe9e24e27cd9e1d65da4bcf0e22e6c4e725514231be
> + '[' 1 -ne 0 ']'
> + exit 1
> + docker rm -f b21178150cd1ebb8640eebe9e24e27cd9e1d65da4bcf0e22e6c4e725514231be
> Error: No such container: b21178150cd1ebb8640eebe9e24e27cd9e1d65da4bcf0e22e6c4e725514231be
> FAILURE: Build failed with an exception.
> * Where:
> Script '/home/bigtop/bigtop-1.4.0/packages.gradle' line: 657
> * What went wrong:
> Execution failed for task ':hadoop-pkg-ind'.
> > Process 'command 'bash'' finished with non-zero exit value 1
> * Try:
> Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.
> * Get more help at https://help.gradle.org
> BUILD FAILED in 30s
> 1 actionable task: 1 executed
