Hi!

I'd suggest starting from
https://cwiki.apache.org/confluence/display/BIGTOP/Bigtop+Provisioner+User+Guide

The guide is not up to date, but I tested the 1.4 branch docker
compose setup and it works nicely:

git checkout branch-1.4
./gradlew -Pconfig=`pwd`/provisioner/docker/config_debian-9.yaml \
    -Pnum_instances=1 docker-provisioner
docker container ls
docker exec -it <id-of-the-container> bash
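
Once you have a shell inside the container, a quick sanity check could
look like the sketch below. The run_if_present helper is my own
(hypothetical) wrapper and not part of Bigtop. The commands it calls are
standard Hadoop CLI commands, but I have not verified which of them
succeed in this particular image, and some (like dfsadmin) may need to
be run as the hdfs user.

```shell
#!/bin/sh
# Hypothetical helper: run a command only if its binary is on PATH,
# otherwise print a skip notice instead of failing.
run_if_present() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "== $*"
    "$@"
  else
    echo "skip: $1 not found on PATH"
  fi
}

run_if_present hadoop version          # which Hadoop build is installed
run_if_present hdfs dfsadmin -report   # HDFS capacity and live datanodes
run_if_present hdfs dfs -ls /          # root of the distributed file system
run_if_present yarn node -list         # node managers registered with YARN
```

If the HDFS and YARN commands both answer, the two halves of the
cluster are at least up and talking to each other.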

Keep in mind a few things:
1) Hadoop is essentially made up of two parts: YARN and HDFS. YARN
manages resources and jobs (essentially the vcores and RAM available
across the nodes, together with the processes/containers running on
them), while HDFS is the distributed file system. Each is made up of
several daemons, but both follow a master/worker architecture. There
is endless literature on the subject, and I'd suggest reading a bit
before starting any experiment, to get an idea of how things work. A
great first exercise is the usual distributed map-reduce grep over
text files (you can find a lot of examples on the web).
2) The format of the data that Hadoop handles is a really important
factor to consider, and it requires quite a shift in mentality coming
from relational databases. Check for example Sequence Files
(https://cwiki.apache.org/confluence/display/HADOOP2/SequenceFile),
Avro (http://avro.apache.org/) or Parquet (https://parquet.apache.org/).
3) There is a large number of tools that map a SQL-like language onto
the map-reduce world, like Apache Hive or Apache Spark, and those are
surely worth checking out. As far as I can see they are not included
in the docker provisioner setup above, but they shouldn't be too
difficult to install if needed.
4) Security is also a big part of the Hadoop world, with its own
complexity (Kerberos etc.). Keep it in mind while evaluating :)
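
To make the map-reduce grep from point 1 concrete before touching a
cluster, here is a purely local sketch using ordinary shell pipes. It is
not Hadoop code: grep -o plays the map step, the first sort plays the
shuffle, and uniq -c plays the reduce, which is roughly the shape of
what the distributed job does across nodes. The file names, the sample
lines and the mr_grep function are made up for illustration.

```shell
#!/bin/sh
# Local stand-in for a distributed grep; no Hadoop involved.
workdir=$(mktemp -d)

# Two small "input splits", as if the data lived on two different nodes.
printf 'error: disk full\ninfo: ok\nerror: timeout\n' > "$workdir/part-0.txt"
printf 'warn: slow\nerror: disk full\n' > "$workdir/part-1.txt"

# map:     emit every match of the pattern found in the input splits
# shuffle: sort brings identical matches next to each other
# reduce:  uniq -c counts each distinct match
# The final sort -rn only orders the result by count, highest first.
mr_grep() {
  pattern=$1
  shift
  cat "$@" | grep -o -E "$pattern" | sort | uniq -c | sort -rn
}

mr_grep 'error: [a-z ]+' "$workdir"/part-*.txt
```

With the sample data this should report a count of 2 for
"error: disk full" and 1 for "error: timeout". On a real cluster the
map and reduce steps run in parallel on the nodes that hold the data,
which is where HDFS and YARN come in.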

Hope this is enough to get your tests started!

Luca

PS: I am a member of the community, not a dev of the project, so more
authoritative answers may follow mine. Consider this my 2c.

On Fri, Oct 30, 2020 at 4:33 PM Joba1 <[email protected]> wrote:
>
> Hi list,
>
> Long story short: I tried to follow the bigtop quickstart guide from the wiki 
> but failed. Can you help me please?
>
>
> I would like to understand the differences between relational databases that 
> I know well and hadoop where I know next to nothing about. To achieve that, I 
> want to play around with the tools and products the hadoop ecosystem has to 
> offer.
>
> My goal is to understand how things work together from feeding some 
> unstructured data to querying it with some end user tool and everything in 
> between. It is not about volume or high availability on multiple servers or 
> just establishing a connection to an existing hadoop installation.
> For now, as the first step, I just want to make the pipeline work in general.
>
> So far beginner friendly documentation that provides a full overview is 
> sparse, or I haven't looked in the right direction yet. But now I found this 
> project, which - judging from the 1.4.0 release notes - at first seemed ideal 
> for my purpose:
>
>> Deploying Bigtop is easy: grab the repo/list file for your favorite Linux
>> distribution:
>>   https://www.apache.org/dyn/closer.lua/bigtop/bigtop-1.4.0/repos/
>> and you'll be running your very own big data cluster in no time!
>
>
> But still I got stuck. It is not very clear to me what is actually needed to 
> get hadoop up and running with bigtop. There seem to be big differences 
> depending on versions. What are the rpm's good for? They seem to be just a 
> tomcat server? Why is that needed?
>
> My understanding of bigtop from the Quickstart Guide is that I need to unpack 
> the bigtop project tar and run gradlew on a system that runs docker. This 
> should get all tools and sources and build all software, connect it together, 
> and do some integration tests (maybe with the help of the tomcat server). 
> Right?
>
> I prepared a script and the logs it produces, so you see exactly what I'm 
> trying to do. Probably you can spot very easily what the problem is?
>
> Best Regards,
> Joachim
>
>
>
> P.S.: I tried these -POS values with and without the bigtop repo rpms 
> installed. With more or less the same result:
>
> bigtop@a956005e4481:~> docker image ls
> REPOSITORY          TAG                   IMAGE ID            CREATED             SIZE
> bigtop/slaves       trunk-centos-8        ba284fdb2b47        8 hours ago         3.51GB
> bigtop/slaves       trunk-ubuntu-16.04    43c6ff57c166        8 hours ago         3.76GB
> bigtop/slaves       trunk-centos-7        8809f6c53ef8        8 hours ago         2.72GB
> bigtop/slaves       trunk-opensuse-42.3   e7246a926a13        7 months ago        2.77GB
>
>
>
>
> P.P.S.: Yesterday I had one testrun that went quite a bit farther, but failed 
> with a java exception during building an rpm in trunk-opensuse-42.3. But I 
> cannot reproduce that anymore. It fails much earlier now.
>
>
>
> My test runs on a bare metal opensuse 15.1 with its docker package 
> (19.03.11), but the test itself runs in an opensuse 42.3 docker image to make 
> it easy to reproduce my environment:
>
> docker run --name bigtop-prep -v /var/run/docker.sock:/var/run/docker.sock -v /usr/bin/docker:/usr/bin/docker -ti opensuse/leap:42.3 /bin/bash
> zypper in -y vim less
> zypper in -y tar curl unzip rpm-build java-1_8_0-openjdk-devel
> sed -i 's/en_US.UTF-8//' /etc/sysconfig/language
> groupadd -g `ls -l /var/run/docker.sock | while read x x x g x; do echo $g; done` docker
> useradd -m -G docker bigtop
> zypper ar -G https://artfiles.org/apache.org/bigtop/bigtop-1.4.0/repos/opensuse42.3/bigtop.repo
> zypper in -y which bigtop-utils bigtop-jsvc bigtop-tomcat bigtop-groovy
> cp -av  /usr/lib/bigtop-tomcat/conf.template /usr/lib/bigtop-tomcat/conf
> sed -i '/<\/tomcat-users>/i \ \ <role rolename="manager"/>' /usr/lib/bigtop-tomcat/conf/tomcat-users.xml
> sed -i '/<\/tomcat-users>/i \ \ <role rolename="manager-gui"/>' /usr/lib/bigtop-tomcat/conf/tomcat-users.xml
> sed -i '/<\/tomcat-users>/i \ \ <user username="tomcat" password="my-PW" roles="tomcat,manager,manager-gui"/>' /usr/lib/bigtop-tomcat/conf/tomcat-users.xml
> sed -i '/<\/tomcat-users>/i \ \ <user username="joachim" password="my-PW" roles="tomcat,manager,manager-gui"/>' /usr/lib/bigtop-tomcat/conf/tomcat-users.xml
> sed -i '/<\/tomcat-users>/i \ \ <user username="julian" password="his-PW" roles="tomcat,manager,manager-gui"/>' /usr/lib/bigtop-tomcat/conf/tomcat-users.xml
> /usr/lib/bigtop-tomcat/bin/startup.sh
> su - bigtop
> curl https://apache.lauf-forum.at/bigtop/bigtop-1.4.0/bigtop-1.4.0-project.tar.gz -o bigtop-1.4.0-project.tar.gz
> tar xf bigtop-1.4.0-project.tar.gz
> ln -sf bigtop-1.4.0 bigtop
> cd bigtop
> ./gradlew --console=plain hadoop-pkg-ind -POS=opensuse-42.3
>
>
>
>
> This is the complete output of the gradle run (other steps seem fine, tomcat 
> catalina is running: ports are open but makes no difference):
>
> bigtop@a956005e4481:~/bigtop> ./gradlew --console=plain hadoop-pkg-ind 
> -POS=opensuse-42.3
>  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                 Dload  Upload   Total   Spent    Left  Speed
>  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
> 100 74.7M  100 74.7M    0     0  5839k      0  0:00:13  0:00:13 --:--:-- 5990k
> ~/.gradle/wrapper/dists/gradle-4.10.3-bin/016e637c7ef47db2d3e632a3936ca351 
> ~/bigtop
> ~/bigtop
> Welcome to Gradle 4.10.3!
> Here are the highlights of this release:
> - Incremental Java compilation by default
> - Periodic Gradle caches cleanup
> - Gradle Kotlin DSL 1.0-RC6
> - Nested included builds
> - SNAPSHOT plugin versions in the `plugins {}` block
> For more details see https://docs.gradle.org/4.10.3/release-notes.html
> Starting a Gradle Daemon (subsequent builds will be faster)
> > Task :hadoop-pkg-ind FAILED
> Building trunk hadoop-pkg on opensuse-42.3 in Docker...
> +++ dirname ./bigtop-ci/build.sh
> ++ cd ./bigtop-ci/..
> ++ pwd
> + BIGTOP_HOME=/home/bigtop/bigtop
> + '[' 6 -eq 0 ']'
> + [[ 6 -gt 0 ]]
> + key=--prefix
> + case $key in
> + PREFIX=trunk
> + shift
> + shift
> + [[ 4 -gt 0 ]]
> + key=--os
> + case $key in
> + OS=opensuse-42.3
> + shift
> + shift
> + [[ 2 -gt 0 ]]
> + key=--target
> + case $key in
> + TARGET=hadoop-pkg
> + shift
> + shift
> + [[ 0 -gt 0 ]]
> + '[' -z x ']'
> + '[' -z x ']'
> + '[' '' == true ']'
> + IMAGE_NAME=bigtop/slaves:trunk-opensuse-42.3
> ++ uname -m
> + ARCH=x86_64
> + '[' x86_64 '!=' x86_64 ']'
> ++ docker run -d bigtop/slaves:trunk-opensuse-42.3 /sbin/init
> + CONTAINER_ID=b21178150cd1ebb8640eebe9e24e27cd9e1d65da4bcf0e22e6c4e725514231be
> + trap 'docker rm -f b21178150cd1ebb8640eebe9e24e27cd9e1d65da4bcf0e22e6c4e725514231be' EXIT
> + docker cp /home/bigtop/bigtop b21178150cd1ebb8640eebe9e24e27cd9e1d65da4bcf0e22e6c4e725514231be:/bigtop
> + docker cp /home/bigtop/bigtop/bigtop-ci/entrypoint.sh b21178150cd1ebb8640eebe9e24e27cd9e1d65da4bcf0e22e6c4e725514231be:/bigtop/entrypoint.sh
> Error: No such container:path: b21178150cd1ebb8640eebe9e24e27cd9e1d65da4bcf0e22e6c4e725514231be:/bigtop
> + docker exec b21178150cd1ebb8640eebe9e24e27cd9e1d65da4bcf0e22e6c4e725514231be bash -c 'chown -R jenkins:jenkins /bigtop'
> + docker exec --user jenkins b21178150cd1ebb8640eebe9e24e27cd9e1d65da4bcf0e22e6c4e725514231be bash -c 'cd /bigtop && ./entrypoint.sh  hadoop-pkg --info'
> bash: line 0: cd: /bigtop: No such file or directory
> + RESULT=1
> + mkdir -p output
> + docker cp b21178150cd1ebb8640eebe9e24e27cd9e1d65da4bcf0e22e6c4e725514231be:/bigtop/build .
> Error: No such container:path: b21178150cd1ebb8640eebe9e24e27cd9e1d65da4bcf0e22e6c4e725514231be:/bigtop/build
> + docker cp b21178150cd1ebb8640eebe9e24e27cd9e1d65da4bcf0e22e6c4e725514231be:/bigtop/output .
> Error: No such container:path: b21178150cd1ebb8640eebe9e24e27cd9e1d65da4bcf0e22e6c4e725514231be:/bigtop/output
> + docker rm -f b21178150cd1ebb8640eebe9e24e27cd9e1d65da4bcf0e22e6c4e725514231be
> b21178150cd1ebb8640eebe9e24e27cd9e1d65da4bcf0e22e6c4e725514231be
> + '[' 1 -ne 0 ']'
> + exit 1
> + docker rm -f b21178150cd1ebb8640eebe9e24e27cd9e1d65da4bcf0e22e6c4e725514231be
> Error: No such container: b21178150cd1ebb8640eebe9e24e27cd9e1d65da4bcf0e22e6c4e725514231be
> FAILURE: Build failed with an exception.
> * Where:
> Script '/home/bigtop/bigtop-1.4.0/packages.gradle' line: 657
> * What went wrong:
> Execution failed for task ':hadoop-pkg-ind'.
> > Process 'command 'bash'' finished with non-zero exit value 1
> * Try:
> Run with --stacktrace option to get the stack trace. Run with --info or 
> --debug option to get more log output. Run with --scan to get full insights.
> * Get more help at https://help.gradle.org
> BUILD FAILED in 30s
> 1 actionable task: 1 executed
