Hey everyone. I started learning about Hadoop a few weeks ago. My task is to understand the Hadoop ecosystem and to be able to answer some questions about it. I started by reading the book "O'Reilly - Hadoop: The Definitive Guide". After reading it I had a first idea of how the components work together, but the book didn't really help me understand what's going on. In my opinion it goes into fairly general in-depth detail about the various components, and that didn't help me understand the Hadoop ecosystem as a whole.
So I started to work with it hands-on. I installed a VM (SUSE Leap 42.1) and followed the https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html guide. After that I started working with files on it: I wrote my first simple mapper and reducer, and I analyzed my Apache log for some testing. That worked well so far. But let's face my problems:

1) All I know about installing Hadoop right now is: unpack a .tar.gz, run some shell scripts, and everything runs fine. I have no clue which components are now installed on the VM, or where they are located.

2) Furthermore, I'm missing all kinds of information about setting these up. At one point the Apache guide says: "Now check that you can ssh to the localhost without a passphrase" and "If you cannot ssh to localhost without a passphrase, execute the following commands:". Well, I'd like to know what I'm actually doing here. I mean, WHY do I need ssh running on localhost, and WHY does it have to work without a passphrase? What other ways of configuring this exist?

3) Same with the next step: "The following instructions are to run a MapReduce job locally. If you want to execute a job on YARN, see YARN on Single Node." and "Format the filesystem: $ bin/hdfs namenode -format". I have no clue how HDFS works internally. For me, a filesystem is something where I set up partitions and mount them on folders. So how am I supposed to explain HDFS to someone else? I understand the storing of data: splitting files into blocks, spreading them around the cluster, storing metadata. But if someone asks me "How can this be called a filesystem if you install it by unpacking a .tar.gz?", I simply can't answer that question in any way.

So I'm now looking for documentation/a guide that covers:
- Which requirements do I have?
-- Do I have to use a specific filesystem? If yes/no, why, or what would you recommend?
-- How should I partition my VM?
-- On which partition should I install which components?
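For reference, this is the entire configuration the single-node guide has you write before running `namenode -format` -- just two properties, which is part of why I feel like I'm missing something. My guesses at what they mean are in the comments:

```xml
<!-- etc/hadoop/core-site.xml: fs.defaultFS makes hdfs://localhost:9000
     the default filesystem URI, i.e. the address clients use to reach
     the NameNode. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- etc/hadoop/hdfs-site.xml: dfs.replication is how many DataNodes
     each block is copied to; 1 because there is only one node. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```

As far as I can tell, with nothing else configured the NameNode and DataNode data ends up under /tmp/hadoop-<username>, which would at least answer my "which partition" question for the single-node case -- can anyone confirm that?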
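On the ssh question, here is what I've pieced together so far (please correct me if this is wrong): the helper scripts like start-dfs.sh apparently use ssh to log in to every node of the cluster and start the daemons there, and on a single-node setup that node is simply localhost. Since a script can't type a passphrase, the login has to be key-based. These are the exact commands from the guide, with my current understanding as comments:

```shell
# Generate a key pair with an empty passphrase (-P ''), so ssh can
# log in non-interactively. ~/.ssh/id_rsa is the default key the
# ssh client offers.
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

# Authorize that key for logins to this same machine: sshd accepts
# any public key listed in ~/.ssh/authorized_keys.
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

# sshd refuses an authorized_keys file with loose permissions.
chmod 0600 ~/.ssh/authorized_keys
```

Is that right? And is passphrase-less ssh only a convenience of the start scripts, i.e. could I instead start each daemon by hand on each machine and skip ssh entirely?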
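On the filesystem question, my current working mental model is this (again, please correct me): HDFS isn't a filesystem in the kernel/partition sense at all, it's a set of user-space Java daemons. A DataNode stores each block as a plain file on whatever local filesystem (ext4, btrfs, ...) its data directory sits on, and the NameNode keeps the metadata saying which blocks make up which file. That would explain why unpacking a .tar.gz is enough, and why `hdfs namenode -format` only initializes a metadata directory rather than touching any partition. A toy sketch of the idea, nothing Hadoop-specific, just blocks-as-plain-files (the 1 KB "block size" is made up; real HDFS blocks default to 128 MB):

```shell
# Create a 4 KB test file.
dd if=/dev/urandom of=bigfile bs=1024 count=4 2>/dev/null

# Split it into 1 KB "blocks" -- the way a DataNode keeps blk_* files
# as ordinary files in its data directory on the local filesystem.
split -b 1024 -d bigfile blk_
ls blk_*    # blk_00 blk_01 blk_02 blk_03

# The "metadata" a NameNode holds is essentially: bigfile = blk_00..blk_03.
# With that, the original file can be reassembled from its blocks:
cat blk_00 blk_01 blk_02 blk_03 > reassembled
cmp bigfile reassembled && echo "identical"
```

Is that roughly how it works, minus the replication and the network layer?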
- Setting up a VM with Hadoop
- Configuring Hadoop step by step
- Setting up all kinds of daemons/nodes manually, with an explanation of where they are located, how they work, and how they should be configured

I'm currently reading https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html, but from a first read this guide only tells you what to write in which configuration file, not why you should (or shouldn't) do it. After getting an idea of what Hadoop is, I feel like I've been left alone in the dark. I hope some of you can show me a way to get back on the road. For me it's very important not to just write some configuration somewhere; I need to understand what's going on, because once I have a running cluster, I need to be sure I can handle all this stuff before going into production with it.

Best Regards
Mike