Hi,
I have deployed a multi-node cluster with one master and two data nodes.
Here's what jps shows:
hadoop@hadoop-master:~$ jps
24641 SecondaryNameNode
24435 DataNode
24261 NameNode
24791 ResourceManager
25483 Jps
24940 NodeManager
hadoop@hadoop-data1:~$ jps
15556 DataNode
16198 NodeManager
I'm not sure if this is related, but I'm seeing some errors in hadoop-hadoop-namenode-hadoop-master.log:

2015-09-23 19:56:27,798 WARN org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager: Unresolved datanode registration: hostname cannot be resolved (ip=192.168.51.1,
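For what it's worth, that warning usually means the NameNode cannot reverse-resolve the registering DataNode's IP. Assuming static addresses, matching /etc/hosts entries on every node are the usual fix (the IP-to-hostname pairings below are only guesses from the excerpts above):

192.168.51.4  hadoop-master
192.168.51.1  hadoop-data1

Alternatively, dfs.namenode.datanode.registration.ip-hostname-check can be set to false in hdfs-site.xml, but fixing name resolution is the cleaner route.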
Hi Daniel,
The RM will list only NodeManagers, not DataNodes. You can view the DataNodes on the NameNode web UI (e.g. 192.168.51.4:50070).
The one node you see on the RM page 'Nodes' list is from this:
hadoop@hadoop-master:~$ jps
24641 SecondaryNameNode
24435 DataNode
24261 NameNode
24791 ResourceManager
24940 NodeManager
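If you'd rather check from the command line than the web UIs, the stock tools report the same membership (standard 2.x commands, nothing cluster-specific assumed):

yarn node -list          # NodeManagers registered with the RM
hdfs dfsadmin -report    # DataNodes known to the NameNode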
I was able to get jobs submitting to the cluster by adding the following property to mapred-site.xml:

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

I also had to add the following property to yarn-site.xml:

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
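After restarting the YARN daemons, a quick sanity check is to submit one of the bundled examples (the jar path below assumes a standard 2.x tarball layout):

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 2 5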
hey Rahul,
thanks for pointing me to that page. It's definitely worth a read. Do both clusters need to be at least v2.3 for that?
I was also digging a little bit further. There is the property fs.defaultFS, which might be the exact setting I was actually looking for. Unfortunately MapR
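For plain HDFS, that would be a core-site.xml entry like the sketch below (the NameNode hostname and port are just placeholders):

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://storage-cluster-nn:8020</value>
</property>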
hey everyone,
MapR offers the possibility to access another cluster's HDFS/MapRFS from one cluster (e.g. a compute-only cluster without much storage capability); see http://doc.mapr.com/display/MapR/mapr-clusters.conf. In times of Hadoop-as-a-Service this becomes very interesting. Is this
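If I read the linked page correctly, that file takes one line per remote cluster, naming it and listing its CLDB nodes (hostnames here are made up):

# /opt/mapr/conf/mapr-clusters.conf on the compute-only cluster
remote.storage.cluster cldb1:7222 cldb2:7222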
Nothing is stopping you from implementing the cluster the way you want.
You can have storage-only nodes for your HDFS and not run tasktrackers on them.
Start a bunch of machines with high RAM and high CPU but no storage.
The only thing to worry about then would be the network bandwidth to carry data from HDFS to
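On Hadoop 2.x you can realize that split simply by starting only the relevant daemon on each box (standard sbin scripts from a tarball install; adjust paths to your layout):

# storage-only nodes
$HADOOP_HOME/sbin/hadoop-daemon.sh start datanode

# compute-only nodes
$HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager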
Hey Nitin,
I'm not talking about it concept-wise. I'm talking about how to actually do it technically and how to set it up. Imagine this: I have two clusters, both running fine, and they are (setup-wise) the same, except that one has way more tasktrackers/NodeManagers than the other. Now I
Fabian,
I see this as the classic case of federation of Hadoop clusters. The MR job can refer to the specific hdfs:// file location as input but at the same time run on another cluster.
You can refer to the following link for further details on federation.
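As a concrete sketch (NameNode hostnames are placeholders): submit the job to the compute cluster's YARN as usual, but pass fully qualified URIs so the input is read from the other cluster:

hadoop jar hadoop-mapreduce-examples.jar wordcount \
    hdfs://storage-nn:8020/data/in hdfs://compute-nn:8020/data/out

As long as the two clusters can reach each other, the tasks will stream the input blocks over the network.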