Hi Xenia,
which host is a master depends on which processes are running on it. The
SecondaryNameNode is, in my view, a master process.
I googled a bit and found out that I misused the masters file and told
you something wrong about this earlier. Check out this link:
http://blog.cloudera.com/blog/2009/02/multi-host-secondarynamenode-configuration/
It means that in the masters file you only list the hosts which will
start the SecondaryNameNode. The NameNode itself is started on the
local machine where start-dfs.sh is executed. The JobTracker or
ResourceManager must likewise be started locally, on the machine where
you run start-yarn.sh (for the ResourceManager).
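To make that concrete, here is a sketch of how the conf files could look on a 4-machine cluster. The hostnames node1..node4 are placeholders I made up, not from your setup; node1 is the machine where you run the start scripts:

```
# conf/masters  -- only the SecondaryNameNode host(s)
node2

# conf/slaves   -- the DataNode / NodeManager hosts
node3
node4
```

With these files, running start-dfs.sh on node1 starts the NameNode on node1, the SecondaryNameNode on node2, and DataNodes on node3 and node4; start-yarn.sh on node1 starts the ResourceManager there and NodeManagers on the slaves.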
Take care to set the correct IP addresses in yarn-site.xml and to
distribute the configuration to all machines.
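For example, a minimal yarn-site.xml could point all nodes at the ResourceManager host like this ("node1" is again a placeholder for your ResourceManager's hostname or IP):

```xml
<!-- yarn-site.xml: must be identical on every machine in the cluster -->
<configuration>
  <property>
    <!-- hostname (or IP) of the machine running the ResourceManager -->
    <name>yarn.resourcemanager.hostname</name>
    <value>node1</value>
  </property>
</configuration>
```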
The link above explains the SecondaryNameNode setup in more detail.
How you distribute the Hadoop processes across your machines depends on
your hardware resources and the expected usage. If, for example, you one
day need more disk space, you can also add the machine where the
NameNode or ResourceManager is running to the slaves file. The master
processes themselves don't need much disk space for a 4-machine
cluster ;-). The following link explains that they just store the
filesystem metadata.
http://hadooptutorial.info/tag/what-is-fsimage/
And btw, your question fits the Hadoop mailing list better than the
Giraph one.
Best regards,
On 09.08.2014 21:33, Xenia Demetriou wrote:
Hi Alexander,
Thanks for your help. Also I am not an expert.
In my cluster (4 machines) I define as following:
In master file in all the machines I define two of the machines as
Master and SecondaryNameNode
And in slave file in all the machines, I define the other two machines
as Datanode1 and DataNode2.
I don't know if Master and SecondaryNameNode can also be defined as
slaves, or if it is better to define the SecondaryNameNode as a slave
instead of a master.
Thanks