Hi Xenia,

which host is a master depends on which processes are running on it. The SecondaryNameNode is, in my view, a master process.

I googled and found out that I misused the masters file and told you something wrong about this. Check out this link:

http://blog.cloudera.com/blog/2009/02/multi-host-secondarynamenode-configuration/

It means that in the masters file you only put the hosts which shall start the SecondaryNameNode. The NameNode itself will be started on the local machine where start-dfs.sh is executed. Likewise, the JobTracker or ResourceManager must be started locally on its machine, with start-yarn.sh in the ResourceManager case.
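As a minimal sketch (the host names host1 to host4 are made up for illustration, they are not from your setup): if the SecondaryNameNode shall run on host2, the masters file contains only that host, and you run the start scripts on the machine that shall host the NameNode and ResourceManager:

  # contents of $HADOOP_CONF_DIR/masters on host1
  host2

  # run on host1:
  start-dfs.sh    # NameNode here on host1, SecondaryNameNode on host2, DataNodes on the slaves hosts
  start-yarn.sh   # ResourceManager here on host1, NodeManagers on the slaves hosts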

Take care to put the correct IP addresses (or hostnames) into yarn-site.xml and to distribute the configuration to all machines.
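As a sketch of what I mean, assuming the ResourceManager runs on host1 (again a made-up name), the central entry in yarn-site.xml would look like this; the other ResourceManager addresses (scheduler, resource-tracker, ...) are derived from it by default:

  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>host1</value>
  </property>

After editing, copy the file to the same path on all four machines, so every NodeManager knows where to reach the ResourceManager.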

The link above explains the SecondaryNameNode setup in more detail.

How you distribute the Hadoop processes over your machines depends on your hardware resources and the estimated usage. If, for example, you need more disk space one day, then add the machine where the NameNode or ResourceManager is running to the slaves file, so it runs a DataNode as well (a small sketch of the slaves file follows below the link). The master processes themselves don't need much disk space for a 4-machine cluster ;-). The following link explains that they just store the filesystem metadata:

http://hadooptutorial.info/tag/what-is-fsimage/
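For example (made-up hostnames again): if host1 runs the NameNode and host3 and host4 are your current DataNodes, adding host1 to the slaves file lets it serve as a DataNode too:

  # contents of $HADOOP_CONF_DIR/slaves
  host1
  host3
  host4

The next run of start-dfs.sh / start-yarn.sh then also starts a DataNode and NodeManager on host1.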

And btw, your question fits better on the Hadoop mailing list than the Giraph one.

Best regards,
Alexander


On 09.08.2014 21:33, Xenia Demetriou wrote:
Hi Alexander,

Thanks for your help. I am not an expert either.

In my cluster (4 machines) I have set things up as follows:

In the masters file on all machines I define two of the machines, as master and SecondaryNameNode. And in the slaves file on all machines I define the other two machines, as DataNode1 and DataNode2.

I don't know if the master and SecondaryNameNode machines can also be defined as slaves, or if it is better to define the SecondaryNameNode as a slave instead of a master.

Thanks



