Hi Xenia,
which host is a master depends on which processes are running on it. The
SecondaryNameNode is, in my view, a master process.
I googled a bit and found out that I misused the masters file and told
you something wrong about this earlier. Check out this link:
http://blog.cloudera.com/blog/2009/02/multi-host-secondarynamenode-configuration/
It means that in the masters file you only list the hosts which will
start the SecondaryNameNode. The NameNode itself is started on the
local machine where start-dfs.sh is executed. The JobTracker or
ResourceManager must likewise be started locally, on the machine where
you run start-yarn.sh (for the ResourceManager).
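To make that concrete, here is a sketch of how the conf files could look on a 4-machine cluster. The hostnames node1..node4 are placeholders I made up, not from your setup; node1 is the machine where you run the start scripts:

```
# conf/masters  -- only the SecondaryNameNode host(s)
node2

# conf/slaves   -- the DataNode / NodeManager hosts
node3
node4
```

With these files, running start-dfs.sh on node1 starts the NameNode on node1, the SecondaryNameNode on node2, and DataNodes on node3 and node4; start-yarn.sh on node1 starts the ResourceManager there and NodeManagers on the slaves.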
Take care to set the correct IP addresses in yarn-site.xml and to
distribute the configuration to all machines.
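For example, a minimal yarn-site.xml could point all nodes at the ResourceManager host like this ("node1" is again a placeholder for your ResourceManager's hostname or IP):

```xml
<!-- yarn-site.xml: must be identical on every machine in the cluster -->
<configuration>
  <property>
    <!-- hostname (or IP) of the machine running the ResourceManager -->
    <name>yarn.resourcemanager.hostname</name>
    <value>node1</value>
  </property>
</configuration>
```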
The link above explains the SecondaryNameNode setup in more detail.
How you distribute the Hadoop processes across your machines depends on
your hardware resources and the expected usage. If, for example, you one
day need more disk space, you can also add the machine where the
NameNode or ResourceManager is running to the slaves file. The master
processes themselves don't need much disk space for a 4-machine
cluster ;-). The following link explains that they just store the
filesystem metadata.
http://hadooptutorial.info/tag/what-is-fsimage/
And btw, your question fits the Hadoop mailing list better than the
Giraph one.
Best regards,
On 09.08.2014 21:33, Xenia Demetriou wrote:
Hi Alexander,
Thanks for your help. Also I am not an expert.
In my cluster (4 machines) I define as following:
In master file in all the machines I define two of the machines as
Master and SecondaryNameNode
And in slave file in all the machines, I define the other two machines
as Datanode1 and DataNode2.
I don't know if Master and SecondaryNameNode can also be defined as
slaves, or if it is better to define the SecondaryNameNode as a slave
instead of a master.
Thanks