I'm running Hadoop 1.1.2 on a cluster with 10ish computers. I would like to nicely add and remove nodes, both for HDFS and MapReduce.
I've noticed the *datanode* process dies once decommissioning is done, so this is what I do to remove a node:

- Add the node to *mapred.exclude*
- Add the node to *hdfs.exclude*
- $ hadoop mradmin -refreshNodes
- $ hadoop dfsadmin -refreshNodes
- $ hadoop-daemon.sh stop tasktracker

To add the node back in (assuming it was removed as above):

- Remove the node from *mapred.exclude*
- Remove the node from *hdfs.exclude*
- $ hadoop mradmin -refreshNodes
- $ hadoop dfsadmin -refreshNodes
- $ hadoop-daemon.sh start tasktracker
- $ hadoop-daemon.sh start datanode

Is this the correct way to scale up and down "nicely"? By "nicely", I mean without data loss, and without stopping tasks that are running on the nodes I'm removing. (That is, I'm assuming that *$ hadoop-daemon.sh stop tasktracker* lets the tasktracker finish any currently running tasks before it exits.)

Thanks,
Philippe
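The remove and re-add steps from the question can be scripted roughly as follows. This is a minimal sketch, not a tested tool. It assumes it runs on the master where the exclude files live, that the file paths (hypothetical here) are the ones your config names in `dfs.hosts.exclude` and `mapred.hosts.exclude`, and that passwordless ssh reaches each worker with `hadoop-daemon.sh` on its PATH. The helper names `add_line`, `remove_line`, `remove_node` and `add_node` are made up for illustration. It only automates the steps as listed; it does not answer whether stopping the tasktracker this way is graceful.

```shell
#!/usr/bin/env bash
# Sketch of the node remove/re-add steps. ASSUMPTIONS (adjust to your cluster):
#   - runs on the master, where the exclude files live;
#   - the paths below are hypothetical and must match dfs.hosts.exclude / mapred.hosts.exclude;
#   - passwordless ssh to each worker, with hadoop-daemon.sh on its PATH.
set -euo pipefail

HDFS_EXCLUDE=${HDFS_EXCLUDE:-/etc/hadoop/conf/hdfs.exclude}       # hypothetical path
MAPRED_EXCLUDE=${MAPRED_EXCLUDE:-/etc/hadoop/conf/mapred.exclude} # hypothetical path
HADOOP=${HADOOP:-hadoop}   # command used for mradmin/dfsadmin
REMOTE=${REMOTE:-ssh}      # command used to reach a worker

# Append a hostname ($2) to an exclude file ($1) unless it is already listed.
add_line() {
  grep -qxF "$2" "$1" 2>/dev/null || echo "$2" >> "$1"
}

# Delete every line of file $1 that exactly matches hostname $2.
# Rewrites the file in place so its permissions are kept.
remove_line() {
  local tmp
  tmp=$(mktemp)
  grep -vxF "$2" "$1" > "$tmp" 2>/dev/null || true
  cat "$tmp" > "$1"
  rm -f "$tmp"
}

# Exclude the node, make the JobTracker and NameNode reread the files,
# then stop the tasktracker on the node. The datanode is not stopped here
# because, as observed in the question, it exits once decommissioning finishes.
remove_node() {
  local node=$1
  add_line "$MAPRED_EXCLUDE" "$node"
  add_line "$HDFS_EXCLUDE" "$node"
  "$HADOOP" mradmin -refreshNodes
  "$HADOOP" dfsadmin -refreshNodes
  "$REMOTE" "$node" hadoop-daemon.sh stop tasktracker
}

# Reverse of remove_node: un-exclude, refresh, restart both daemons on the node.
add_node() {
  local node=$1
  remove_line "$MAPRED_EXCLUDE" "$node"
  remove_line "$HDFS_EXCLUDE" "$node"
  "$HADOOP" mradmin -refreshNodes
  "$HADOOP" dfsadmin -refreshNodes
  "$REMOTE" "$node" hadoop-daemon.sh start tasktracker
  "$REMOTE" "$node" hadoop-daemon.sh start datanode
}
```

Usage: `source` the file, then call `remove_node worker07` or `add_node worker07` (the hostname is an example). `HADOOP` and `REMOTE` are variables so a dry run can swap in `echo`.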
