Have you tried hdfs balancer? http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#balancer
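A minimal invocation might look like the following (the threshold and bandwidth values are illustrative, not recommendations; this assumes the `hdfs` CLI is on the PATH of a cluster node):

```shell
# Run the HDFS balancer; it moves blocks from over-utilized DataNodes
# to under-utilized ones in the background.
# -threshold is the allowed deviation (in percent) of each DataNode's
# utilization from the cluster average; 10 is the default.
hdfs balancer -threshold 10

# Optionally cap the bandwidth (bytes/sec) each DataNode may use for
# balancing, so rebalancing does not starve normal HDFS traffic.
# 104857600 bytes/sec is roughly 100 MB/s (an illustrative value).
hdfs dfsadmin -setBalancerBandwidth 104857600
```

The balancer runs until the cluster is within the threshold or it can make no further progress, and it can be stopped safely at any time.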
On Fri, Feb 6, 2015 at 11:34 AM, Manoj Venkatesh <manove...@gmail.com> wrote:
> Dear Hadoop experts,
>
> I have a Hadoop cluster of 8 nodes: 6 were added during cluster creation,
> and 2 additional nodes were added later to increase disk and CPU capacity.
> What I see is that processing is shared among all the nodes, whereas
> storage is reaching capacity on the original 6 nodes while the newly added
> machines still have a relatively large amount of unoccupied storage.
>
> I was wondering if there is an automated way, or any way at all, of
> redistributing data so that all the nodes are equally utilized. I have
> checked the configuration parameter
> *dfs.datanode.fsdataset.volume.choosing.policy*, which has the options
> 'Round Robin' or 'Available Space'. Are there any other configurations
> which need to be reviewed?
>
> Thanks,
> Manoj
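For reference, the volume-choosing policy mentioned above is set in hdfs-site.xml; a sketch of the 'Available Space' setting is below. Note that this policy only balances data across the disks *within* a single DataNode, so it does not by itself redistribute blocks across nodes; that is what the balancer is for.

```xml
<!-- hdfs-site.xml: pick volumes on a DataNode by available space
     instead of the default round-robin policy. This affects only
     new block placement across a node's local disks. -->
<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
```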