Run the “hadoop balancer” command on the namenode. It is used for rebalancing skewed data across the datanodes in a cluster. http://hadoop.apache.org/docs/r1.0.4/commands_manual.html#balancer
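
To put rough numbers on the concern in the quoted message, here is a back-of-the-envelope sketch. It assumes each datanode spreads new blocks evenly over its three volumes (round-robin placement, which is my assumption about how this cluster is configured); the function name is made up for illustration, and only the disk sizes come from the message.

```python
# Disk sizes per node in GB, taken from the quoted message.
disks_gb = [500, 500, 2000]


def fill_under_even_spread(disks):
    """Return (GB written per node, fraction of raw capacity used)
    at the moment the smallest disk fills, ASSUMING every disk
    receives the same amount of data (even, round-robin spread)."""
    smallest = min(disks)
    written = smallest * len(disks)   # each disk holds `smallest` GB
    return written, written / sum(disks)


written_gb, fraction = fill_under_even_spread(disks_gb)

# For placement proportional to disk size, the large disk would need
# this many times the blocks of a small one.
ratio = max(disks_gb) // min(disks_gb)

print(f"Small disks full after {written_gb} GB per node "
      f"({fraction:.0%} of raw capacity)")
print(f"Proportional placement needs {ratio}x the blocks on the large disk")
```

Under that assumption, the small disks fill when a node is only half full, and every write after that point lands on the single 2TB disk, which is the single-spindle situation the message describes.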

On Aug 6, 2014, at 1:45 PM, Brian C. Huffman <bhuff...@etinternational.com> wrote:

> All,
>
> We currently have a Hadoop 2.2.0 cluster with the following characteristics:
> - 4 nodes
> - Each node is a datanode
> - Each node has 3 physical disks for data: 2 x 500GB and 1 x 2TB disk.
> - HDFS replication factor of 3
>
> It appears that our 500GB disks are filling up first (the alternative would
> be to put 4 times the number of blocks on the 2TB disks per node). I'm
> concerned that once the 500GB disks fill, our performance will slow down
> (fewer spindles being read / written at the same time per node). Is this
> correct? Is there anything we can do to change this behavior?
>
> Thanks,
> Brian