He Tianyi created HDFS-10290:
--------------------------------

             Summary: Move getBlocks calls to DataNode in Balancer
                 Key: HDFS-10290
                 URL: https://issues.apache.org/jira/browse/HDFS-10290
             Project: Hadoop HDFS
          Issue Type: New Feature
          Components: balancer & mover
    Affects Versions: 2.6.0
            Reporter: He Tianyi

In the current implementation, the Balancer asks the NameNode for the list of blocks on a specific DataNode. This adds to the NameNode's workload, and it actually caused the NameNode to flap once the average number of blocks per DataNode reached 1,000,000 (NameNode heap size: 192 GB; CPU: 2 x Xeon E5-2630).

Recently I investigated whether the {{getBlocks}} invocation from the Balancer can be handled by DataNodes instead, and it turned out to be practical. The only pitfall is that a DataNode has no information about the other locations of the blocks it holds, so some block moves may fail (the target node may already have a replica of that particular block).

I think this may be beneficial for large clusters. Any suggestions or comments?

Thanks.
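
To illustrate the pitfall described above, here is a toy sketch in Java. It is only a model under stated assumptions, not Balancer or NameNode code: the class {{GetBlocksSketch}}, all of its methods, and the node names {{dn1}} to {{dn3}} are hypothetical and are not Hadoop APIs. It compares a candidate list built with full replica locations (what the NameNode can provide) against one built from a single DataNode's local knowledge, and counts the moves that fail because the target already holds a replica.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy model only; none of these names are Hadoop APIs. */
public class GetBlocksSketch {

    /** Hypothetical sample data: block id -> DataNodes holding a replica. */
    public static Map<Long, Set<String>> sampleReplicas() {
        Map<Long, Set<String>> m = new TreeMap<>(); // sorted for stable output
        m.put(1L, new HashSet<>(Arrays.asList("dn1", "dn2")));
        m.put(2L, new HashSet<>(Arrays.asList("dn1", "dn3")));
        m.put(3L, new HashSet<>(Arrays.asList("dn1")));
        m.put(4L, new HashSet<>(Arrays.asList("dn2", "dn3")));
        return m;
    }

    /** NameNode-style view: all locations are known, so blocks already on the target are skipped. */
    public static List<Long> candidatesWithLocations(
            Map<Long, Set<String>> replicas, String source, String target) {
        List<Long> out = new ArrayList<>();
        for (Map.Entry<Long, Set<String>> e : replicas.entrySet()) {
            if (e.getValue().contains(source) && !e.getValue().contains(target)) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    /** DataNode-style view: the source only knows which blocks it stores itself. */
    public static List<Long> candidatesLocalOnly(Map<Long, Set<String>> replicas, String source) {
        List<Long> out = new ArrayList<>();
        for (Map.Entry<Long, Set<String>> e : replicas.entrySet()) {
            if (e.getValue().contains(source)) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    /** Attempts each move; a move fails when the target already holds a replica of the block. */
    public static int countFailedMoves(
            Map<Long, Set<String>> replicas, List<Long> candidates, String source, String target) {
        int failed = 0;
        for (Long block : candidates) {
            Set<String> holders = replicas.get(block);
            if (holders.contains(target)) {
                failed++; // wasted attempt: target already has this block
            } else {
                holders.remove(source);
                holders.add(target);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        Map<Long, Set<String>> replicas = sampleReplicas();
        System.out.println("with locations: " + candidatesWithLocations(replicas, "dn1", "dn2"));
        List<Long> local = candidatesLocalOnly(replicas, "dn1");
        System.out.println("local only:     " + local);
        System.out.println("failed moves:   " + countFailedMoves(replicas, local, "dn1", "dn2"));
    }
}
```

In this toy data set, block 1 is the only candidate from the local-only list whose move fails, because {{dn2}} already holds a replica of it. The cost of moving {{getBlocks}} to the DataNode is therefore some wasted move attempts rather than incorrect placement.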