JiangHua Zhu created HDFS-16614:
-----------------------------------

             Summary: Improve balancer operation strategy and performance
                 Key: HDFS-16614
                 URL: https://issues.apache.org/jira/browse/HDFS-16614
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: balancer & mover, namenode
    Affects Versions: 3.3.0
            Reporter: JiangHua Zhu
         Attachments: image-2022-06-02-13-18-33-213.png

When the Balancer runs, it performs the following steps in order:
1. Obtain information about the available datanodes from the NameNode.
2. Group the storage by StorageType and calculate the average utilization of 
each type. Combined with the configured threshold, this yields four sets: 
overUtilized, aboveAvgUtilized, belowAvgUtilized, and underUtilized.
3. Calculate the sources and targets for the data transfer. A source sends 
data, and a target receives it.
4. Start the data transfers in parallel.
These steps are run iteratively. Throughout this process a single threshold is 
applied uniformly to all StorageTypes. This seems too coarse for the 
heterogeneous storage that HDFS currently supports, because the threshold 
cannot be tuned for an individual StorageType.
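
As an illustration of step 2, the classification into the four sets can be 
sketched as below. This is a simplified model written for this description, not 
the Balancer's actual code: the function name classify and the Bucket enum are 
hypothetical, and the handling of the exact boundary values is an assumption.

```python
from enum import Enum


class Bucket(Enum):
    """The four sets named in step 2."""
    OVER_UTILIZED = "overUtilized"
    ABOVE_AVG_UTILIZED = "aboveAvgUtilized"
    BELOW_AVG_UTILIZED = "belowAvgUtilized"
    UNDER_UTILIZED = "underUtilized"


def classify(utilization: float, average: float, threshold: float) -> Bucket:
    """Place one storage into a set.

    All arguments are percentages (0-100). `average` is the average
    utilization of the storage's StorageType, `threshold` is the single
    threshold that is currently shared by every StorageType.
    """
    diff = utilization - average
    if diff > threshold:
        # More than `threshold` points above the average.
        return Bucket.OVER_UTILIZED
    if diff > 0:
        # Above the average, but within the threshold.
        return Bucket.ABOVE_AVG_UTILIZED
    if diff >= -threshold:
        # At or below the average, but within the threshold.
        return Bucket.BELOW_AVG_UTILIZED
    # More than `threshold` points below the average.
    return Bucket.UNDER_UTILIZED
```

In this model, with an average of 78 and a threshold of 10, a node at 85% lands 
in aboveAvgUtilized and a node at 90% lands in overUtilized. Both are above the 
average, so both are candidates to act as sources.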

We have an online cluster with more than 2000 nodes whose node storage is 
imbalanced. For example:
 !image-2022-06-02-13-18-33-213.png! 

Here, the average utilization of the cluster is 78%, but the utilization of 
most nodes is between 85% and 90%. When the balancer is turned on, we find that 
85% of the nodes are working as sources. We think this is not reasonable, 
because it occupies more network resources in the cluster. Applying some 
effective restrictions would benefit the normal operation of the cluster.
So here are some changes to make:
1. When the balancer is running, it should report the threshold that applies 
to each StorageType. For example [[DISK, 10%], [SSD, 8%]...]
2. Support setting a separate threshold for each StorageType, and have the 
balancer apply it.
3. Add an option to prevent nodes below the threshold from joining the Source 
set. This lets nodes with high utilization transfer their data as soon as 
possible, which helps balancing.
4. Add an option to leave nodes unchanged when a large share of the datanodes 
fall in the same utilization range. For example, if 40% of the nodes in the 
cluster have a utilization of 75% to 80%, these nodes should not join the 
Source set. Of course, this behavior needs to be specified by the user at 
runtime.
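
A minimal sketch of how proposals 2 to 4 could fit together is shown below. It 
is only an illustration of the intended behavior. Every name in it 
(select_sources, only_over_threshold, excluded_band) is hypothetical and does 
not correspond to an existing Balancer option or API, and the node data in the 
usage note is invented for the example.

```python
from typing import Dict, List, Optional, Tuple

# (node name, StorageType, utilization in percent)
Node = Tuple[str, str, float]


def select_sources(
    nodes: List[Node],
    average: Dict[str, float],
    thresholds: Dict[str, float],
    default_threshold: float = 10.0,
    only_over_threshold: bool = False,
    excluded_band: Optional[Tuple[float, float]] = None,
) -> List[str]:
    """Return the names of the nodes that may join the Source set.

    thresholds          -- per-StorageType threshold (proposal 2)
    only_over_threshold -- if True, only nodes more than the threshold
                           above the average are sources (proposal 3)
    excluded_band       -- user-supplied (low, high) utilization range
                           whose nodes never become sources (proposal 4)
    """
    sources = []
    for name, storage_type, utilization in nodes:
        avg = average[storage_type]
        threshold = thresholds.get(storage_type, default_threshold)
        if utilization <= avg:
            continue  # at or below the average: never a source
        if only_over_threshold and utilization - avg <= threshold:
            continue  # above the average, but within the threshold
        if excluded_band is not None:
            low, high = excluded_band
            if low <= utilization <= high:
                continue  # inside the range the user asked to leave alone
        sources.append(name)
    return sources
```

For example, take nodes dn1 (DISK, 90%), dn2 (DISK, 85%), dn3 (SSD, 87%) and 
dn4 (DISK, 79%), an average of 78% for both types, and thresholds of 10 for 
DISK and 8 for SSD. Without any option all four nodes are sources. With 
only_over_threshold=True only dn1 and dn3 remain. With excluded_band=(75, 80) 
dn4 is left out.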



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
