Xiangyi Zhu created HDFS-17191:
----------------------------------

             Summary: HDFS: Delete operation adds a thread to collect blocks asynchronously
                 Key: HDFS-17191
                 URL: https://issues.apache.org/jira/browse/HDFS-17191
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: hdfs
    Affects Versions: 3.4.0
            Reporter: Xiangyi Zhu
            Assignee: Xiangyi Zhu

When we delete a large directory, collecting the blocks in the deleted subtree is time-consuming. Currently, block collection runs while the write lock is held, so deleting a large directory can block other RPCs for a period of time. Asynchronous deletion of the collected blocks has already been implemented, and we can use it as a reference.

In fact, collecting blocks does not require the lock: once the subtree has been deleted, it is no longer accessed by other RPCs. We can therefore collect the blocks of the deleted subtree asynchronously, without holding the lock.

There are some possible problems:

1. When a parent directory of the subtree is configured with a quota, the quota update is no longer synchronous, so there will be a small delay.
2. Because the root directory always has the DirectoryWithQuotaFeature attribute, we need to update the quotaUsage of the root directory in every case. However, the root directory has no upper limit configured for its quota, so I think we can ignore the delayed quota update for the root directory.

To solve the above problems, we can check whether any parent directory of the subtree is configured with a quota. If none is, we use asynchronous collection. We can also add a configuration option that lets users decide whether to enable this quota check.
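
A minimal sketch of the proposed decision, written against hypothetical stand-in types rather than the real NameNode classes: `Node`, `canCollectAsync`, `collectBlocks`, and `delete` are illustrative names, not HDFS APIs, and the plain `ReentrantReadWriteLock` only stands in for the namespace write lock. It assumes the subtree is unlinked under the lock, and that blocks are collected on a separate thread only when no ancestor other than the root has a quota configured.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-ins for illustration; none of these are HDFS classes.
final class AsyncDeleteSketch {

  /** Minimal directory-tree node: parent link, quota flag, block ids. */
  static final class Node {
    final String name;
    final Node parent;        // null for the root
    final boolean quotaSet;   // true if a quota is configured on this node
    final List<Node> children = new ArrayList<>();
    final List<Long> blocks = new ArrayList<>();

    Node(String name, Node parent, boolean quotaSet) {
      this.name = name;
      this.parent = parent;
      this.quotaSet = quotaSet;
      if (parent != null) {
        parent.children.add(this);
      }
    }
  }

  /** True when no ancestor of the subtree, other than the root, has a quota. */
  static boolean canCollectAsync(Node subtreeRoot) {
    for (Node n = subtreeRoot.parent; n != null; n = n.parent) {
      boolean isRoot = (n.parent == null);
      if (!isRoot && n.quotaSet) {
        return false;
      }
    }
    return true;
  }

  /** Iteratively gathers every block id stored in the subtree. */
  static List<Long> collectBlocks(Node subtreeRoot) {
    List<Long> out = new ArrayList<>();
    Deque<Node> stack = new ArrayDeque<>();
    stack.push(subtreeRoot);
    while (!stack.isEmpty()) {
      Node n = stack.pop();
      out.addAll(n.blocks);
      for (Node child : n.children) {
        stack.push(child);
      }
    }
    return out;
  }

  /** Unlinks the subtree under the write lock; collects inline or on the pool. */
  static Future<List<Long>> delete(Node subtree, ReentrantReadWriteLock lock,
                                   ExecutorService pool) {
    boolean async;
    List<Long> collected = null;
    lock.writeLock().lock();
    try {
      async = canCollectAsync(subtree);
      if (subtree.parent != null) {
        subtree.parent.children.remove(subtree);  // subtree is now unreachable
      }
      if (!async) {
        collected = collectBlocks(subtree);       // quota path: stay synchronous
      }
    } finally {
      lock.writeLock().unlock();
    }
    if (async) {
      Callable<List<Long>> task = () -> collectBlocks(subtree);
      return pool.submit(task);                   // runs without the lock
    }
    return CompletableFuture.completedFuture(collected);
  }

  public static void main(String[] args) throws Exception {
    Node root = new Node("/", null, true);        // root always carries quota state
    Node plain = new Node("plain", root, false);
    Node big = new Node("big", plain, false);
    big.blocks.add(1L);
    big.blocks.add(2L);
    Node file = new Node("part-0", big, false);
    file.blocks.add(3L);

    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      System.out.println("async allowed: " + canCollectAsync(big));
      Future<List<Long>> result = delete(big, new ReentrantReadWriteLock(), pool);
      System.out.println("blocks collected: " + result.get().size());
    } finally {
      pool.shutdown();
    }
  }
}
```

In this sketch the quota on the root is deliberately skipped, matching the suggestion to tolerate a delayed quotaUsage update there. A configuration switch for the quota check could be modeled as an extra boolean parameter on `delete`; its name and default would need to be decided in the actual patch.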