Xiangyi Zhu created HDFS-17191:
----------------------------------

             Summary: HDFS: Delete operation adds a thread to collect blocks 
asynchronously
                 Key: HDFS-17191
                 URL: https://issues.apache.org/jira/browse/HDFS-17191
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: hdfs
    Affects Versions: 3.4.0
            Reporter: Xiangyi Zhu
            Assignee: Xiangyi Zhu


Deleting a large directory is slow because the blocks in the deleted subtree 
must be collected. Block collection currently runs while the write lock is 
held, so deleting a large directory can block other RPCs for some time. 
Asynchronous deletion of the collected blocks has already been implemented, 
and we can use it as a reference.

In fact, collecting blocks does not require the lock: once the subtree has 
been deleted, other RPCs can no longer access it. We can therefore collect the 
blocks of the deleted subtree asynchronously, without holding the lock.
But there may be some problems:
1. When a parent node of the subtree is configured with quota, the quota 
update is no longer synchronous, so there will be a small delay.
2. Because the root directory always has the DirectoryWithQuotaFeature 
attribute, its quotaUsage has to be updated in any case. However, the root 
directory has no upper limit on its quota configuration, so I think we can 
ignore the delayed quota update for the root directory.
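
The lock-free collection proposed above can be sketched as follows. This is 
only a minimal model, not the NameNode code: Node, delete, and collectBlocks 
are hypothetical stand-ins for the real inode and namesystem classes, and a 
plain ReentrantReadWriteLock stands in for the namesystem lock. The subtree is 
detached while the write lock is held, and the traversal that gathers block 
IDs runs afterwards on a separate thread.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class AsyncBlockCollectionSketch {
    /** Hypothetical stand-in for an inode; not the real HDFS INode class. */
    static final class Node {
        final String name;
        final Map<String, Node> children = new HashMap<>();
        final List<Long> blockIds = new ArrayList<>();

        Node(String name) { this.name = name; }

        Node addChild(Node child) { children.put(child.name, child); return this; }

        Node addBlocks(long... ids) {
            for (long id : ids) { blockIds.add(id); }
            return this;
        }
    }

    // Stand-in for the namesystem write lock (assumption for this sketch).
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Single background thread that collects blocks of detached subtrees.
    private final ExecutorService collector = Executors.newSingleThreadExecutor();

    /** Detaches the child under the write lock, then collects its blocks off-lock. */
    CompletableFuture<List<Long>> delete(Node parent, String childName) {
        Node detached;
        lock.writeLock().lock();
        try {
            // Cheap step: unlink the subtree so no other caller can reach it.
            detached = parent.children.remove(childName);
        } finally {
            lock.writeLock().unlock();
        }
        if (detached == null) {
            return CompletableFuture.completedFuture(new ArrayList<>());
        }
        final Node subtree = detached;
        // Expensive step: walk the detached subtree without holding the lock.
        return CompletableFuture.supplyAsync(() -> collectBlocks(subtree), collector);
    }

    /** Iterative traversal that gathers every block ID in the subtree. */
    static List<Long> collectBlocks(Node root) {
        List<Long> out = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            out.addAll(n.blockIds);
            n.children.values().forEach(stack::push);
        }
        return out;
    }

    void shutdown() { collector.shutdown(); }

    public static void main(String[] args) {
        AsyncBlockCollectionSketch ns = new AsyncBlockCollectionSketch();
        Node root = new Node("/");
        root.addChild(new Node("big").addBlocks(1L)
            .addChild(new Node("f1").addBlocks(2L, 3L)));
        List<Long> blocks = ns.delete(root, "big").join();
        System.out.println("collected " + blocks.size() + " blocks");
        ns.shutdown();
    }
}
```

In this model, any quota accounting that depends on the collected blocks could 
only happen once the future completes, which is the small delay described in 
point 1.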

To address the quota delay, we can check whether any parent directory of the 
subtree is configured with quota. If none is, use asynchronous collection. We 
can also add a configuration option that lets users decide whether to enable 
this quota check.
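
One possible reading of this check, as a hedged sketch: Dir, canCollectAsync, 
and the quotaCheckEnabled flag are hypothetical names chosen for illustration 
and are not existing HDFS classes or configuration keys. The root directory is 
skipped because, as noted above, its delayed quota update can be ignored.

```java
final class QuotaAwareDeleteSketch {
    /** Hypothetical directory model; not the real HDFS INodeDirectory. */
    static final class Dir {
        final String name;
        final Dir parent;        // null for the root directory
        final boolean hasQuota;  // true if a quota is configured on this directory

        Dir(String name, Dir parent, boolean hasQuota) {
            this.name = name;
            this.parent = parent;
            this.hasQuota = hasQuota;
        }

        boolean isRoot() { return parent == null; }
    }

    /**
     * Returns true if the deleted subtree may be collected asynchronously.
     * quotaCheckEnabled models the proposed user-facing switch (assumption):
     * when it is off, the delayed quota update is accepted everywhere.
     */
    static boolean canCollectAsync(Dir deleted, boolean quotaCheckEnabled) {
        if (!quotaCheckEnabled) {
            return true;
        }
        // Walk up the ancestors; the root always carries the quota feature,
        // so it is deliberately ignored here.
        for (Dir d = deleted.parent; d != null; d = d.parent) {
            if (!d.isRoot() && d.hasQuota) {
                return false; // fall back to synchronous collection
            }
        }
        return true;
    }
}
```

With this shape, a delete under a quota-limited ancestor keeps today's 
synchronous behaviour, while all other deletes take the asynchronous path.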



