Improve Trash Emptier
---------------------

                 Key: HDFS-1144
                 URL: https://issues.apache.org/jira/browse/HDFS-1144
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: Dmytro Molkov
            Assignee: Dmytro Molkov


There are two inefficiencies in the Trash functionality right now that have 
caused some problems for us.

First, if you configure the trash interval to be one day (24 hours), you 
eventually store two days' worth of data: the Current directory and the previous 
timestamped checkpoint, which is not deleted until the end of the interval.
Second, a lot of data accumulates in Trash before the Emptier wakes up. If a 
couple of million files have been trashed when the Emptier performs the 
deletion, the NameNode freezes until everything is removed. (This particular 
problem will hopefully be addressed by HDFS-1143.)

My proposal is to have two configuration intervals: one for deleting the 
trashed data and another for checkpointing. For example, with intervals of one 
day and one hour, we would store only 25 hours of data instead of the current 
48, and deletions would happen in smaller chunks every hour instead of as one 
huge deletion at the end of the day.
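
The 48-hour and 25-hour figures above can be checked with a small model. This is 
only a sketch under assumptions that are not part of the issue: the Emptier is 
assumed to wake at exact multiples of the checkpoint interval, to turn Current 
into a timestamped checkpoint on each wake-up, and to remove a checkpoint at the 
first wake-up where it is at least one deletion interval old. The function names 
are hypothetical and this is not the actual Emptier code.

```python
def retention_minutes(trashed_at, deletion_interval, checkpoint_interval):
    """Minutes a file trashed at `trashed_at` stays on disk in this simplified model.

    All arguments are in minutes. Assumption: the Emptier wakes at multiples
    of `checkpoint_interval`.
    """
    # First wake-up after the file was trashed: Current becomes a checkpoint.
    checkpoint_time = (trashed_at // checkpoint_interval + 1) * checkpoint_interval
    # The checkpoint is removed at the first wake-up at which it is
    # at least `deletion_interval` old.
    wake = checkpoint_time
    while wake - checkpoint_time < deletion_interval:
        wake += checkpoint_interval
    return wake - trashed_at


def worst_case_hours(deletion_interval_h, checkpoint_interval_h):
    """Longest retention, in hours, over every minute within one checkpoint cycle."""
    d = deletion_interval_h * 60
    c = checkpoint_interval_h * 60
    return max(retention_minutes(t, d, c) for t in range(c)) / 60


# Single 24-hour interval used for both checkpointing and deletion (current behaviour).
print(worst_case_hours(24, 24))  # 48.0
# Proposed: delete after 24 hours, checkpoint every hour.
print(worst_case_hours(24, 1))   # 25.0
```

In this model the worst case is the sum of the two intervals, because a file 
trashed just after a wake-up waits a full checkpoint interval in Current and 
then a full deletion interval as a checkpoint.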

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
