Hi,

We have a requirement to expire temporary files that are no longer needed in an HDFS share. I have noticed some traffic on this same issue and was wondering how best to approach the problem and/or contribute. Basically, we need to remove a user-specified subset of files from HDFS based on mtime or atime.
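To make the requirement a bit more concrete, here is a rough sketch of the selection rule I have in mind, written in plain Java with no Hadoop dependency. Everything in it (TmpReaperSketch, FileInfo, TimeField, selectExpired) is a hypothetical name made up for illustration. In a real HDFS version the two timestamps would presumably come from FileStatus.getModificationTime() and FileStatus.getAccessTime(), and the selected paths would then be deleted through FileSystem.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names are Hadoop APIs.
final class TmpReaperSketch {

    /** Which timestamp the expiry rule looks at. */
    enum TimeField { MTIME, ATIME }

    /** Minimal stand-in for per-file metadata (times in ms since the epoch). */
    static final class FileInfo {
        final String path;
        final long mtimeMillis;
        final long atimeMillis;

        FileInfo(String path, long mtimeMillis, long atimeMillis) {
            this.path = path;
            this.mtimeMillis = mtimeMillis;
            this.atimeMillis = atimeMillis;
        }
    }

    /**
     * Returns the files whose chosen timestamp is strictly older than
     * (nowMillis - maxAgeMillis). Nothing is deleted here; the caller
     * decides what to do with the result.
     */
    static List<FileInfo> selectExpired(List<FileInfo> files, TimeField field,
                                        long maxAgeMillis, long nowMillis) {
        long cutoff = nowMillis - maxAgeMillis;
        List<FileInfo> expired = new ArrayList<>();
        for (FileInfo f : files) {
            long t = (field == TimeField.MTIME) ? f.mtimeMillis : f.atimeMillis;
            if (t < cutoff) {
                expired.add(f);
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        long day = 24L * 60 * 60 * 1000;
        long now = System.currentTimeMillis();
        List<FileInfo> files = Arrays.asList(
            new FileInfo("/tmp/a", now - 10 * day, now - 1 * day),
            new FileInfo("/tmp/b", now - 2 * day, now - 2 * day));
        // List files not modified in the last 7 days.
        for (FileInfo f : selectExpired(files, TimeField.MTIME, 7 * day, now)) {
            System.out.println("would delete: " + f.path);
        }
    }
}
```

Passing the current time in as a parameter keeps the rule easy to test, and returning a list rather than deleting directly leaves room for a dry-run mode.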
Possible approaches:

- Mount HDFS over FUSE and use the standard tmpreaper utility.
- Implement a Hadoop version of tmpreaper using the FileSystem and PathFilter classes.
- Place temporary files in a .Trash-like directory and use the Trash class's checkpoint and expunge methods. It would be nice here if the user could choose to expire all checkpoints except the N most recent, or to expire checkpoints incrementally to free up space.

Thanks for the feedback,
Michael