Hi Danny,

This depends on a number of circumstances, mostly file permissions. For example, if a file is deleted without the -skipTrash option, it is moved to the .Trash directory. It can still be read from there, but the original file permissions are preserved. So if a user did not have read access before the file was deleted, they won’t be able to read it from .Trash, and if they did have read access, that ought to remain the case.
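To make the trash case concrete, here is a minimal Python sketch of where a trashed file would normally end up and how you might check its permissions. It assumes trash is enabled, the default home directory prefix of /user, and a file outside any encryption zone. The helper names trash_path and check_commands are hypothetical, made up for this illustration; the hdfs dfs -ls and -getfacl commands are the standard shell ones.

```python
import posixpath
import shlex


def trash_path(user: str, deleted_path: str) -> str:
    """Return the usual trash location for a file deleted by `user`.

    Assumption: default layout /user/<user>/.Trash/Current/<original path>,
    i.e. default home prefix and no encryption zone involved.
    """
    if not deleted_path.startswith("/"):
        raise ValueError("expected an absolute HDFS path")
    return posixpath.join(
        "/user", user, ".Trash", "Current", deleted_path.lstrip("/")
    )


def check_commands(user: str, deleted_path: str) -> list[str]:
    """Build shell commands that show the trashed copy's owner, mode and ACLs."""
    target = shlex.quote(trash_path(user, deleted_path))
    return [
        f"hdfs dfs -ls {target}",       # owner, group and permission bits
        f"hdfs dfs -getfacl {target}",  # ACL entries, if ACLs are enabled
    ]


if __name__ == "__main__":
    for cmd in check_commands("alice", "/data/report.csv"):
        print(cmd)
```

Running the printed commands as the second user should show whether the preserved permissions still keep them out of the trashed copy.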
If a file is deleted outright, the namenode marks its blocks for deletion and they are no longer available through HDFS, but there is some lag between the HDFS delete operation and the block files being removed from the datanodes. During that window someone could read a block directly from the datanode’s local file system, but not through HDFS. The blocks exist on disk until the datanode itself deletes them.

Because of the way HDFS works, you won’t get previous data when you create a new block, since unallocated space doesn’t exist in the same way as it does on a regular file system. Each HDFS block maps to a file on a datanode, and block files can be an arbitrary size, unlike the fixed block/extent size of a regular file system. You don’t “reuse” HDFS blocks; a block in HDFS is just a file on the datanode. Someone with access to the datanode’s disks could potentially recover data from unallocated space there, the same way they would for any other deleted file.

If you want to remove the chance of data recovery on HDFS, then encrypting the blocks using HDFS transparent encryption is the way to do it. The per-file encryption keys are stored, in encrypted form, in the namenode metadata, so once they are deleted the data in that file is effectively lost. Beware of snapshots though, since a file deleted from the live HDFS view may still exist in a previous snapshot.

Kind regards,
Jim

> On 11 Jan 2024, at 21:50, Daniel Howard <danny...@toldme.com> wrote:
>
> Is it possible for a user with HDFS access to read the contents of a file
> previously deleted by a different user?
>
> I know a user can employ KMS to encrypt files with a personal key, making
> this sort of data leakage effectively impossible. But, without KMS, is it
> possible to allocate a file with uninitialized data, and then read the data
> that exists on the underlying disk?
>
> Thanks,
> -danny
>
> --
> http://dannyman.toldme.com <http://dannyman.toldme.com/>