Hi all,

I made a proposal about adding a Spark Procedure,
RemoveDanglingDeleteFiles. It would do a more comprehensive job of
removing Delete Files that stick around after they become invalid (no
longer apply to any Data Files), which happens in some cases, taking up
storage and potentially affecting performance.
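
To make the idea concrete, here is a rough sketch of how such a procedure
might be invoked from Spark. The procedure name, namespace, and parameter
below are placeholders based on the proposal, not a finalized API:

    // Hypothetical sketch only -- names and arguments may change as the
    // proposal evolves.
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("remove-dangling-delete-files-sketch")
      .getOrCreate()

    // Assuming the procedure would live under the same `system` namespace
    // as existing Iceberg maintenance procedures (e.g. rewrite_data_files).
    spark.sql(
      "CALL my_catalog.system.remove_dangling_delete_files(table => 'db.sample')"
    ).show()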

Links:

   - Iceberg GitHub issue: https://github.com/apache/iceberg/issues/6126
   - Design Doc: https://docs.google.com/document/d/11d-cIUR_89kRsMmWnEoxXGZCvp7L4TUmPJqUC60zB5M/edit?usp=sharing

Part of the proposal depends on adding the PositionDeletes metadata table,
which is still pending some refactoring (described in the GitHub issue).

But please take a look at the proposal, and feel free to comment.

Thanks,
Szehon
