Hi all,

I made a proposal to add a Spark procedure, RemoveDanglingDeleteFiles. It would do a more comprehensive job of removing delete files that stay around after they become invalid (i.e., stop applying to any data files), which happens in some cases, taking up storage and potentially affecting performance.
Links:
- Iceberg GitHub issue: https://github.com/apache/iceberg/issues/6126
- Design doc: https://docs.google.com/document/d/11d-cIUR_89kRsMmWnEoxXGZCvp7L4TUmPJqUC60zB5M/edit?usp=sharing

Part of the proposal depends on adding the PositionDeletes metadata table, which is still pending some refactoring (described in the GitHub issue). Please take a look at the proposal, and feel free to comment.

Thanks,
Szehon