zhangyue19921010 commented on PR #7519: URL: https://github.com/apache/hudi/pull/7519#issuecomment-1360909416
> I guess this PR is related to https://github.com/apache/hudi/pull/7405/files: if the clustering metadata files are archived but the replaced files are not cleaned, the query would see duplicates.

Hi @danny0405, I think the two are related, but they are not aimed at the same issue. HUDI-5341 tries to fix incremental clean not cleaning all of the replaced files as expected, which causes duplicate data. This PR instead adds a new control for `KEEP_LATEST_VERSIONS`, which currently deletes all replaced files immediately and can cause downstream queries to fail. Of course, this time has to be set carefully to make sure all replaced files are deleted before archiving.

-- 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org