zhangyue19921010 commented on PR #7519:
URL: https://github.com/apache/hudi/pull/7519#issuecomment-1360909416

   > I guess this PR is related to 
https://github.com/apache/hudi/pull/7405/files: if the clustering metadata 
files are archived but the replaced files are not cleaned, the query would see 
duplicates.
   
   Hi @danny0405, I think the two are related, but they do not aim to solve the 
same issue.
   HUDI-5341 addresses incremental clean not cleaning all the replaced files 
as expected, which causes duplicate data.
   
   In this PR, we are adding a new control for `KEEP_LATEST_VERSIONS`, which 
currently deletes all the replaced files immediately and causes downstream 
queries to fail.
   
   Of course, we need to set this time carefully to make sure all replaced 
files are deleted before archiving.

