parisni commented on issue #9026:
URL: https://github.com/apache/hudi/issues/9026#issuecomment-1631433026
> My suspect is that the HFile itself does not contain duplicates, but
> either the merging or the MOR snapshot relation in Spark has issue, causing the
> duplicates
Agreed. Duplicat
parisni commented on issue #9026:
URL: https://github.com/apache/hudi/issues/9026#issuecomment-1627681838
@yihua
> If the metadata table is queried through Spark datasource directly after
> MDT compaction (i.e., no additional log file in the latest file slice), there
> is no duplicate.
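
For reference, a minimal sketch of the kind of check described in the quote above: reading the metadata table through the Spark datasource and looking for keys that appear more than once. It assumes PySpark with the Hudi Spark bundle on the classpath, that the MDT lives under `<base_path>/.hoodie/metadata`, and that its record key column is named `key`. The helper names are hypothetical and not taken from this thread.

```python
def metadata_table_path(base_path: str) -> str:
    """Build the metadata table (MDT) location for a Hudi table base path.

    Assumption: the MDT is stored under `<base_path>/.hoodie/metadata`.
    """
    return base_path.rstrip("/") + "/.hoodie/metadata"


def duplicate_mdt_keys(spark, base_path: str):
    """Return a DataFrame of MDT keys that occur more than once.

    `spark` is an existing SparkSession with the Hudi bundle available.
    Assumption: the MDT record key column is named `key`.
    """
    # Imported lazily so the path helper is usable without PySpark installed.
    from pyspark.sql import functions as F

    mdt = spark.read.format("hudi").load(metadata_table_path(base_path))
    return mdt.groupBy("key").count().filter(F.col("count") > 1)
```

Running `duplicate_mdt_keys(spark, base_path).show(truncate=False)` once right after MDT compaction and once after new log files have been written would be one way to compare the two cases discussed here.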