
[GitHub] [hudi] parisni commented on issue #9026: [SUPPORT] Duplicated partitions rows in MDT when reading w/ datasource

2023-07-11 Thread via GitHub

parisni commented on issue #9026:
URL: https://github.com/apache/hudi/issues/9026#issuecomment-1631433026

> My suspect is that the HFile itself does not contain duplicates, but either the merging or the MOR snapshot relation in Spark has issue, causing the duplicates

Agreed. Duplicat

[GitHub] [hudi] parisni commented on issue #9026: [SUPPORT] Duplicated partitions rows in MDT when reading w/ datasource

2023-07-09 Thread via GitHub

parisni commented on issue #9026:
URL: https://github.com/apache/hudi/issues/9026#issuecomment-1627681838

@yihua
> If the metadata table is queried through Spark datasource directly after MDT compaction (i.e., no additional log file in the latest file slice), there is no duplicate.