zyclove commented on issue #10407:
URL: https://github.com/apache/hudi/issues/10407#issuecomment-1875068526
![image](https://github.com/apache/hudi/assets/15028279/d9fcbefc-4fde-4275-977a-7d92b7ff2623)
![image](https://github.com/apache/hudi/assets/15028279/5e1837ed-d726-436c-aa43-9e41f5a6d7fd)
beyond1920 commented on issue #10407:
URL: https://github.com/apache/hudi/issues/10407#issuecomment-1875017243
@zyclove
My guess is that the previous writer jobs used the simple bucket index and the
latest writer jobs did not.
That leads to data duplication, because records with the same primary key v
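
The guess above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Hudi's actual lookup logic: `zlib.crc32` stands in for the real bucket hash, the bucket count and file group names are hypothetical, and the second writer is simply assumed to miss the existing record and treat it as an insert.

```python
import zlib

NUM_BUCKETS = 4  # hypothetical bucket count, not taken from the issue


def bucket_file_group(key: str, num_buckets: int = NUM_BUCKETS) -> str:
    """Bucket-style placement: the file group is a pure function of the key."""
    # crc32 is a stand-in; this sketch does not reproduce Hudi's hash function.
    return f"bucket-{zlib.crc32(key.encode('utf-8')) % num_buckets}"


def write_with_bucket_index(table: dict, key: str, value: str) -> None:
    """Upsert: the same key always resolves to the same file group."""
    table.setdefault(bucket_file_group(key), {})[key] = value


def write_without_bucket_layout(table: dict, key: str, value: str, new_group: str) -> None:
    """Assumed failure mode: the writer does not find the existing record,
    treats it as an insert, and places it in a different file group."""
    table.setdefault(new_group, {})[key] = value


def copies_of(table: dict, key: str) -> int:
    """Count how many file groups hold a record with this key."""
    return sum(1 for records in table.values() if key in records)


table = {}
write_with_bucket_index(table, "id-1", "v1")
write_with_bucket_index(table, "id-1", "v2")  # updates in place
after_bucket = copies_of(table, "id-1")

write_without_bucket_layout(table, "id-1", "v3", new_group="fg-new-0")
after_mixed = copies_of(table, "id-1")
```

In this model the key exists once while both writes go through the bucket placement, and twice after the third write lands in a separate file group, which is the kind of duplicate a reader would then see.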
zyclove commented on issue #10407:
URL: https://github.com/apache/hudi/issues/10407#issuecomment-1873203672
@beyond1920 @ad1happy2go Must I set HoodieIndexConfig.INDEX_TYPE.key to
BUCKET? Previously it was SIMPLE mode.
DataSourceReadOptions.EXTRACT_PARTITION_VALUES_FROM_PARTITION_PAT
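
One way to check whether two writer jobs disagree on the index settings is to diff their option maps before writing. The sketch below assumes the string keys `hoodie.index.type` (the key behind `HoodieIndexConfig.INDEX_TYPE`), `hoodie.index.bucket.engine` and `hoodie.bucket.index.num.buckets`; the option values and the helper `index_mismatches` are hypothetical and are not the configs used in this issue.

```python
# Index-related writer options compared by this sketch.
INDEX_KEYS = (
    "hoodie.index.type",
    "hoodie.index.bucket.engine",
    "hoodie.bucket.index.num.buckets",
)


def index_mismatches(old_opts: dict, new_opts: dict) -> dict:
    """Return {key: (old_value, new_value)} for index options that differ."""
    return {
        key: (old_opts.get(key), new_opts.get(key))
        for key in INDEX_KEYS
        if old_opts.get(key) != new_opts.get(key)
    }


# Hypothetical option sets for an older and a newer writer job.
old_job = {
    "hoodie.index.type": "BUCKET",
    "hoodie.index.bucket.engine": "SIMPLE",
    "hoodie.bucket.index.num.buckets": "16",
}
new_job = {"hoodie.index.type": "SIMPLE"}

diff = index_mismatches(old_job, new_job)
for key, (old_value, new_value) in diff.items():
    print(f"{key}: {old_value!r} -> {new_value!r}")
```

An empty result means the two jobs agree on these three options; it says nothing about other settings that may also have changed between versions.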
ad1happy2go commented on issue #10407:
URL: https://github.com/apache/hudi/issues/10407#issuecomment-1871874792
@zyclove Also, is there any reason why you are setting
DataSourceReadOptions.EXTRACT_PARTITION_VALUES_FROM_PARTITION_PATH? This config
will extract the partition values from physical parti
zyclove commented on issue #10407:
URL: https://github.com/apache/hudi/issues/10407#issuecomment-1870965829
![image](https://github.com/apache/hudi/assets/15028279/18a9b0c2-758d-41a2-bf54-9c2db5733df3)
@beyond1920 @parisni @danny0405 @nsivabalan
In addition, it is very strange
zyclove commented on issue #10407:
URL: https://github.com/apache/hudi/issues/10407#issuecomment-1870952480
@parisni @beyond1920
Thank you very much for helping me take a look. The code changes before and
after the upgrade are as follows. Is there any good way to merge layers into
new f
beyond1920 commented on issue #10407:
URL: https://github.com/apache/hudi/issues/10407#issuecomment-1870862338
![image](https://github.com/apache/hudi/assets/1525333/9083eb26-71fd-4656-9c25-c0374fc7ccf2)
@zyclove Data duplication caused by records with the same primary key value
ar
parisni commented on issue #10407:
URL: https://github.com/apache/hudi/issues/10407#issuecomment-1870156940
Can you share with us the hoodie configs involved in your ingestion?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub
zyclove opened a new issue, #10407:
URL: https://github.com/apache/hudi/issues/10407
**Describe the problem you faced**
A Hudi (0.12.3) table with existing data was upgraded to 0.14.0. After the
upgrade, the data was found to be duplicated. Check whether the old data
file sti