Re: [I] [SUPPORT]0.12.3 upgrade to 0.14.0 data duplication [hudi]

2024-01-03 Thread via GitHub
zyclove commented on issue #10407: URL: https://github.com/apache/hudi/issues/10407#issuecomment-1875068526 https://github.com/apache/hudi/assets/15028279/d9fcbefc-4fde-4275-977a-7d92b7ff2623 https://github.com/apache/hudi/assets/15028279/5e1837ed-d726-436c-aa43-9e41f5a6d7fd

Re: [I] [SUPPORT]0.12.3 upgrade to 0.14.0 data duplication [hudi]

2024-01-03 Thread via GitHub
beyond1920 commented on issue #10407: URL: https://github.com/apache/hudi/issues/10407#issuecomment-1875017243 @zyclove My guess is that the previous writer jobs used the simple bucket index and the latest writer jobs did not. This leads to data duplication, because records with same primary key v
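
The guess above can be illustrated with a toy model. This is a minimal sketch, not Hudi's implementation: it assumes that the concern is two writers routing the same primary key to different file groups, and every name in it (`NUM_BUCKETS`, `bucket_file_group`, `new-file-group-0`) is hypothetical.

```python
# Toy model only: this does not reproduce Hudi's index code.
import zlib

NUM_BUCKETS = 4  # hypothetical bucket count


def bucket_file_group(key: str) -> str:
    """Route a key to a fixed file group by hashing, as a bucket-style index would."""
    return f"bucket-{zlib.crc32(key.encode()) % NUM_BUCKETS}"


def write_with_bucket_routing(table: dict, records: list) -> None:
    """Upsert each record into the file group chosen by the hash of its key."""
    for key, value in records:
        table.setdefault(bucket_file_group(key), {})[key] = value


def write_without_lookup(table: dict, records: list, file_group: str) -> None:
    """Write records into a given file group without checking where the key already lives."""
    for key, value in records:
        table.setdefault(file_group, {})[key] = value


def count_copies(table: dict, key: str) -> int:
    """Count how many file groups hold a record with this key."""
    return sum(1 for file_group in table.values() if key in file_group)


table = {}
write_with_bucket_routing(table, [("id-1", "v1")])
write_with_bucket_routing(table, [("id-1", "v2")])  # same routing, so this is an update
copies_consistent = count_copies(table, "id-1")

# A writer that routes differently places the same key in another file group.
write_without_lookup(table, [("id-1", "v3")], "new-file-group-0")
copies_mixed = count_copies(table, "id-1")

print(copies_consistent, copies_mixed)
```

In this model the key has one copy while both writes share the same routing, and two copies once a differently routed write arrives, which a reader would see as a duplicate row.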

Re: [I] [SUPPORT]0.12.3 upgrade to 0.14.0 data duplication [hudi]

2023-12-31 Thread via GitHub
zyclove commented on issue #10407: URL: https://github.com/apache/hudi/issues/10407#issuecomment-1873203672 @beyond1920 @ad1happy2go Must I set HoodieIndexConfig.INDEX_TYPE.key to BUCKET? Previously it was SIMPLE mode. DataSourceReadOptions.EXTRACT_PARTITION_VALUES_FROM_PARTITION_PAT
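
One way to make this kind of question concrete is to diff the job options used before and after the upgrade. The sketch below is plain Python and does not call Hudi. It assumes that `HoodieIndexConfig.INDEX_TYPE.key` resolves to `hoodie.index.type` and that `DataSourceReadOptions.EXTRACT_PARTITION_VALUES_FROM_PARTITION_PATH` resolves to `hoodie.datasource.read.extract.partition.values.from.path`. The option values and the helper `changed_options` are hypothetical, not the reporter's actual settings.

```python
# Hypothetical option sets for illustration; not the configs from this issue.
OLD_JOB_OPTIONS = {
    "hoodie.index.type": "SIMPLE",
}
NEW_JOB_OPTIONS = {
    "hoodie.index.type": "BUCKET",
    "hoodie.datasource.read.extract.partition.values.from.path": "true",
}

# Keys worth comparing across the upgrade (an assumption based on this thread).
WATCHED_KEYS = [
    "hoodie.index.type",
    "hoodie.datasource.read.extract.partition.values.from.path",
]


def changed_options(old: dict, new: dict, keys: list) -> dict:
    """Return {key: (old_value, new_value)} for every watched key whose value differs."""
    return {
        key: (old.get(key), new.get(key))
        for key in keys
        if old.get(key) != new.get(key)
    }


diff = changed_options(OLD_JOB_OPTIONS, NEW_JOB_OPTIONS, WATCHED_KEYS)
for key, (before, after) in diff.items():
    print(f"{key}: {before!r} -> {after!r}")
```

Any key that shows up in the diff is a candidate to review before writing to an existing table with the upgraded job.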

Re: [I] [SUPPORT]0.12.3 upgrade to 0.14.0 data duplication [hudi]

2023-12-29 Thread via GitHub
ad1happy2go commented on issue #10407: URL: https://github.com/apache/hudi/issues/10407#issuecomment-1871874792 @zyclove Also, is there any reason why you are setting DataSourceReadOptions.EXTRACT_PARTITION_VALUES_FROM_PARTITION_PATH? This config will extract the partition values from physical parti

Re: [I] [SUPPORT]0.12.3 upgrade to 0.14.0 data duplication [hudi]

2023-12-28 Thread via GitHub
zyclove commented on issue #10407: URL: https://github.com/apache/hudi/issues/10407#issuecomment-1870965829 ![image](https://github.com/apache/hudi/assets/15028279/18a9b0c2-758d-41a2-bf54-9c2db5733df3) @beyond1920 @parisni @danny0405 @nsivabalan In addition, it is very strange

Re: [I] [SUPPORT]0.12.3 upgrade to 0.14.0 data duplication [hudi]

2023-12-28 Thread via GitHub
zyclove commented on issue #10407: URL: https://github.com/apache/hudi/issues/10407#issuecomment-1870952480 @parisni @beyond1920 Thank you very much for helping me take a look. The code changes before and after the upgrade are as follows. Is there any good way to merge layers into new f

Re: [I] [SUPPORT]0.12.3 upgrade to 0.14.0 data duplication [hudi]

2023-12-27 Thread via GitHub
beyond1920 commented on issue #10407: URL: https://github.com/apache/hudi/issues/10407#issuecomment-1870862338 ![image](https://github.com/apache/hudi/assets/1525333/9083eb26-71fd-4656-9c25-c0374fc7ccf2) @zyclove Data duplication caused by records with same primary key value ar

Re: [I] [SUPPORT]0.12.3 upgrade to 0.14.0 data duplication [hudi]

2023-12-27 Thread via GitHub
parisni commented on issue #10407: URL: https://github.com/apache/hudi/issues/10407#issuecomment-1870156940 Can you share with us the hoodie configs involved in your ingestion? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[I] [SUPPORT]0.12.3 upgrade to 0.14.0 data duplication [hudi]

2023-12-25 Thread via GitHub
zyclove opened a new issue, #10407: URL: https://github.com/apache/hudi/issues/10407 **Describe the problem you faced** For a Hudi (0.12.3) table with existing data, upgrade to 0.14.0. After the upgrade, it is found that the data is duplicated. Check whether the old data file sti