Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-10 Thread via GitHub
KnightChess commented on PR #11578: URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219941921 @xicm @danny0405 Yes, I will submit a new PR to fix it soon. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-10 Thread via GitHub
xicm commented on PR #11578: URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219938801 @danny0405 @KnightChess Shall we use the new algorithm? The new algorithm looks simpler. The previous implementation had an overflow problem; does it need a fix?

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-10 Thread via GitHub
danny0405 merged PR #11578: URL: https://github.com/apache/hudi/pull/11578

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-09 Thread via GitHub
KnightChess closed pull request #11578: [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… URL: https://github.com/apache/hudi/pull/11578

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-09 Thread via GitHub
danny0405 commented on PR #11578: URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219523986 > Both algorithms have drawbacks. @xicm That's fine, the new algorithm looks simpler, there is no need to distinguish between different parallelisms.

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-09 Thread via GitHub
KnightChess closed pull request #11578: [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… URL: https://github.com/apache/hudi/pull/11578

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-09 Thread via GitHub
KnightChess commented on PR #11578: URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219457551 I don't know why this check includes the docker module; the other successful runs don't appear to include it. Retriggering.

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-09 Thread via GitHub
hudi-bot commented on PR #11578: URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219424072 ## CI report: * d9c0ce277a202dc66f56b40418b4746fdcb6e1b6 Azure:

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-09 Thread via GitHub
xicm commented on PR #11578: URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219339643 > @xicm @KnightChess So we have reached consensus that the algorithm raised by @KnightChess is better? If that's true, let's fire a fix in a separate PR. Both algorithms have drawbacks. For

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-09 Thread via GitHub
hudi-bot commented on PR #11578: URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219319141 ## CI report: * 724e93b42446df3bdbc5e66a898f3b21bac97f3d Azure:

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-09 Thread via GitHub
hudi-bot commented on PR #11578: URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219311546 ## CI report: * 724e93b42446df3bdbc5e66a898f3b21bac97f3d Azure:

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-09 Thread via GitHub
danny0405 commented on PR #11578: URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219306922 @xicm @KnightChess So we have reached consensus that the algorithm raised by @KnightChess is better? If that's true, let's fire a fix in a separate PR.

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-09 Thread via GitHub
xicm commented on PR #11578: URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219304171 > @xicm no, even with the overflow problem fixed, the old algorithm will not be better; you can try the UT. I have tried before. Oh, there's something wrong with my test case, the old algorithm

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-09 Thread via GitHub
KnightChess commented on PR #11578: URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219288802 @danny0405 I have tried before; the result is that the new algorithm is better. I will fix it in a separate PR.

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-09 Thread via GitHub
KnightChess commented on PR #11578: URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219285300 @xicm no, even with the overflow problem fixed, the old algorithm will not be better; you can try the UT. I have tried before.

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-09 Thread via GitHub
danny0405 commented on PR #11578: URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219159512 > if we fix the overflow problem, the old algorithm is better. Let's fire a fix for it, and @KnightChess let's keep the Flink hashing algorithm the same as it is and we can

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-09 Thread via GitHub
xicm commented on PR #11578: URL: https://github.com/apache/hudi/pull/11578#issuecomment-2217636975 > > @danny0405 @xicm From the discussion results and unit test situations, we can conclude that in the case of consecutive partitions, the new algorithm is more stable than the old

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-09 Thread via GitHub
xicm commented on PR #11578: URL: https://github.com/apache/hudi/pull/11578#issuecomment-2217579782 > @danny0405 @xicm From the discussion results and unit test situations, we can conclude that in the case of consecutive partitions, the new algorithm is more stable than the old algorithm.

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-09 Thread via GitHub
KnightChess commented on PR #11578: URL: https://github.com/apache/hudi/pull/11578#issuecomment-2217454238 Both of these algorithms are better than the original Spark bulk bucket partitioner algorithm. I think they can both address the skew issue to some extent. If we want to maintain the

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-09 Thread via GitHub
KnightChess commented on code in PR #11578: URL: https://github.com/apache/hudi/pull/11578#discussion_r1670349405 ## hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java: ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-08 Thread via GitHub
xicm commented on code in PR #11578: URL: https://github.com/apache/hudi/pull/11578#discussion_r1669629978 ## hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java: ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-08 Thread via GitHub
KnightChess commented on code in PR #11578: URL: https://github.com/apache/hudi/pull/11578#discussion_r1669586866 ## hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java: ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-08 Thread via GitHub
KnightChess commented on code in PR #11578: URL: https://github.com/apache/hudi/pull/11578#discussion_r1669582211 ## hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java: ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-08 Thread via GitHub
xicm commented on code in PR #11578: URL: https://github.com/apache/hudi/pull/11578#discussion_r1669547032 ## hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java: ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-08 Thread via GitHub
danny0405 commented on code in PR #11578: URL: https://github.com/apache/hudi/pull/11578#discussion_r1669498553 ## hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java: ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-08 Thread via GitHub
hudi-bot commented on PR #11578: URL: https://github.com/apache/hudi/pull/11578#issuecomment-2214663418 ## CI report: * 724e93b42446df3bdbc5e66a898f3b21bac97f3d Azure:

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-08 Thread via GitHub
hudi-bot commented on PR #11578: URL: https://github.com/apache/hudi/pull/11578#issuecomment-2214473532 ## CI report: * 52fe39ea0e547ea47a41e7eec4d7c80a412f9576 Azure:

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-08 Thread via GitHub
hudi-bot commented on PR #11578: URL: https://github.com/apache/hudi/pull/11578#issuecomment-2214454921 ## CI report: * 52fe39ea0e547ea47a41e7eec4d7c80a412f9576 Azure:

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-08 Thread via GitHub
hudi-bot commented on PR #11578: URL: https://github.com/apache/hudi/pull/11578#issuecomment-2214270290 ## CI report: * 52fe39ea0e547ea47a41e7eec4d7c80a412f9576 Azure:

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-08 Thread via GitHub
hudi-bot commented on PR #11578: URL: https://github.com/apache/hudi/pull/11578#issuecomment-2214243451 ## CI report: * ffbc9db9f2d56bb137d46371082fa75aeca7b1fc Azure:

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-08 Thread via GitHub
KnightChess commented on code in PR #11578: URL: https://github.com/apache/hudi/pull/11578#discussion_r1668712958 ## hudi-common/src/test/java/org/apache/hudi/common/util/hash/TestBucketIndexUtil.java: ## @@ -0,0 +1,146 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-08 Thread via GitHub
KnightChess commented on code in PR #11578: URL: https://github.com/apache/hudi/pull/11578#discussion_r1668708594 ## hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java: ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-08 Thread via GitHub
KnightChess commented on code in PR #11578: URL: https://github.com/apache/hudi/pull/11578#discussion_r1668707261 ## hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java: ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-08 Thread via GitHub
xicm commented on code in PR #11578: URL: https://github.com/apache/hudi/pull/11578#discussion_r1668222026 ## hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java: ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-08 Thread via GitHub
KnightChess commented on code in PR #11578: URL: https://github.com/apache/hudi/pull/11578#discussion_r1668042192 ## hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java: ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-07 Thread via GitHub
hudi-bot commented on PR #11578: URL: https://github.com/apache/hudi/pull/11578#issuecomment-2213055310 ## CI report: * ffbc9db9f2d56bb137d46371082fa75aeca7b1fc Azure:

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-07 Thread via GitHub
xicm commented on code in PR #11578: URL: https://github.com/apache/hudi/pull/11578#discussion_r1667985370 ## hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java: ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-07 Thread via GitHub
hudi-bot commented on PR #11578: URL: https://github.com/apache/hudi/pull/11578#issuecomment-2212932408 ## CI report: * 91944ec23245ca7389fb2e36f3d96fd255a6d77a Azure:

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-07 Thread via GitHub
hudi-bot commented on PR #11578: URL: https://github.com/apache/hudi/pull/11578#issuecomment-2212923538 ## CI report: * 91944ec23245ca7389fb2e36f3d96fd255a6d77a Azure:

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-07 Thread via GitHub
KnightChess commented on code in PR #11578: URL: https://github.com/apache/hudi/pull/11578#discussion_r1667892464 ## hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java: ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-07 Thread via GitHub
KnightChess commented on code in PR #11578: URL: https://github.com/apache/hudi/pull/11578#discussion_r1667874389 ## hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java: ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-07 Thread via GitHub
xicm commented on code in PR #11578: URL: https://github.com/apache/hudi/pull/11578#discussion_r1667869456 ## hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java: ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-07 Thread via GitHub
danny0405 commented on code in PR #11578: URL: https://github.com/apache/hudi/pull/11578#discussion_r1667817477 ## hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java: ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-06 Thread via GitHub
KnightChess commented on code in PR #11578: URL: https://github.com/apache/hudi/pull/11578#discussion_r1667596861 ## hudi-common/src/test/java/org/apache/hudi/common/util/hash/TestBucketIndexUtil.java: ## @@ -0,0 +1,108 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-06 Thread via GitHub
KnightChess commented on code in PR #11578: URL: https://github.com/apache/hudi/pull/11578#discussion_r1667596159 ## hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java: ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-06 Thread via GitHub
KnightChess commented on code in PR #11578: URL: https://github.com/apache/hudi/pull/11578#discussion_r1667594885 ## hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java: ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-06 Thread via GitHub
danny0405 commented on code in PR #11578: URL: https://github.com/apache/hudi/pull/11578#discussion_r1667585339 ## hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java: ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-06 Thread via GitHub
KnightChess commented on code in PR #11578: URL: https://github.com/apache/hudi/pull/11578#discussion_r1667358016 ## hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java: ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-06 Thread via GitHub
KnightChess closed pull request #11578: [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… URL: https://github.com/apache/hudi/pull/11578

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-06 Thread via GitHub
KnightChess commented on code in PR #11578: URL: https://github.com/apache/hudi/pull/11578#discussion_r1667325508 ## hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java: ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-05 Thread via GitHub
danny0405 commented on code in PR #11578: URL: https://github.com/apache/hudi/pull/11578#discussion_r1667210143 ## hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java: ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-05 Thread via GitHub
hudi-bot commented on PR #11578: URL: https://github.com/apache/hudi/pull/11578#issuecomment-2211147856 ## CI report: * 91944ec23245ca7389fb2e36f3d96fd255a6d77a Azure:

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-05 Thread via GitHub
hudi-bot commented on PR #11578: URL: https://github.com/apache/hudi/pull/11578#issuecomment-2211073160 ## CI report: * 91944ec23245ca7389fb2e36f3d96fd255a6d77a Azure:

Re: [PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-05 Thread via GitHub
hudi-bot commented on PR #11578: URL: https://github.com/apache/hudi/pull/11578#issuecomment-2211062908 ## CI report: * 91944ec23245ca7389fb2e36f3d96fd255a6d77a UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run

[PR] [HUDI-7957] fix data skew when writing with bulk_insert + bucket_inde… [hudi]

2024-07-05 Thread via GitHub
KnightChess opened a new pull request, #11578: URL: https://github.com/apache/hudi/pull/11578 …x enabled ### Change Logs - improve the `BucketIndexUtil` partitionIndex algorithm so that the data is evenly distributed. - `BucketPartitionUtils` in Spark uses the `BucketIndexUtil` method,
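
For readers following the thread, the kind of mapping under discussion can be sketched roughly as below. This is a minimal illustration and not the code from PR #11578: the class `PartitionIndexSketch`, the method `partitionIndex`, and its parameters are hypothetical names, and the formula is an assumption chosen only to show one overflow-safe way of spreading (partition path, bucket id) pairs across write tasks.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; NOT Hudi's BucketIndexUtil implementation.
final class PartitionIndexSketch {

  // Maps a (partition path, bucket id) pair to a task index in [0, parallelism).
  static int partitionIndex(String partitionPath, int bucketId, int parallelism) {
    if (parallelism <= 0 || bucketId < 0) {
      throw new IllegalArgumentException("parallelism must be > 0 and bucketId >= 0");
    }
    // Mask the sign bit so a negative hashCode cannot produce a negative offset.
    int offset = (partitionPath.hashCode() & Integer.MAX_VALUE) % parallelism;
    // Add in long arithmetic so offset + bucketId cannot overflow int.
    long global = (long) offset + bucketId;
    return (int) Math.floorMod(global, (long) parallelism);
  }

  public static void main(String[] args) {
    int parallelism = 8;
    Set<Integer> tasks = new HashSet<>();
    for (int bucket = 0; bucket < parallelism; bucket++) {
      tasks.add(partitionIndex("2024/07/10", bucket, parallelism));
    }
    // Reports whether the 8 bucket ids of one partition landed on 8 distinct tasks.
    System.out.println(tasks.size() == parallelism);
  }
}
```

In this sketch each table partition starts at a hash-derived offset and its buckets are assigned to consecutive tasks from there, so the buckets of one partition do not pile onto a single task. Whether this matches either algorithm compared in the thread is not something the truncated messages confirm.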