KnightChess commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219941921
@xicm @danny0405 Yes, I will submit a new PR to fix it soon.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and
xicm commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219938801
@danny0405 @KnightChess Shall we use the new algorithm? The new algorithm looks
simpler. The previous implementation had an overflow problem; does it need a fix?
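To make the overflow concern concrete, here is a minimal Java sketch. It is not the code from `BucketIndexUtil`; the class name, method names, and the multiply-then-modulo formula are assumptions chosen only to show how an `int` product can wrap negative and how widening to `long` avoids it.

```java
// Hypothetical sketch; not the Hudi BucketIndexUtil implementation.
final class PartitionIndexSketch {

    // Multiplies in int arithmetic: the product can wrap past Integer.MAX_VALUE
    // and become negative, so the result may fall outside [0, parallelism).
    static int overflowProne(String partition, int bucketId, int bucketNum, int parallelism) {
        int hash = partition.hashCode() & Integer.MAX_VALUE; // non-negative hash
        return (hash * bucketNum + bucketId) % parallelism;
    }

    // Same formula with the intermediate value widened to long, so it cannot
    // wrap for non-negative int inputs; the result stays in [0, parallelism).
    static int overflowSafe(String partition, int bucketId, int bucketNum, int parallelism) {
        long hash = partition.hashCode() & Integer.MAX_VALUE;
        return (int) ((hash * bucketNum + bucketId) % parallelism);
    }

    public static void main(String[] args) {
        int bucketNum = 30_000_000; // deliberately large to force the int wrap
        int parallelism = 7;
        System.out.println(overflowProne("a", 0, bucketNum, parallelism));
        System.out.println(overflowSafe("a", 0, bucketNum, parallelism));
    }
}
```

For small inputs the two methods agree; they diverge only once the `int` product wraps, which is why this kind of bug can pass a unit test that uses small bucket counts.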
danny0405 merged PR #11578:
URL: https://github.com/apache/hudi/pull/11578
KnightChess closed pull request #11578: [HUDI-7957] fix data skew when writing
with bulk_insert + bucket_inde…
URL: https://github.com/apache/hudi/pull/11578
danny0405 commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219523986
> Both algorithms have drawbacks.
@xicm That's fine. The new algorithm looks simpler, and there is no need to
distinguish between different parallelisms.
KnightChess closed pull request #11578: [HUDI-7957] fix data skew when writing
with bulk_insert + bucket_inde…
URL: https://github.com/apache/hudi/pull/11578
KnightChess commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219457551
I don't know why this check includes the docker module; other successful runs
do not seem to include it. Retriggering.
hudi-bot commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219424072
## CI report:
* d9c0ce277a202dc66f56b40418b4746fdcb6e1b6 Azure:
xicm commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219339643
> @xicm @KnightChess So we have reached consensus that the algorithm raised by
@KnightChess is better? If that's true, let's fire a fix in a separate PR.
Both algorithms have drawbacks.
For
hudi-bot commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219319141
## CI report:
* 724e93b42446df3bdbc5e66a898f3b21bac97f3d Azure:
hudi-bot commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219311546
## CI report:
* 724e93b42446df3bdbc5e66a898f3b21bac97f3d Azure:
danny0405 commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219306922
@xicm @KnightChess So we have reached consensus that the algorithm raised by
@KnightChess is better? If that's true, let's fire a fix in a separate PR.
xicm commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219304171
> @xicm No. Even with the overflow problem fixed, the old algorithm will not be
better; you can try the UT. I have tried it before.
Oh, there's something wrong with my test case, the old algorithm
KnightChess commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219288802
@danny0405 I have tried it before; the result is that the new algorithm is
better. I will fix it in a separate PR.
KnightChess commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219285300
@xicm No. Even with the overflow problem fixed, the old algorithm will not be
better; you can try the UT. I have tried it before.
danny0405 commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2219159512
> if we fix the overflow problem, the old algorithm is better.
Let's fire a fix for it, and @KnightChess let's keep the Flink hashing
algorithm the same as it is and we can
xicm commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2217636975
> > @danny0405 @xicm From the discussion results and unit test situations,
we can conclude that in the case of consecutive partitions, the new algorithm
is more stable than the old
xicm commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2217579782
> @danny0405 @xicm From the discussion results and unit test situations, we
can conclude that in the case of consecutive partitions, the new algorithm is
more stable than the old algorithm.
KnightChess commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2217454238
Both of these algorithms are better than the original Spark bulk bucket
partitioner algorithm. I think they can both address the skew issue to some
extent. If we want to maintain the
KnightChess commented on code in PR #11578:
URL: https://github.com/apache/hudi/pull/11578#discussion_r1670349405
##
hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java:
##
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
xicm commented on code in PR #11578:
URL: https://github.com/apache/hudi/pull/11578#discussion_r1669629978
##
hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java:
##
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
KnightChess commented on code in PR #11578:
URL: https://github.com/apache/hudi/pull/11578#discussion_r1669586866
##
hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java:
##
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
KnightChess commented on code in PR #11578:
URL: https://github.com/apache/hudi/pull/11578#discussion_r1669582211
##
hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java:
##
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
xicm commented on code in PR #11578:
URL: https://github.com/apache/hudi/pull/11578#discussion_r1669547032
##
hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java:
##
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
danny0405 commented on code in PR #11578:
URL: https://github.com/apache/hudi/pull/11578#discussion_r1669498553
##
hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java:
##
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
hudi-bot commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2214663418
## CI report:
* 724e93b42446df3bdbc5e66a898f3b21bac97f3d Azure:
hudi-bot commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2214473532
## CI report:
* 52fe39ea0e547ea47a41e7eec4d7c80a412f9576 Azure:
hudi-bot commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2214454921
## CI report:
* 52fe39ea0e547ea47a41e7eec4d7c80a412f9576 Azure:
hudi-bot commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2214270290
## CI report:
* 52fe39ea0e547ea47a41e7eec4d7c80a412f9576 Azure:
hudi-bot commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2214243451
## CI report:
* ffbc9db9f2d56bb137d46371082fa75aeca7b1fc Azure:
KnightChess commented on code in PR #11578:
URL: https://github.com/apache/hudi/pull/11578#discussion_r1668712958
##
hudi-common/src/test/java/org/apache/hudi/common/util/hash/TestBucketIndexUtil.java:
##
@@ -0,0 +1,146 @@
+/*
+ * Licensed to the Apache Software Foundation
KnightChess commented on code in PR #11578:
URL: https://github.com/apache/hudi/pull/11578#discussion_r1668708594
##
hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java:
##
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
KnightChess commented on code in PR #11578:
URL: https://github.com/apache/hudi/pull/11578#discussion_r1668707261
##
hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java:
##
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
xicm commented on code in PR #11578:
URL: https://github.com/apache/hudi/pull/11578#discussion_r1668222026
##
hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java:
##
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
KnightChess commented on code in PR #11578:
URL: https://github.com/apache/hudi/pull/11578#discussion_r1668042192
##
hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java:
##
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
hudi-bot commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2213055310
## CI report:
* ffbc9db9f2d56bb137d46371082fa75aeca7b1fc Azure:
xicm commented on code in PR #11578:
URL: https://github.com/apache/hudi/pull/11578#discussion_r1667985370
##
hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java:
##
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
hudi-bot commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2212932408
## CI report:
* 91944ec23245ca7389fb2e36f3d96fd255a6d77a Azure:
hudi-bot commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2212923538
## CI report:
* 91944ec23245ca7389fb2e36f3d96fd255a6d77a Azure:
KnightChess commented on code in PR #11578:
URL: https://github.com/apache/hudi/pull/11578#discussion_r1667892464
##
hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java:
##
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
KnightChess commented on code in PR #11578:
URL: https://github.com/apache/hudi/pull/11578#discussion_r1667874389
##
hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java:
##
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
xicm commented on code in PR #11578:
URL: https://github.com/apache/hudi/pull/11578#discussion_r1667869456
##
hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java:
##
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
danny0405 commented on code in PR #11578:
URL: https://github.com/apache/hudi/pull/11578#discussion_r1667817477
##
hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java:
##
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
KnightChess commented on code in PR #11578:
URL: https://github.com/apache/hudi/pull/11578#discussion_r1667596861
##
hudi-common/src/test/java/org/apache/hudi/common/util/hash/TestBucketIndexUtil.java:
##
@@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software Foundation
KnightChess commented on code in PR #11578:
URL: https://github.com/apache/hudi/pull/11578#discussion_r1667596159
##
hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java:
##
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
KnightChess commented on code in PR #11578:
URL: https://github.com/apache/hudi/pull/11578#discussion_r1667594885
##
hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java:
##
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
danny0405 commented on code in PR #11578:
URL: https://github.com/apache/hudi/pull/11578#discussion_r1667585339
##
hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java:
##
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
KnightChess commented on code in PR #11578:
URL: https://github.com/apache/hudi/pull/11578#discussion_r1667358016
##
hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java:
##
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
KnightChess closed pull request #11578: [HUDI-7957] fix data skew when writing
with bulk_insert + bucket_inde…
URL: https://github.com/apache/hudi/pull/11578
KnightChess commented on code in PR #11578:
URL: https://github.com/apache/hudi/pull/11578#discussion_r1667325508
##
hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java:
##
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF)
danny0405 commented on code in PR #11578:
URL: https://github.com/apache/hudi/pull/11578#discussion_r1667210143
##
hudi-common/src/main/java/org/apache/hudi/common/util/hash/BucketIndexUtil.java:
##
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under
hudi-bot commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2211147856
## CI report:
* 91944ec23245ca7389fb2e36f3d96fd255a6d77a Azure:
hudi-bot commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2211073160
## CI report:
* 91944ec23245ca7389fb2e36f3d96fd255a6d77a Azure:
hudi-bot commented on PR #11578:
URL: https://github.com/apache/hudi/pull/11578#issuecomment-2211062908
## CI report:
* 91944ec23245ca7389fb2e36f3d96fd255a6d77a UNKNOWN
Bot commands
@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run
KnightChess opened a new pull request, #11578:
URL: https://github.com/apache/hudi/pull/11578
…x enabled
### Change Logs
- Improve the `BucketIndexUtil` partitionIndex algorithm so that the data is
evenly distributed.
- `BucketPartitionUtils` in Spark uses the `BucketIndexUtil` method,