[GitHub] [hudi] SteNicholas commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-05-05 Thread via GitHub
SteNicholas commented on code in PR #8503: URL: https://github.com/apache/hudi/pull/8503#discussion_r1185997301 ## hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestSparkConsistentBucketClustering.java: ## @@ -149,6 +154,59 @@ public void

[GitHub] [hudi] SteNicholas commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-05-05 Thread via GitHub
SteNicholas commented on code in PR #8503: URL: https://github.com/apache/hudi/pull/8503#discussion_r1185996892 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/index/bucket/HoodieSparkConsistentBucketIndex.java: ## @@ -271,8 +327,114 @@ public Option

[GitHub] [hudi] SteNicholas commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-05-05 Thread via GitHub
SteNicholas commented on code in PR #8503: URL: https://github.com/apache/hudi/pull/8503#discussion_r1185996438 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/index/bucket/HoodieSparkConsistentBucketIndex.java: ## @@ -271,8 +327,114 @@ public Option

[GitHub] [hudi] SteNicholas commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-05-05 Thread via GitHub
SteNicholas commented on code in PR #8503: URL: https://github.com/apache/hudi/pull/8503#discussion_r1185996328 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/index/bucket/HoodieSparkConsistentBucketIndex.java: ## @@ -271,8 +327,114 @@ public Option

[GitHub] [hudi] SteNicholas commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-05-05 Thread via GitHub
SteNicholas commented on code in PR #8503: URL: https://github.com/apache/hudi/pull/8503#discussion_r1185995971 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/index/bucket/HoodieSparkConsistentBucketIndex.java: ## @@ -271,8 +327,114 @@ public Option

[GitHub] [hudi] SteNicholas commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-05-05 Thread via GitHub
SteNicholas commented on code in PR #8503: URL: https://github.com/apache/hudi/pull/8503#discussion_r1185995107 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/index/bucket/HoodieSparkConsistentBucketIndex.java: ## @@ -271,8 +327,114 @@ public Option

[GitHub] [hudi] SteNicholas commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-05-05 Thread via GitHub
SteNicholas commented on code in PR #8503: URL: https://github.com/apache/hudi/pull/8503#discussion_r1185994632 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/index/bucket/HoodieSparkConsistentBucketIndex.java: ## @@ -271,8 +327,114 @@ public Option

[GitHub] [hudi] SteNicholas commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-05-05 Thread via GitHub
SteNicholas commented on code in PR #8503: URL: https://github.com/apache/hudi/pull/8503#discussion_r1185994125 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndex.java: ## @@ -154,6 +154,14 @@ public boolean requiresTagging(WriteOperationType

[GitHub] [hudi] SteNicholas commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-05-05 Thread via GitHub
SteNicholas commented on code in PR #8503: URL: https://github.com/apache/hudi/pull/8503#discussion_r1185993420 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndex.java: ## @@ -154,6 +154,14 @@ public boolean requiresTagging(WriteOperationType

[GitHub] [hudi] SteNicholas commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-05-05 Thread via GitHub
SteNicholas commented on code in PR #8503: URL: https://github.com/apache/hudi/pull/8503#discussion_r1185993910 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndex.java: ## @@ -154,6 +154,14 @@ public boolean requiresTagging(WriteOperationType

[GitHub] [hudi] SteNicholas commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-05-05 Thread via GitHub
SteNicholas commented on code in PR #8503: URL: https://github.com/apache/hudi/pull/8503#discussion_r1185993420 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndex.java: ## @@ -154,6 +154,14 @@ public boolean requiresTagging(WriteOperationType

[GitHub] [hudi] SteNicholas commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-05-05 Thread via GitHub
SteNicholas commented on code in PR #8503: URL: https://github.com/apache/hudi/pull/8503#discussion_r1185992732 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java: ## @@ -513,7 +513,6 @@ private Stream getInstantsToArchive()

[GitHub] [hudi] SteNicholas commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-04-28 Thread via GitHub
SteNicholas commented on code in PR #8503: URL: https://github.com/apache/hudi/pull/8503#discussion_r1180044369 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java: ## @@ -509,7 +509,15 @@ private Stream

[GitHub] [hudi] SteNicholas commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-04-27 Thread via GitHub
SteNicholas commented on code in PR #8503: URL: https://github.com/apache/hudi/pull/8503#discussion_r1179114163 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java: ## @@ -509,7 +509,15 @@ private Stream

[GitHub] [hudi] SteNicholas commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-04-26 Thread via GitHub
SteNicholas commented on code in PR #8503: URL: https://github.com/apache/hudi/pull/8503#discussion_r1178579855 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java: ## @@ -509,7 +509,15 @@ private Stream

[GitHub] [hudi] SteNicholas commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-04-26 Thread via GitHub
SteNicholas commented on code in PR #8503: URL: https://github.com/apache/hudi/pull/8503#discussion_r1177521748 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java: ## @@ -509,7 +509,15 @@ private Stream

[GitHub] [hudi] SteNicholas commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-04-26 Thread via GitHub
SteNicholas commented on code in PR #8503: URL: https://github.com/apache/hudi/pull/8503#discussion_r1177385692 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java: ## @@ -509,7 +509,15 @@ private Stream

[GitHub] [hudi] SteNicholas commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-04-26 Thread via GitHub
SteNicholas commented on code in PR #8503: URL: https://github.com/apache/hudi/pull/8503#discussion_r1177385296 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java: ## @@ -509,7 +509,15 @@ private Stream

[GitHub] [hudi] SteNicholas commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-04-26 Thread via GitHub
SteNicholas commented on code in PR #8503: URL: https://github.com/apache/hudi/pull/8503#discussion_r1177384033 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java: ## @@ -509,7 +509,15 @@ private Stream

[GitHub] [hudi] SteNicholas commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-04-24 Thread via GitHub
SteNicholas commented on code in PR #8503: URL: https://github.com/apache/hudi/pull/8503#discussion_r1175964368 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/HoodieTimelineArchiver.java: ## @@ -441,6 +441,8 @@ private Stream getCommitInstantsToArchive()

[GitHub] [hudi] SteNicholas commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-04-24 Thread via GitHub
SteNicholas commented on code in PR #8503: URL: https://github.com/apache/hudi/pull/8503#discussion_r1175229521 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/index/bucket/HoodieSparkConsistentBucketIndex.java: ## @@ -275,4 +279,65 @@ public Option

[GitHub] [hudi] SteNicholas commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-04-24 Thread via GitHub
SteNicholas commented on code in PR #8503: URL: https://github.com/apache/hudi/pull/8503#discussion_r1175224217 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/index/bucket/HoodieSparkConsistentBucketIndex.java: ## @@ -275,4 +279,62 @@ public Option

[GitHub] [hudi] SteNicholas commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-04-24 Thread via GitHub
SteNicholas commented on code in PR #8503: URL: https://github.com/apache/hudi/pull/8503#discussion_r1175223280 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/index/bucket/HoodieSparkConsistentBucketIndex.java: ## @@ -275,4 +279,62 @@ public Option

[GitHub] [hudi] SteNicholas commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-04-24 Thread via GitHub
SteNicholas commented on code in PR #8503: URL: https://github.com/apache/hudi/pull/8503#discussion_r1175220178 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/index/bucket/HoodieSparkConsistentBucketIndex.java: ## @@ -275,4 +279,62 @@ public Option

[GitHub] [hudi] SteNicholas commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-04-24 Thread via GitHub
SteNicholas commented on code in PR #8503: URL: https://github.com/apache/hudi/pull/8503#discussion_r1175216695 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/index/bucket/HoodieSparkConsistentBucketIndex.java: ## @@ -275,4 +279,62 @@ public Option

[GitHub] [hudi] SteNicholas commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-04-24 Thread via GitHub
SteNicholas commented on code in PR #8503: URL: https://github.com/apache/hudi/pull/8503#discussion_r1175216695 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/index/bucket/HoodieSparkConsistentBucketIndex.java: ## @@ -275,4 +279,62 @@ public Option

[GitHub] [hudi] SteNicholas commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-04-24 Thread via GitHub
SteNicholas commented on code in PR #8503: URL: https://github.com/apache/hudi/pull/8503#discussion_r1175030462 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/index/bucket/HoodieSparkConsistentBucketIndex.java: ## @@ -275,4 +278,46 @@ public Option

[GitHub] [hudi] SteNicholas commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-04-24 Thread via GitHub
SteNicholas commented on code in PR #8503: URL: https://github.com/apache/hudi/pull/8503#discussion_r1174954973 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/index/bucket/HoodieSparkConsistentBucketIndex.java: ## @@ -275,4 +278,46 @@ public Option

[GitHub] [hudi] SteNicholas commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-04-24 Thread via GitHub
SteNicholas commented on code in PR #8503: URL: https://github.com/apache/hudi/pull/8503#discussion_r1174878383 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/index/bucket/HoodieSparkConsistentBucketIndex.java: ## @@ -275,4 +278,46 @@ public Option

[GitHub] [hudi] SteNicholas commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-04-23 Thread via GitHub
SteNicholas commented on code in PR #8503: URL: https://github.com/apache/hudi/pull/8503#discussion_r1174772541 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/index/bucket/HoodieSparkConsistentBucketIndex.java: ## @@ -275,4 +278,46 @@ public Option