[GitHub] [hudi] hudi-bot commented on pull request #6694: [DO NOT MERGE][HUDI-4855] Add missing table configs for bootstrap in Deltastreamer

2022-09-16 Thread GitBox
hudi-bot commented on PR #6694: URL: https://github.com/apache/hudi/pull/6694#issuecomment-1249094331 ## CI report: * 5a6eed936fc08b943370db12c258ea6e75430912 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #6693: [HUDI-4856] Missing option for HoodieCatalogFactory

2022-09-16 Thread GitBox
hudi-bot commented on PR #6693: URL: https://github.com/apache/hudi/pull/6693#issuecomment-1249094279 ## CI report: * d543dcbb0210ad1e798af374efe7bae79065bbde Azure:

[GitHub] [hudi] hudi-bot commented on pull request #6630: [HUDI-4808] Fix HoodieSimpleBucketIndex not consider bucket num in lo…

2022-09-16 Thread GitBox
hudi-bot commented on PR #6630: URL: https://github.com/apache/hudi/pull/6630#issuecomment-1249094037 ## CI report: * c6fe58f992656d26e60f24e2b5791613f55e5bd3 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #6516: [HUDI-4729] Fix fq can not be queried in pending compaction when query ro table with spark

2022-09-16 Thread GitBox
hudi-bot commented on PR #6516: URL: https://github.com/apache/hudi/pull/6516#issuecomment-1249093781 ## CI report: * 8b06e2b181eb0d913a3d9a465e06082cd040bfec Azure:

[GitHub] [hudi] hudi-bot commented on pull request #6630: [HUDI-4808] Fix HoodieSimpleBucketIndex not consider bucket num in lo…

2022-09-16 Thread GitBox
hudi-bot commented on PR #6630: URL: https://github.com/apache/hudi/pull/6630#issuecomment-1249087756 ## CI report: * c6fe58f992656d26e60f24e2b5791613f55e5bd3 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #6694: [DO NOT MERGE][HUDI-4855] Add missing table configs for bootstrap in Deltastreamer

2022-09-16 Thread GitBox
hudi-bot commented on PR #6694: URL: https://github.com/apache/hudi/pull/6694#issuecomment-1249088011 ## CI report: * 5a6eed936fc08b943370db12c258ea6e75430912 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] hudi-bot commented on pull request #6693: [HUDI-4856] Missing option for HoodieCatalogFactory

2022-09-16 Thread GitBox
hudi-bot commented on PR #6693: URL: https://github.com/apache/hudi/pull/6693#issuecomment-1249087981 ## CI report: * d543dcbb0210ad1e798af374efe7bae79065bbde UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] hudi-bot commented on pull request #6561: [HUDI-4760] Fixing repeated trigger of data file creations w/ clustering

2022-09-16 Thread GitBox
hudi-bot commented on PR #6561: URL: https://github.com/apache/hudi/pull/6561#issuecomment-1249087606 ## CI report: * 402f9171361b89e8a0483ea9afd4f427d495efb8 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #6516: [HUDI-4729] Fix fq can not be queried in pending compaction when query ro table with spark

2022-09-16 Thread GitBox
hudi-bot commented on PR #6516: URL: https://github.com/apache/hudi/pull/6516#issuecomment-1249087486 ## CI report: * 8b06e2b181eb0d913a3d9a465e06082cd040bfec Azure:

[GitHub] [hudi] hudi-bot commented on pull request #6561: [HUDI-4760] Fixing repeated trigger of data file creations w/ clustering

2022-09-16 Thread GitBox
hudi-bot commented on PR #6561: URL: https://github.com/apache/hudi/pull/6561#issuecomment-1249081698 ## CI report: * 402f9171361b89e8a0483ea9afd4f427d495efb8 Azure:

[jira] [Commented] (HUDI-3983) ClassNotFoundException when using hudi-spark-bundle to write table with hbase index

2022-09-16 Thread xi chaomin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17605706#comment-17605706 ] xi chaomin commented on HUDI-3983: -- I tried the method mentioned in

[GitHub] [hudi] microbearz commented on a diff in pull request #6516: [HUDI-4729] Fix fq can not be queried in pending compaction when query ro table with spark

2022-09-16 Thread GitBox
microbearz commented on code in PR #6516: URL: https://github.com/apache/hudi/pull/6516#discussion_r972758143 ## hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java: ## @@ -665,13 +665,25 @@ public final Stream

[jira] [Updated] (HUDI-4858) HoodieSparkSqlWriter can't work concurrently

2022-09-16 Thread cadl (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cadl updated HUDI-4858: --- Description: When I am doing dataframe.write.format("org.apache.hudi") with ForkJoinPool, I got an NPE at

[jira] [Created] (HUDI-4858) HoodieSparkSqlWriter can't work concurrently

2022-09-16 Thread cadl (Jira)
cadl created HUDI-4858: -- Summary: HoodieSparkSqlWriter can't work concurrently Key: HUDI-4858 URL: https://issues.apache.org/jira/browse/HUDI-4858 Project: Apache Hudi Issue Type: Bug

[jira] [Updated] (HUDI-4855) Bootstrap table from Deltastreamer cannot be read in Spark

2022-09-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-4855: - Labels: pull-request-available (was: ) > Bootstrap table from Deltastreamer cannot be read in

[GitHub] [hudi] yihua opened a new pull request, #6694: [DO NOT MERGE][HUDI-4855] Add missing table configs for bootstrap in Deltastreamer

2022-09-16 Thread GitBox
yihua opened a new pull request, #6694: URL: https://github.com/apache/hudi/pull/6694 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any performance

[jira] [Commented] (HUDI-3983) ClassNotFoundException when using hudi-spark-bundle to write table with hbase index

2022-09-16 Thread xi chaomin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17605704#comment-17605704 ] xi chaomin commented on HUDI-3983: -- [~codope] Thanks for your help, ClassNotFoundException is not the

[GitHub] [hudi] danny0405 commented on a diff in pull request #6550: [HUDI-4691] Cleaning up duplicated classes in Spark 3.3 module

2022-09-16 Thread GitBox
danny0405 commented on code in PR #6550: URL: https://github.com/apache/hudi/pull/6550#discussion_r972725757 ## pom.xml: ## @@ -612,6 +624,19 @@ + + +org.scala-lang +scala-library +${scala.version} + + + +

[GitHub] [hudi] TJX2014 commented on a diff in pull request #6630: [HUDI-4808] Fix HoodieSimpleBucketIndex not consider bucket num in lo…

2022-09-16 Thread GitBox
TJX2014 commented on code in PR #6630: URL: https://github.com/apache/hudi/pull/6630#discussion_r972723942 ## hudi-client/hudi-client-common/src/test/java/org/apache/hudi/testutils/HoodieWriteableTestTable.java: ## @@ -152,27 +151,21 @@ public Path withInserts(String partition,

[jira] [Updated] (HUDI-4856) Missing option for HoodieCatalogFactory

2022-09-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-4856: - Labels: pull-request-available (was: ) > Missing option for HoodieCatalogFactory >

[GitHub] [hudi] danny0405 opened a new pull request, #6693: [HUDI-4856] Missing option for HoodieCatalogFactory

2022-09-16 Thread GitBox
danny0405 opened a new pull request, #6693: URL: https://github.com/apache/hudi/pull/6693 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any

[GitHub] [hudi] nsivabalan commented on a diff in pull request #6561: [HUDI-4760] Fixing repeated trigger of data file creations w/ clustering

2022-09-16 Thread GitBox
nsivabalan commented on code in PR #6561: URL: https://github.com/apache/hudi/pull/6561#discussion_r972714037 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/BaseCommitActionExecutor.java: ## @@ -249,7 +249,7 @@ protected HoodieWriteMetadata>

[jira] [Assigned] (HUDI-4855) Bootstrap table from Deltastreamer cannot be read in Spark

2022-09-16 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-4855: --- Assignee: Ethan Guo > Bootstrap table from Deltastreamer cannot be read in Spark >

[jira] [Updated] (HUDI-4855) Bootstrap table from Deltastreamer cannot be read in Spark

2022-09-16 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-4855: Status: In Progress (was: Open) > Bootstrap table from Deltastreamer cannot be read in Spark >

[GitHub] [hudi] nsivabalan commented on a diff in pull request #6561: [HUDI-4760] Fixing repeated trigger of data file creations w/ clustering

2022-09-16 Thread GitBox
nsivabalan commented on code in PR #6561: URL: https://github.com/apache/hudi/pull/6561#discussion_r972708293 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/SparkRDDWriteClient.java: ## @@ -356,6 +360,16 @@ public HoodieWriteMetadata> cluster(String

[GitHub] [hudi] hudi-bot commented on pull request #6669: [HUDI-4841] Fix BlockLocation array sorting idempotency issue

2022-09-16 Thread GitBox
hudi-bot commented on PR #6669: URL: https://github.com/apache/hudi/pull/6669#issuecomment-1249018925 ## CI report: * 5ccbb91fecf43acc0ef6326e83fded2a58039d86 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #4676: [HUDI-3304] Support partial update payload

2022-09-16 Thread GitBox
hudi-bot commented on PR #4676: URL: https://github.com/apache/hudi/pull/4676#issuecomment-1249016966 ## CI report: * 5944f5cbe9ce73fe6b7e27a0d381eaeb80dead38 UNKNOWN * 4ef7b451c3dd795906f3f68571256baeb330a59f UNKNOWN * 6aeb3d0d8f09aeab2a5766cf9d25ecb30537 UNKNOWN *

[GitHub] [hudi] TJX2014 commented on a diff in pull request #6630: [HUDI-4808] Fix HoodieSimpleBucketIndex not consider bucket num in lo…

2022-09-16 Thread GitBox
TJX2014 commented on code in PR #6630: URL: https://github.com/apache/hudi/pull/6630#discussion_r972698015 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bucket/HoodieSimpleBucketIndex.java: ## @@ -52,10 +54,20 @@ private Map

[GitHub] [hudi] boneanxs commented on a diff in pull request #6046: [HUDI-4363] Support Clustering row writer to improve performance

2022-09-16 Thread GitBox
boneanxs commented on code in PR #6046: URL: https://github.com/apache/hudi/pull/6046#discussion_r972697125 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/SparkSortAndSizeExecutionStrategy.java: ## @@ -52,6 +55,27 @@ public

[jira] [Updated] (HUDI-4857) Replace DataFrame with HoodieData in Spark side

2022-09-16 Thread Hui An (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hui An updated HUDI-4857: - Summary: Replace DataFrame with HoodieData in Spark side (was: Replace DataFrame with HoodieData) > Replace

[jira] [Created] (HUDI-4857) Replace DataFrame with HoodieData

2022-09-16 Thread Hui An (Jira)
Hui An created HUDI-4857: Summary: Replace DataFrame with HoodieData Key: HUDI-4857 URL: https://issues.apache.org/jira/browse/HUDI-4857 Project: Apache Hudi Issue Type: Improvement

[GitHub] [hudi] hudi-bot commented on pull request #6669: [HUDI-4841] Fix BlockLocation array sorting idempotency issue

2022-09-16 Thread GitBox
hudi-bot commented on PR #6669: URL: https://github.com/apache/hudi/pull/6669#issuecomment-1249014117 ## CI report: * 5ccbb91fecf43acc0ef6326e83fded2a58039d86 Azure:

[jira] [Assigned] (HUDI-4857) Replace DataFrame with HoodieData

2022-09-16 Thread Hui An (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hui An reassigned HUDI-4857: Assignee: Hui An > Replace DataFrame with HoodieData >

[GitHub] [hudi] hudi-bot commented on pull request #6358: [HUDI-4588][HUDI-4472] Fixing `HoodieParquetReader` to properly specify projected schema when reading Parquet file

2022-09-16 Thread GitBox
hudi-bot commented on PR #6358: URL: https://github.com/apache/hudi/pull/6358#issuecomment-1249013513 ## CI report: * 288d166c49602a4593b1e97763a467811903737d UNKNOWN * 2f9e8ca8d6893e973883dadcab117597ee6badd3 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #4676: [HUDI-3304] Support partial update payload

2022-09-16 Thread GitBox
hudi-bot commented on PR #4676: URL: https://github.com/apache/hudi/pull/4676#issuecomment-1249011935 ## CI report: * 5944f5cbe9ce73fe6b7e27a0d381eaeb80dead38 UNKNOWN * 4ef7b451c3dd795906f3f68571256baeb330a59f UNKNOWN * 6aeb3d0d8f09aeab2a5766cf9d25ecb30537 UNKNOWN *

[GitHub] [hudi] boneanxs commented on a diff in pull request #6046: [HUDI-4363] Support Clustering row writer to improve performance

2022-09-16 Thread GitBox
boneanxs commented on code in PR #6046: URL: https://github.com/apache/hudi/pull/6046#discussion_r972551309 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/MultipleSparkJobExecutionStrategy.java: ## @@ -98,10 +106,18 @@ public

[GitHub] [hudi] hudi-bot commented on pull request #6079: [HUDI-3287] Remove hudi-spark dependencies from hudi-kafka-connect-bundle

2022-09-16 Thread GitBox
hudi-bot commented on PR #6079: URL: https://github.com/apache/hudi/pull/6079#issuecomment-1249007815 ## CI report: * b5c4f453ed1d504b396bdf536777d393947b21d3 Azure:

[GitHub] [hudi] eshu opened a new issue, #6692: [SUPPORT] ClassCastException after migration to Hudi 0.12.0

2022-09-16 Thread GitBox
eshu opened a new issue, #6692: URL: https://github.com/apache/hudi/issues/6692 A ClassCastException was thrown when writing data. **Environment Description** * Hudi version : 0.12.0 * Spark version : 3.1.1 * Storage (HDFS/S3/GCS..) : S3 * Running on

[jira] [Updated] (HUDI-4856) Missing option for HoodieCatalogFactory

2022-09-16 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-4856: - Summary: Missing option for HoodieCatalogFactory (was: Missing option for HoodieTableFactory) > Missing

[jira] [Created] (HUDI-4856) Missing option for HoodieTableFactory

2022-09-16 Thread Danny Chen (Jira)
Danny Chen created HUDI-4856: Summary: Missing option for HoodieTableFactory Key: HUDI-4856 URL: https://issues.apache.org/jira/browse/HUDI-4856 Project: Apache Hudi Issue Type: Bug

[GitHub] [hudi] prasannarajaperumal commented on a diff in pull request #5370: [RFC-52][HUDI-3907] RFC for Introduce Secondary Index to Improve Hudi Query Performance

2022-09-16 Thread GitBox
prasannarajaperumal commented on code in PR #5370: URL: https://github.com/apache/hudi/pull/5370#discussion_r972656384 ## rfc/rfc-52/rfc-52.md: ## @@ -0,0 +1,284 @@ + +# RFC-52: Introduce Secondary Index to Improve HUDI Query Performance + +## Proposers + +- @huberylee +-

[GitHub] [hudi] voonhous commented on a diff in pull request #6669: [HUDI-4841] Fix BlockLocation array sorting idempotency issue

2022-09-16 Thread GitBox
voonhous commented on code in PR #6669: URL: https://github.com/apache/hudi/pull/6669#discussion_r972661367 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cow/CopyOnWriteInputFormat.java: ## @@ -214,13 +213,7 @@ public FileInputSplit[]

[jira] [Updated] (HUDI-4841) Fix BlockLocation array sorting idempotency issue

2022-09-16 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-4841: - Summary: Fix BlockLocation array sorting idempotency issue (was: Flink read issue; BlockLocations not

[jira] [Updated] (HUDI-4841) Flink read issue; BlockLocations not sorted properly; Sort implementation is not idempotent

2022-09-16 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-4841: - Fix Version/s: 0.12.1 > Flink read issue; BlockLocations not sorted properly; Sort implementation is >

[GitHub] [hudi] voonhous commented on a diff in pull request #6669: [HUDI-4841] Fix BlockLocation array sorting idempotency issue

2022-09-16 Thread GitBox
voonhous commented on code in PR #6669: URL: https://github.com/apache/hudi/pull/6669#discussion_r972658947 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cow/CopyOnWriteInputFormat.java: ## @@ -214,13 +213,7 @@ public FileInputSplit[]

[GitHub] [hudi] danny0405 commented on a diff in pull request #6669: [HUDI-4841] Fix BlockLocation array sorting idempotency issue

2022-09-16 Thread GitBox
danny0405 commented on code in PR #6669: URL: https://github.com/apache/hudi/pull/6669#discussion_r972658464 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/format/cow/CopyOnWriteInputFormat.java: ## @@ -214,13 +213,7 @@ public FileInputSplit[]

[GitHub] [hudi] hudi-bot commented on pull request #6079: [HUDI-3287] Remove hudi-spark dependencies from hudi-kafka-connect-bundle

2022-09-16 Thread GitBox
hudi-bot commented on PR #6079: URL: https://github.com/apache/hudi/pull/6079#issuecomment-1248965368 ## CI report: * b5c4f453ed1d504b396bdf536777d393947b21d3 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #6358: [HUDI-4588][HUDI-4472] Fixing `HoodieParquetReader` to properly specify projected schema when reading Parquet file

2022-09-16 Thread GitBox
hudi-bot commented on PR #6358: URL: https://github.com/apache/hudi/pull/6358#issuecomment-1248955402 ## CI report: * 288d166c49602a4593b1e97763a467811903737d UNKNOWN * c4b6bb8dc7a4ddce5f729e5a49ac10aad25e8931 Azure:

[GitHub] [hudi] danny0405 commented on a diff in pull request #6630: [HUDI-4808] Fix HoodieSimpleBucketIndex not consider bucket num in lo…

2022-09-16 Thread GitBox
danny0405 commented on code in PR #6630: URL: https://github.com/apache/hudi/pull/6630#discussion_r971870133 ## hudi-client/hudi-client-common/src/test/java/org/apache/hudi/testutils/HoodieWriteableTestTable.java: ## @@ -152,27 +151,21 @@ public Path withInserts(String
