[jira] [Updated] (HUDI-2619) Make table services work with Dataset

2021-10-24 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2619: - Description: Clustering, Compaction, Clean should also work with Dataset > Make table services work with

[jira] [Created] (HUDI-2619) Make table services work with Dataset

2021-10-24 Thread Raymond Xu (Jira)
Raymond Xu created HUDI-2619: Summary: Make table services work with Dataset Key: HUDI-2619 URL: https://issues.apache.org/jira/browse/HUDI-2619 Project: Apache Hudi Issue Type: Sub-task

[jira] [Updated] (HUDI-2618) Implement operations other than upsert in SparkDataFrameWriteClient

2021-10-24 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2618: - Story Points: 3 (was: 4) > Implement operations other than upsert in SparkDataFrameWriteClient >

[jira] [Created] (HUDI-2618) Implement operations other than upsert in SparkDataFrameWriteClient

2021-10-24 Thread Raymond Xu (Jira)
Raymond Xu created HUDI-2618: Summary: Implement operations other than upsert in SparkDataFrameWriteClient Key: HUDI-2618 URL: https://issues.apache.org/jira/browse/HUDI-2618 Project: Apache Hudi

[jira] [Updated] (HUDI-2618) Implement operations other than upsert in SparkDataFrameWriteClient

2021-10-24 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2618: - Story Points: 4 > Implement operations other than upsert in SparkDataFrameWriteClient > --

[jira] [Updated] (HUDI-2617) Implement HBase Index for Dataset

2021-10-24 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2617: - Fix Version/s: 0.10.0 > Implement HBase Index for Dataset > -- > >

[jira] [Updated] (HUDI-2615) Decouple HoodieRecordPayload with Hoodie table, table services, and index

2021-10-24 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2615: - Fix Version/s: 0.10.0 > Decouple HoodieRecordPayload with Hoodie table, table services, and index > --

[jira] [Updated] (HUDI-2531) [UMBRELLA] Support Dataset APIs in writer paths

2021-10-24 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2531: - Fix Version/s: 0.10.0 > [UMBRELLA] Support Dataset APIs in writer paths >

[jira] [Updated] (HUDI-2616) Implement BloomIndex for Dataset

2021-10-24 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2616: - Fix Version/s: 0.10.0 > Implement BloomIndex for Dataset > - > >

[GitHub] [hudi] danny0405 commented on a change in pull request #3599: [HUDI-2207] Support independent flink hudi clustering function

2021-10-24 Thread GitBox
danny0405 commented on a change in pull request #3599: URL: https://github.com/apache/hudi/pull/3599#discussion_r735249946 ## File path: hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java ## @@ -528,6 +528,66 @@ private FlinkOptions() { .defaultVal

[jira] [Created] (HUDI-2617) Implement HBase Index for Dataset

2021-10-24 Thread Raymond Xu (Jira)
Raymond Xu created HUDI-2617: Summary: Implement HBase Index for Dataset Key: HUDI-2617 URL: https://issues.apache.org/jira/browse/HUDI-2617 Project: Apache Hudi Issue Type: Sub-task

[jira] [Updated] (HUDI-1430) Implement SparkDataFrameWriteClient with SimpleIndex

2021-10-24 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-1430: - Description: End to end upsert operation, with proper functional tests coverage. > Implement SparkDataFra

[jira] [Updated] (HUDI-1430) Implement SparkDataFrameWriteClient with SimpleIndex

2021-10-24 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-1430: - Story Points: 3 (was: 2) > Implement SparkDataFrameWriteClient with SimpleIndex > ---

[jira] [Updated] (HUDI-2616) Implement BloomIndex for Dataset

2021-10-24 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2616: - Story Points: 2 > Implement BloomIndex for Dataset > - > >

[jira] [Created] (HUDI-2616) Implement BloomIndex for Dataset

2021-10-24 Thread Raymond Xu (Jira)
Raymond Xu created HUDI-2616: Summary: Implement BloomIndex for Dataset Key: HUDI-2616 URL: https://issues.apache.org/jira/browse/HUDI-2616 Project: Apache Hudi Issue Type: Sub-task R

[jira] [Created] (HUDI-2615) Decouple HoodieRecordPayload with Hoodie table, table services, and index

2021-10-24 Thread Raymond Xu (Jira)
Raymond Xu created HUDI-2615: Summary: Decouple HoodieRecordPayload with Hoodie table, table services, and index Key: HUDI-2615 URL: https://issues.apache.org/jira/browse/HUDI-2615 Project: Apache Hudi

[jira] [Updated] (HUDI-2531) [UMBRELLA] Support Dataset APIs in writer paths

2021-10-24 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2531: - Priority: Blocker (was: Critical) > [UMBRELLA] Support Dataset APIs in writer paths > ---

[jira] [Updated] (HUDI-1430) Implement SparkDataFrameWriteClient with SimpleIndex

2021-10-24 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-1430: - Story Points: 2 > Implement SparkDataFrameWriteClient with SimpleIndex > -

[jira] [Updated] (HUDI-1430) Implement SparkDataFrameWriteClient with SimpleIndex

2021-10-24 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-1430: - Status: In Progress (was: Open) > Implement SparkDataFrameWriteClient with SimpleIndex >

[jira] [Updated] (HUDI-1430) Implement SparkDataFrameWriteClient with SimpleIndex

2021-10-24 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-1430: - Parent: HUDI-2531 Issue Type: Sub-task (was: Improvement) > Implement SparkDataFrameWriteClient w

[jira] [Updated] (HUDI-1430) Implement SparkDataFrameWriteClient with SimpleIndex

2021-10-24 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-1430: - Summary: Implement SparkDataFrameWriteClient with SimpleIndex (was: Support Dataset write w/o conversion

[jira] [Updated] (HUDI-1970) Performance testing/certification of key SQL DMLs

2021-10-24 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-1970: - Status: In Progress (was: Open) > Performance testing/certification of key SQL DMLs > ---

[jira] [Commented] (HUDI-1970) Performance testing/certification of key SQL DMLs

2021-10-24 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17433587#comment-17433587 ] Raymond Xu commented on HUDI-1970: -- * 1B records (randomized values in the example trip m

[jira] [Updated] (HUDI-2287) Partition pruning not working on Hudi dataset

2021-10-24 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2287: - Priority: Major (was: Blocker) > Partition pruning not working on Hudi dataset >

[jira] [Commented] (HUDI-2287) Partition pruning not working on Hudi dataset

2021-10-24 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17433586#comment-17433586 ] Raymond Xu commented on HUDI-2287: -- [~rjkumr] it's likely caused by your `hoodie.table.pa

[GitHub] [hudi] hudi-bot edited a comment on pull request #3858: [MINOR] Fix README for hudi-kafka-connect

2021-10-24 Thread GitBox
hudi-bot edited a comment on pull request #3858: URL: https://github.com/apache/hudi/pull/3858#issuecomment-950564845 ## CI report: * f2ed52360c22cba5bbade224be9b3a6cec660d36 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] hudi-bot commented on pull request #3858: [MINOR] Fix README for hudi-kafka-connect

2021-10-24 Thread GitBox
hudi-bot commented on pull request #3858: URL: https://github.com/apache/hudi/pull/3858#issuecomment-950564845 ## CI report: * f2ed52360c22cba5bbade224be9b3a6cec660d36 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis`

[GitHub] [hudi] hudi-bot edited a comment on pull request #3857: [WIP][HUDI-2332] Add clustering and compaction in Kafka Connect Sink

2021-10-24 Thread GitBox
hudi-bot edited a comment on pull request #3857: URL: https://github.com/apache/hudi/pull/3857#issuecomment-950560156 ## CI report: * 34cb663a0afb4362af0795384058378ef6ec130a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] yihua opened a new pull request #3858: [MINOR] Fix README for hudi-kafka-connect

2021-10-24 Thread GitBox
yihua opened a new pull request #3858: URL: https://github.com/apache/hudi/pull/3858 ## What is the purpose of the pull request This PR fixes the tutorial in README.md for hudi-kafka-connect. ## Brief change log - Edits to the commands so that they are runnable. #

[GitHub] [hudi] hudi-bot commented on pull request #3857: [WIP][HUDI-2332] Add clustering and compaction in Kafka Connect Sink

2021-10-24 Thread GitBox
hudi-bot commented on pull request #3857: URL: https://github.com/apache/hudi/pull/3857#issuecomment-950560156 ## CI report: * 34cb663a0afb4362af0795384058378ef6ec130a UNKNOWN

[jira] [Updated] (HUDI-2332) Implement scheduling of compaction/ clustering for Kafka Connect

2021-10-24 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-2332: - Labels: pull-request-available (was: ) > Implement scheduling of compaction/ clustering for Kafka

[GitHub] [hudi] yihua opened a new pull request #3857: [WIP][HUDI-2332] Add clustering and compaction in Kafka Connect Sink

2021-10-24 Thread GitBox
yihua opened a new pull request #3857: URL: https://github.com/apache/hudi/pull/3857 ## What is the purpose of the pull request This PR adds the functionality of clustering and compaction in Kafka Connect Sink for Hudi. ## Brief change log ## Verify this pull reques

[GitHub] [hudi] hudi-bot edited a comment on pull request #3802: [HUDI-1500] Support replace commit in DeltaSync with commit metadata preserved

2021-10-24 Thread GitBox
hudi-bot edited a comment on pull request #3802: URL: https://github.com/apache/hudi/pull/3802#issuecomment-943342747 ## CI report: * b63edfaca889ac6444b61a525cc9ee1065f610db Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[jira] [Updated] (HUDI-2077) Flaky test: TestHoodieDeltaStreamer

2021-10-24 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2077: - Priority: Critical (was: Major) > Flaky test: TestHoodieDeltaStreamer > -

[jira] [Updated] (HUDI-1706) Test flakiness w/ multiwriter test

2021-10-24 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-1706: - Priority: Major (was: Blocker) > Test flakiness w/ multiwriter test > --

[jira] [Created] (HUDI-2614) Remove duplicated hadoop-hdfs with tests classifier exists in bundles

2021-10-24 Thread vinoyang (Jira)
vinoyang created HUDI-2614: -- Summary: Remove duplicated hadoop-hdfs with tests classifier exists in bundles Key: HUDI-2614 URL: https://issues.apache.org/jira/browse/HUDI-2614 Project: Apache Hudi

[jira] [Updated] (HUDI-2600) Remove duplicated hadoop-common with tests classifier exists in bundles

2021-10-24 Thread vinoyang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang updated HUDI-2600: --- Fix Version/s: 0.10.0 > Remove duplicated hadoop-common with tests classifier exists in bundles >

[jira] [Closed] (HUDI-2600) Remove duplicated hadoop-common with tests classifier exists in bundles

2021-10-24 Thread vinoyang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang closed HUDI-2600. -- Resolution: Done 220bf6a7e6f5cdf0efbbbee9df6852a8b2288570 > Remove duplicated hadoop-common with tests classifi

[hudi] branch master updated: [HUDI-2600] Remove duplicated hadoop-common with tests classifier exists in bundles (#3847)

2021-10-24 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository. vinoyang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 220bf6a [HUDI-2600] Remove duplicated hadoop-co

[GitHub] [hudi] yanghua merged pull request #3847: [HUDI-2600] Remove duplicated hadoop-common with tests classifier exists in bundles

2021-10-24 Thread GitBox
yanghua merged pull request #3847: URL: https://github.com/apache/hudi/pull/3847 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr.

[GitHub] [hudi] nsivabalan commented on a change in pull request #3762: [HUDI-1294] Adding inline read and seek based read(batch get) for hfile log blocks in metadata table

2021-10-24 Thread GitBox
nsivabalan commented on a change in pull request #3762: URL: https://github.com/apache/hudi/pull/3762#discussion_r735269130 ## File path: hudi-common/src/main/java/org/apache/hudi/metadata/BaseTableMetadata.java ## @@ -200,8 +201,49 @@ protected BaseTableMetadata(HoodieEngineC

[GitHub] [hudi] nsivabalan commented on pull request #3762: [HUDI-1294] Adding inline read and seek based read(batch get) for hfile log blocks in metadata table

2021-10-24 Thread GitBox
nsivabalan commented on pull request #3762: URL: https://github.com/apache/hudi/pull/3762#issuecomment-950546155 @prashantwason: Can you please review the patch when you get time?

[GitHub] [hudi] nsivabalan commented on a change in pull request #3762: [HUDI-1294] Adding inline read and seek based read(batch get) for hfile log blocks in metadata table

2021-10-24 Thread GitBox
nsivabalan commented on a change in pull request #3762: URL: https://github.com/apache/hudi/pull/3762#discussion_r735268438 ## File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java ## @@ -120,65 +120,114 @@ private void initIfNeeded() {

[GitHub] [hudi] nsivabalan commented on pull request #3827: [HUDI-2573] Fixing double locking with multi-writers

2021-10-24 Thread GitBox
nsivabalan commented on pull request #3827: URL: https://github.com/apache/hudi/pull/3827#issuecomment-950539640 @manojpec : thanks for your inputs. I do like the idea of TransactionManager handling the locking depending on whether the lock acquisition is requested by same owner or diff. B

[GitHub] [hudi] nsivabalan merged pull request #3757: [HUDI-2005] Avoiding direct fs calls in HoodieLogFileReader

2021-10-24 Thread GitBox
nsivabalan merged pull request #3757: URL: https://github.com/apache/hudi/pull/3757

[hudi] branch master updated: [HUDI-2005] Avoiding direct fs calls in HoodieLogFileReader (#3757)

2021-10-24 Thread sivabalan
sivabalan pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 1bb0532 [HUDI-2005] Avoiding direct fs calls i

[GitHub] [hudi] nsivabalan commented on a change in pull request #3757: [HUDI-2005] Avoiding direct fs calls in HoodieLogFileReader

2021-10-24 Thread GitBox
nsivabalan commented on a change in pull request #3757: URL: https://github.com/apache/hudi/pull/3757#discussion_r735263839 ## File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/RealtimeSplit.java ## @@ -41,6 +42,8 @@ */ List getDeltaLogPaths(); Re

[jira] [Assigned] (HUDI-2613) Fix usages of RealtimeSplit to use the new getDeltaLogFileStatus

2021-10-24 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-2613: - Assignee: sivabalan narayanan > Fix usages of RealtimeSplit to use the new getDel

[jira] [Updated] (HUDI-2613) Fix usages of RealtimeSplit to use the new getDeltaLogFileStatus

2021-10-24 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-2613: -- Parent: HUDI-1292 Issue Type: Sub-task (was: Improvement) > Fix usages of Realt

[jira] [Updated] (HUDI-2613) Fix usages of RealtimeSplit to use the new getDeltaLogFileStatus

2021-10-24 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-2613: -- Fix Version/s: 0.10.0 > Fix usages of RealtimeSplit to use the new getDeltaLogFileStatus

[jira] [Created] (HUDI-2613) Fix usages of RealtimeSplit to use the new getDeltaLogFileStatus

2021-10-24 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-2613: - Summary: Fix usages of RealtimeSplit to use the new getDeltaLogFileStatus Key: HUDI-2613 URL: https://issues.apache.org/jira/browse/HUDI-2613 Project: Apach

[GitHub] [hudi] hudi-bot edited a comment on pull request #3802: [HUDI-1500] Support replace commit in DeltaSync with commit metadata preserved

2021-10-24 Thread GitBox
hudi-bot edited a comment on pull request #3802: URL: https://github.com/apache/hudi/pull/3802#issuecomment-943342747 ## CI report: * e906d363c06635bbcc7c69db5fcc4ff0f0f2d919 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] hudi-bot edited a comment on pull request #3802: [HUDI-1500] Support replace commit in DeltaSync with commit metadata preserved

2021-10-24 Thread GitBox
hudi-bot edited a comment on pull request #3802: URL: https://github.com/apache/hudi/pull/3802#issuecomment-943342747 ## CI report: * e906d363c06635bbcc7c69db5fcc4ff0f0f2d919 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] hudi-bot edited a comment on pull request #3813: [HUDI-2563][hudi-client] Refactor CompactionTriggerStrategy.

2021-10-24 Thread GitBox
hudi-bot edited a comment on pull request #3813: URL: https://github.com/apache/hudi/pull/3813#issuecomment-944948402 ## CI report: * 7a7ee072ae225fe015b73545ac8d50acc5746ea7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[hudi] branch master updated: [HUDI-2077] Fix TestHoodieDeltaStreamerWithMultiWriter (#3849)

2021-10-24 Thread xushiyan
xushiyan pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new d856037 [HUDI-2077] Fix TestHoodieDeltaStreamer

[GitHub] [hudi] xushiyan merged pull request #3849: [HUDI-2077] Fix TestHoodieDeltaStreamerWithMultiWriter

2021-10-24 Thread GitBox
xushiyan merged pull request #3849: URL: https://github.com/apache/hudi/pull/3849

[GitHub] [hudi] xushiyan commented on pull request #3849: [HUDI-2077] Fix TestHoodieDeltaStreamerWithMultiWriter

2021-10-24 Thread GitBox
xushiyan commented on pull request #3849: URL: https://github.com/apache/hudi/pull/3849#issuecomment-950511022 Build passed https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=2820&view=results

[GitHub] [hudi] Cherry-Puppy removed a comment on issue #3680: [SUPPORT]Failed to sync data to hive-3.1.2 by flink-sql

2021-10-24 Thread GitBox
Cherry-Puppy removed a comment on issue #3680: URL: https://github.com/apache/hudi/issues/3680#issuecomment-950498398 I also encountered this problem. I still can't find this class after changing the hive version. But there is this class in the jar package.

[GitHub] [hudi] Cherry-Puppy commented on issue #3680: [SUPPORT]Failed to sync data to hive-3.1.2 by flink-sql

2021-10-24 Thread GitBox
Cherry-Puppy commented on issue #3680: URL: https://github.com/apache/hudi/issues/3680#issuecomment-950503446 @danny0405 I also encountered this problem. I still can't find this class after changing the hive version. But there is this class in the jar package.

[GitHub] [hudi] Cherry-Puppy commented on issue #3680: [SUPPORT]Failed to sync data to hive-3.1.2 by flink-sql

2021-10-24 Thread GitBox
Cherry-Puppy commented on issue #3680: URL: https://github.com/apache/hudi/issues/3680#issuecomment-950498398 I also encountered this problem. I still can't find this class after changing the hive version. But there is this class in the jar package.

[GitHub] [hudi] hudi-bot edited a comment on pull request #3813: [HUDI-2563][hudi-client] Refactor CompactionTriggerStrategy.

2021-10-24 Thread GitBox
hudi-bot edited a comment on pull request #3813: URL: https://github.com/apache/hudi/pull/3813#issuecomment-944948402 ## CI report: * 822dbe03dc77531858ffd83ebeb91f210f4e7851 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] hudi-bot edited a comment on pull request #3813: [HUDI-2563][hudi-client] Refactor CompactionTriggerStrategy.

2021-10-24 Thread GitBox
hudi-bot edited a comment on pull request #3813: URL: https://github.com/apache/hudi/pull/3813#issuecomment-944948402 ## CI report: * 822dbe03dc77531858ffd83ebeb91f210f4e7851 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] hudi-bot edited a comment on pull request #3330: [HUDI-2101][RFC-28]support z-order for hudi

2021-10-24 Thread GitBox
hudi-bot edited a comment on pull request #3330: URL: https://github.com/apache/hudi/pull/3330#issuecomment-885350571 ## CI report: * 133379deca564ca42f10a1f3e59bb4aa17d80964 UNKNOWN * e555754a4ea179e5251cd7bbff7e8d20c02ef7c8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-

[GitHub] [hudi] boneanxs opened a new issue #3856: [SUPPORT] Maybe should cache baseDir in nonHoodiePathCache in HoodieROTablePathFilter?

2021-10-24 Thread GitBox
boneanxs opened a new issue #3856: URL: https://github.com/apache/hudi/issues/3856 For a non hoodie table, with table path: `hdfs://test/warehouse/db/table`, 3 partition columns(p1, p2, p3), for a specific partition, like(p1=A, p2=B, p3=C), the path should be `hdfs://test/warehouse/db/tabl

[GitHub] [hudi] dongkelun commented on a change in pull request #3700: [HUDI-2471] Add support ignoring case when column name matches in merge into

2021-10-24 Thread GitBox
dongkelun commented on a change in pull request #3700: URL: https://github.com/apache/hudi/pull/3700#discussion_r735231887 ## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieAnalysis.scala ## @@ -163,15 +163,15 @@ case class

[jira] [Created] (HUDI-2612) No need to define primary key for flink insert operation

2021-10-24 Thread Danny Chen (Jira)
Danny Chen created HUDI-2612: Summary: No need to define primary key for flink insert operation Key: HUDI-2612 URL: https://issues.apache.org/jira/browse/HUDI-2612 Project: Apache Hudi Issue Type

[GitHub] [hudi] vinothchandar commented on pull request #3330: [HUDI-2101][RFC-28]support z-order for hudi

2021-10-24 Thread GitBox
vinothchandar commented on pull request #3330: URL: https://github.com/apache/hudi/pull/3330#issuecomment-950475158 Thanks for your patience. Definitely on it. :)

[GitHub] [hudi] YannByron commented on a change in pull request #3700: [HUDI-2471] Add support ignoring case when column name matches in merge into

2021-10-24 Thread GitBox
YannByron commented on a change in pull request #3700: URL: https://github.com/apache/hudi/pull/3700#discussion_r735220808 ## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieAnalysis.scala ## @@ -163,15 +163,15 @@ case class

[GitHub] [hudi] hudi-bot edited a comment on pull request #3330: [HUDI-2101][RFC-28]support z-order for hudi

2021-10-24 Thread GitBox
hudi-bot edited a comment on pull request #3330: URL: https://github.com/apache/hudi/pull/3330#issuecomment-885350571 ## CI report: * 133379deca564ca42f10a1f3e59bb4aa17d80964 UNKNOWN * 8236ece4816e100af13702bf92fdddf9c5e14eaf Azure: [FAILURE](https://dev.azure.com/apache-hudi-

[GitHub] [hudi] xiarixiaoyao commented on pull request #3330: [HUDI-2101][RFC-28]support z-order for hudi

2021-10-24 Thread GitBox
xiarixiaoyao commented on pull request #3330: URL: https://github.com/apache/hudi/pull/3330#issuecomment-950471560 @vinothchandar I have already rebased the code. Could you help me review it? Thanks.

[GitHub] [hudi] hudi-bot edited a comment on pull request #3330: [HUDI-2101][RFC-28]support z-order for hudi

2021-10-24 Thread GitBox
hudi-bot edited a comment on pull request #3330: URL: https://github.com/apache/hudi/pull/3330#issuecomment-885350571 ## CI report: * 133379deca564ca42f10a1f3e59bb4aa17d80964 UNKNOWN * 8236ece4816e100af13702bf92fdddf9c5e14eaf Azure: [FAILURE](https://dev.azure.com/apache-hudi-

[GitHub] [hudi] hudi-bot edited a comment on pull request #3849: [HUDI-2077] Fix TestHoodieDeltaStreamerWithMultiWriter

2021-10-24 Thread GitBox
hudi-bot edited a comment on pull request #3849: URL: https://github.com/apache/hudi/pull/3849#issuecomment-950068934 ## CI report: * f623c7545b41a70eb607d530428536567e70fb7a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] xushiyan commented on issue #3854: [SUPPORT] Lower performance using 0.9.0 vs 0.8.0

2021-10-24 Thread GitBox
xushiyan commented on issue #3854: URL: https://github.com/apache/hudi/issues/3854#issuecomment-950455798 @Limess thanks for providing benchmarks! > bulk inserts are slightly faster with Hudi 0.9.0 This is most likely due to row writer enabled by default in 0.9.0 https://hudi

[GitHub] [hudi] hudi-bot edited a comment on pull request #3849: [HUDI-2077] Fix TestHoodieDeltaStreamerWithMultiWriter

2021-10-24 Thread GitBox
hudi-bot edited a comment on pull request #3849: URL: https://github.com/apache/hudi/pull/3849#issuecomment-950068934 ## CI report: * 16b061c5fa2b1d77755913cb6bda1025c4baf526 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] hudi-bot edited a comment on pull request #3849: [HUDI-2077] Fix TestHoodieDeltaStreamerWithMultiWriter

2021-10-24 Thread GitBox
hudi-bot edited a comment on pull request #3849: URL: https://github.com/apache/hudi/pull/3849#issuecomment-950068934 ## CI report: * 16b061c5fa2b1d77755913cb6bda1025c4baf526 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/res

[GitHub] [hudi] xushiyan closed issue #3845: [SUPPORT]`if not exists` doesn't work on create table in spark-sql

2021-10-24 Thread GitBox
xushiyan closed issue #3845: URL: https://github.com/apache/hudi/issues/3845

[GitHub] [hudi] xushiyan commented on issue #3845: [SUPPORT]`if not exists` doesn't work on create table in spark-sql

2021-10-24 Thread GitBox
xushiyan commented on issue #3845: URL: https://github.com/apache/hudi/issues/3845#issuecomment-950442738 @mutoulbj @BenjMaq Thanks for raising this! It does make sense to print a message indicating that the table exists instead of erroring. Filing a JIRA and please feel free to take it if you're

[jira] [Created] (HUDI-2611) `create table if not exists` should print message instead of throwing error

2021-10-24 Thread Raymond Xu (Jira)
Raymond Xu created HUDI-2611: Summary: `create table if not exists` should print message instead of throwing error Key: HUDI-2611 URL: https://issues.apache.org/jira/browse/HUDI-2611 Project: Apache Hudi

[GitHub] [hudi] xushiyan closed issue #3662: [SUPPORT] Error on the spark version in the desc information of the hudi CTAS Table

2021-10-24 Thread GitBox
xushiyan closed issue #3662: URL: https://github.com/apache/hudi/issues/3662

[GitHub] [hudi] xushiyan commented on issue #3662: [SUPPORT] Error on the spark version in the desc information of the hudi CTAS Table

2021-10-24 Thread GitBox
xushiyan commented on issue #3662: URL: https://github.com/apache/hudi/issues/3662#issuecomment-950438362 @kelvin-qin thanks for reproducing this! I see it's not the right Spark version info if CTAS from a Hudi table; the version info is not propagated correctly. I can also reproduce it; It'd

[jira] [Created] (HUDI-2610) Fix Spark version info for hudi table CTAS from another hudi table

2021-10-24 Thread Raymond Xu (Jira)
Raymond Xu created HUDI-2610: Summary: Fix Spark version info for hudi table CTAS from another hudi table Key: HUDI-2610 URL: https://issues.apache.org/jira/browse/HUDI-2610 Project: Apache Hudi

[GitHub] [hudi] nsivabalan commented on a change in pull request #3849: [HUDI-2077] Fix TestHoodieDeltaStreamerWithMultiWriter

2021-10-24 Thread GitBox
nsivabalan commented on a change in pull request #3849: URL: https://github.com/apache/hudi/pull/3849#discussion_r735196225 ## File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamerWithMultiWriter.java ## @@ -254,6 +254,16 @@ priva

[GitHub] [hudi] xushiyan closed issue #3392: [SUPPORT] Compile hudi master with hive version 2.1.1 error

2021-10-24 Thread GitBox
xushiyan closed issue #3392: URL: https://github.com/apache/hudi/issues/3392

[GitHub] [hudi] xushiyan commented on issue #3392: [SUPPORT] Compile hudi master with hive version 2.1.1 error

2021-10-24 Thread GitBox
xushiyan commented on issue #3392: URL: https://github.com/apache/hudi/issues/3392#issuecomment-950421639 Closing due to inactivity.

[GitHub] [hudi] xushiyan commented on issue #3760: [SUPPORT] Pushing hoodie metrics to prometheus having error

2021-10-24 Thread GitBox
xushiyan commented on issue #3760: URL: https://github.com/apache/hudi/issues/3760#issuecomment-950421068 > I think spark never try to write to prometheus, even if I put a wrong address, no error. @rubenssoto can you share your settings? @liujinhui1994 could you give any suggestions

[GitHub] [hudi] xushiyan commented on issue #3760: [SUPPORT] Pushing hoodie metrics to prometheus having error

2021-10-24 Thread GitBox
xushiyan commented on issue #3760: URL: https://github.com/apache/hudi/issues/3760#issuecomment-950420645 @data-storyteller @rubenssoto Can you check out this guide prepared by @nsivabalan (to be merged to the website) and see if the instructions help? https://github.com/apache/hudi/commit/959bd6

[GitHub] [hudi] xushiyan closed issue #3676: MOR table rolls out new parquet files at 10MB for new inserts - even though max file size set as 128MB

2021-10-24 Thread GitBox
xushiyan closed issue #3676: URL: https://github.com/apache/hudi/issues/3676

[GitHub] [hudi] xushiyan commented on issue #3676: MOR table rolls out new parquet files at 10MB for new inserts - even though max file size set as 128MB

2021-10-24 Thread GitBox
xushiyan commented on issue #3676: URL: https://github.com/apache/hudi/issues/3676#issuecomment-950417563 @nsivabalan I also filed https://issues.apache.org/jira/browse/HUDI-2609 to make the docs clearer on this.

[jira] [Updated] (HUDI-2609) Clarify small file configs in config page

2021-10-24 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2609: - Labels: user-support-issues (was: ) > Clarify small file configs in config page > ---

[jira] [Created] (HUDI-2609) Clarify small file configs in config page

2021-10-24 Thread Raymond Xu (Jira)
Raymond Xu created HUDI-2609: Summary: Clarify small file configs in config page Key: HUDI-2609 URL: https://issues.apache.org/jira/browse/HUDI-2609 Project: Apache Hudi Issue Type: Sub-task

[jira] [Assigned] (HUDI-2607) Reorganize Hudi docs

2021-10-24 Thread Rajesh Mahindra (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Mahindra reassigned HUDI-2607: - Assignee: Kyle Weller > Reorganize Hudi docs > > > K

[GitHub] [hudi] xushiyan commented on issue #3191: [SUPPORT]client spark-submit cmd error:Caused by: java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.DataSourceUtils$.PARTITIONI

2021-10-24 Thread GitBox
xushiyan commented on issue #3191: URL: https://github.com/apache/hudi/issues/3191#issuecomment-950416601 @xer001 `PARTITIONING_COLUMNS_KEY` is **not** added in Spark 2.4.0; see https://jar-download.com/artifacts/org.apache.spark/spark-sql_2.11/2.4.0/source-code/org/apache/spark/sql/executio

[GitHub] [hudi] xushiyan edited a comment on issue #3835: Hudi deltastreamer using avro schema parser when using jsonKafkaSource

2021-10-24 Thread GitBox
xushiyan edited a comment on issue #3835: URL: https://github.com/apache/hudi/issues/3835#issuecomment-950410484 @shivabodepudi I see. The problem is that you're using a JSON schema. The schema provider `org.apache.hudi.schema.SchemaProvider` defines only an Avro schema to be provided. You could ext

[GitHub] [hudi] xushiyan edited a comment on issue #3835: Hudi deltastreamer using avro schema parser when using jsonKafkaSource

2021-10-24 Thread GitBox
xushiyan edited a comment on issue #3835: URL: https://github.com/apache/hudi/issues/3835#issuecomment-950410484 @shivabodepudi I see. The problem is that only Avro schemas are supported and you're using a JSON schema. The schema provider `org.apache.hudi.schema.SchemaProvider` defines only Avro sc

[GitHub] [hudi] xushiyan commented on issue #3835: Hudi deltastreamer using avro schema parser when using jsonKafkaSource

2021-10-24 Thread GitBox
xushiyan commented on issue #3835: URL: https://github.com/apache/hudi/issues/3835#issuecomment-950410484 @shivabodepudi I see. The problem is that only Avro schemas are supported and you're using a JSON schema. The schema provider `org.apache.hudi.schema.SchemaProvider` defines only an Avro schema to

[GitHub] [hudi] xushiyan closed issue #3835: Hudi deltastreamer using avro schema parser when using jsonKafkaSource

2021-10-24 Thread GitBox
xushiyan closed issue #3835: URL: https://github.com/apache/hudi/issues/3835

[jira] [Updated] (HUDI-2608) Support JSON schema in schema registry provider

2021-10-24 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2608: - Description: To work with JSON kafka source.   Original issue https://github.com/apache/hudi/issues/383

[jira] [Updated] (HUDI-2608) Support JSON schema in schema registry provider

2021-10-24 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2608: - Labels: sev:normal user-support-issues (was: ) > Support JSON schema in schema registry provider > --

[jira] [Created] (HUDI-2608) Support JSON schema in schema registry provider

2021-10-24 Thread Raymond Xu (Jira)
Raymond Xu created HUDI-2608: Summary: Support JSON schema in schema registry provider Key: HUDI-2608 URL: https://issues.apache.org/jira/browse/HUDI-2608 Project: Apache Hudi Issue Type: New Fea

[hudi] branch asf-site updated: [DOCS] Update azure_hoodie.md and docker_demo.md of cn doc (#3851)

2021-10-24 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository. xushiyan pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 4814dff [DOCS] Update azure_hoodie.md and d
