[jira] [Updated] (HUDI-6186) Fix close() in InProcessLockProvider

2023-05-06 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6186: Priority: Blocker (was: Major) > Fix close() in InProcessLockProvider > ---

[GitHub] [hudi] nsivabalan commented on pull request #8589: [HUDI-6147] Deltastreamer finish failed compaction before ingestion

2023-05-06 Thread via GitHub
nsivabalan commented on PR #8589: URL: https://github.com/apache/hudi/pull/8589#issuecomment-1537334544 rebased w/ latest master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comme

[jira] [Updated] (HUDI-6186) Fix close() in InProcessLockProvider

2023-05-06 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6186: Fix Version/s: 0.13.1 > Fix close() in InProcessLockProvider

[jira] [Updated] (HUDI-6186) Fix close() in InProcessLockProvider

2023-05-06 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6186: Component/s: multi-writer > Fix close() in InProcessLockProvider

[jira] [Created] (HUDI-6186) Fix close() in InProcessLockProvider

2023-05-06 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-6186: --- Summary: Fix close() in InProcessLockProvider Key: HUDI-6186 URL: https://issues.apache.org/jira/browse/HUDI-6186 Project: Apache Hudi Issue Type: Bug Repo

[jira] [Assigned] (HUDI-6186) Fix close() in InProcessLockProvider

2023-05-06 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-6186: --- Assignee: Ethan Guo > Fix close() in InProcessLockProvider

[GitHub] [hudi] nsivabalan closed pull request #8656: [HUDI-6174] Fixing flaky tests in HoodieDeltastreamer

2023-05-06 Thread via GitHub
nsivabalan closed pull request #8656: [HUDI-6174] Fixing flaky tests in HoodieDeltastreamer URL: https://github.com/apache/hudi/pull/8656

[GitHub] [hudi] nsivabalan commented on a diff in pull request #8107: [HUDI-5514][HUDI-5574][HUDI-5604][HUDI-5535] Adding auto generation of record keys support to Hudi/Spark

2023-05-06 Thread via GitHub
nsivabalan commented on code in PR #8107: URL: https://github.com/apache/hudi/pull/8107#discussion_r1186790831 ## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestHoodieOptionConfig.scala: ## @@ -109,16 +109,6 @@ class TestHoodieOptionConfig extends

[GitHub] [hudi] hudi-bot commented on pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8388: URL: https://github.com/apache/hudi/pull/8388#issuecomment-1537323167 ## CI report: * b5d1633f7f621d17f14bab4044546568d2b90cd8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1690

[GitHub] [hudi] c-f-cooper commented on issue #8652: [SUPPORT]Parquet is not a valid Parquet File

2023-05-06 Thread via GitHub
c-f-cooper commented on issue #8652: URL: https://github.com/apache/hudi/issues/8652#issuecomment-1537313715 We use the default config of the `flink_state` index.

[GitHub] [hudi] hudi-bot commented on pull request #8557: [HUDI-5895] Remove bootstrap key generator configs

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8557: URL: https://github.com/apache/hudi/pull/8557#issuecomment-1537312173 ## CI report: * 2679efffa32d3f41bc7d80b9377b8267e5bdc3db Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1690

[GitHub] [hudi] hudi-bot commented on pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8388: URL: https://github.com/apache/hudi/pull/8388#issuecomment-1537311424 ## CI report: * b5d1633f7f621d17f14bab4044546568d2b90cd8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1690

[GitHub] [hudi] c-f-cooper commented on issue #8651: [SUPPORT]How to resolve small file?

2023-05-06 Thread via GitHub
c-f-cooper commented on issue #8651: URL: https://github.com/apache/hudi/issues/8651#issuecomment-1537307015 > > Do you do not enable the async clustering right? We have inline clustering, async clustering, and offline clustering, which one are you using? > > we use async clustering,w

[GitHub] [hudi] c-f-cooper commented on issue #8651: [SUPPORT]How to resolve small file?

2023-05-06 Thread via GitHub
c-f-cooper commented on issue #8651: URL: https://github.com/apache/hudi/issues/8651#issuecomment-1537306637 > Do you do not enable the async clustering right? We have inline clustering, async clustering, and offline clustering, which one are you using? we use async clustering,we use

[GitHub] [hudi] hudi-bot commented on pull request #8596: [MINOR] Use try with resource to close stream

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8596: URL: https://github.com/apache/hudi/pull/8596#issuecomment-1537302427 ## CI report: * 5ee1f5c3af94d920cfb1e186b2896dfe25533c2a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1690

[GitHub] [hudi] hudi-bot commented on pull request #8557: [HUDI-5895] Remove bootstrap key generator configs

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8557: URL: https://github.com/apache/hudi/pull/8557#issuecomment-1537302369 ## CI report: * 2679efffa32d3f41bc7d80b9377b8267e5bdc3db Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1690

[GitHub] [hudi] hudi-bot commented on pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8388: URL: https://github.com/apache/hudi/pull/8388#issuecomment-1537302254 ## CI report: * b5d1633f7f621d17f14bab4044546568d2b90cd8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1690

[GitHub] [hudi] danny0405 commented on pull request #8107: [HUDI-5514][HUDI-5574][HUDI-5604][HUDI-5535] Adding auto generation of record keys support to Hudi/Spark

2023-05-06 Thread via GitHub
danny0405 commented on PR #8107: URL: https://github.com/apache/hudi/pull/8107#issuecomment-1537297965 And this one: `org.apache.hudi.TestDataSourceDefaults`

[GitHub] [hudi] danny0405 commented on pull request #8107: [HUDI-5514][HUDI-5574][HUDI-5604][HUDI-5535] Adding auto generation of record keys support to Hudi/Spark

2023-05-06 Thread via GitHub
danny0405 commented on PR #8107: URL: https://github.com/apache/hudi/pull/8107#issuecomment-1537296681 Only one test failure: https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=16905&view=logs&j=b1544eb9-7ff1-5db9-0187-3e05abf459bc&t=e0ae894b-41c9-5f4b-7ed2-bdf524

[GitHub] [hudi] yihua commented on pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

2023-05-06 Thread via GitHub
yihua commented on PR #8388: URL: https://github.com/apache/hudi/pull/8388#issuecomment-1537295895 @hudi-bot run azure

[GitHub] [hudi] danny0405 commented on issue #8651: [SUPPORT]How to resolve small file?

2023-05-06 Thread via GitHub
danny0405 commented on issue #8651: URL: https://github.com/apache/hudi/issues/8651#issuecomment-1537295105 Do you do not enable the async clustering right? We have inline clustering, async clustering, and offline clustering, which one are you using?

[GitHub] [hudi] danny0405 commented on issue #8652: [SUPPORT]Parquet is not a valid Parquet File

2023-05-06 Thread via GitHub
danny0405 commented on issue #8652: URL: https://github.com/apache/hudi/issues/8652#issuecomment-1537295002 Yeah, if you use the bucket index, deleting the file directly is also feasible if there is no streaming read for the file.

[jira] [Closed] (HUDI-6120) fetchAllLogsMergedFileSlice will read basefile which it does not expect

2023-05-06 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen closed HUDI-6120. Resolution: Fixed Fixed via master branch: 20938c30b168d63cf4e520c6b4e1d7b930bed1ab > fetchAllLogsMergedFil

[hudi] branch master updated: [HUDI-6120] Add some notion for fetchAllLogsMergedFileSlice (#8529)

2023-05-06 Thread danny0405
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 20938c30b16 [HUDI-6120] Add some notion for fe

[jira] [Closed] (HUDI-6095) Refactor the judgment condition of WorkloadProfile

2023-05-06 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen closed HUDI-6095. Resolution: Fixed Fixed via master branch: ec9cb3c44646d76e2a6440f61d5d453822fbc829 > Refactor the judgment

[jira] [Updated] (HUDI-6095) Refactor the judgment condition of WorkloadProfile

2023-05-06 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-6095: - Fix Version/s: 0.14.0 > Refactor the judgment condition of WorkloadProfile > -

[GitHub] [hudi] danny0405 merged pull request #8529: [HUDI-6120] Filter base file when there is only one file slice fetched

2023-05-06 Thread via GitHub
danny0405 merged PR #8529: URL: https://github.com/apache/hudi/pull/8529

[GitHub] [hudi] yihua commented on a diff in pull request #8557: [HUDI-5895] Remove bootstrap key generator configs

2023-05-06 Thread via GitHub
yihua commented on code in PR #8557: URL: https://github.com/apache/hudi/pull/8557#discussion_r1186784885 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala: ## @@ -411,6 +411,11 @@ object DataSourceWriteOptions { .markAdvanced

[hudi] branch master updated (21fedff40bf -> ec9cb3c4464)

2023-05-06 Thread danny0405
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from 21fedff40bf [HUDI-6174] Fixing flaky tests in HoodieDeltastreamer (#8646) add ec9cb3c4464 [HUDI-6095] Refactor

[GitHub] [hudi] danny0405 merged pull request #8491: [HUDI-6095] Refactor the judgment condition of WorkloadProfile

2023-05-06 Thread via GitHub
danny0405 merged PR #8491: URL: https://github.com/apache/hudi/pull/8491

[GitHub] [hudi] danny0405 commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-05-06 Thread via GitHub
danny0405 commented on code in PR #8503: URL: https://github.com/apache/hudi/pull/8503#discussion_r1186784328 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndex.java: ## @@ -154,6 +154,14 @@ public boolean requiresTagging(WriteOperationType operat

[GitHub] [hudi] nsivabalan commented on a diff in pull request #8557: [HUDI-5895] Remove bootstrap key generator configs

2023-05-06 Thread via GitHub
nsivabalan commented on code in PR #8557: URL: https://github.com/apache/hudi/pull/8557#discussion_r1186777965 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala: ## @@ -411,6 +411,11 @@ object DataSourceWriteOptions { .markAdv

[hudi] branch master updated (e6ee5a83a13 -> 21fedff40bf)

2023-05-06 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository. sivabalan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from e6ee5a83a13 [HUDI-6185] Too many logs in the ExternalSpillableMap (#8649) add 21fedff40bf [HUDI-6174] Fixing fl

[GitHub] [hudi] nsivabalan merged pull request #8646: [HUDI-6174] Fixing flaky tests in HoodieDeltastreamer

2023-05-06 Thread via GitHub
nsivabalan merged PR #8646: URL: https://github.com/apache/hudi/pull/8646

[GitHub] [hudi] bvaradar commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

2023-05-06 Thread via GitHub
bvaradar commented on PR #8472: URL: https://github.com/apache/hudi/pull/8472#issuecomment-1537278449 @clownxc : For failed records, we need to have them logged elsewhere and so no need to deflate. For exception cases, the write status should be marked as failure. So, I don't see any reason

[GitHub] [hudi] hudi-bot commented on pull request #8596: [MINOR] Use try with resource to close stream

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8596: URL: https://github.com/apache/hudi/pull/8596#issuecomment-1537278273 ## CI report: * e1816d660458d429f275ac0fd5d17f3eba5c1423 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1686

[GitHub] [hudi] hudi-bot commented on pull request #8107: [HUDI-5514][HUDI-5574][HUDI-5604][HUDI-5535] Adding auto generation of record keys support to Hudi/Spark

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8107: URL: https://github.com/apache/hudi/pull/8107#issuecomment-1537278113 ## CI report: * e49fef0e72149160ffe124636e7c89d1ebe97e18 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1690

[GitHub] [hudi] hudi-bot commented on pull request #8596: [MINOR] Use try with resource to close stream

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8596: URL: https://github.com/apache/hudi/pull/8596#issuecomment-1537277241 ## CI report: * e1816d660458d429f275ac0fd5d17f3eba5c1423 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1686

[GitHub] [hudi] c-f-cooper commented on pull request #8596: [MINOR] Use try with resource to close stream

2023-05-06 Thread via GitHub
c-f-cooper commented on PR #8596: URL: https://github.com/apache/hudi/pull/8596#issuecomment-1537275780 > The CI is failing, can you check it. It's hard to find the problem, can you help me?

[GitHub] [hudi] clownxc commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

2023-05-06 Thread via GitHub
clownxc commented on PR #8472: URL: https://github.com/apache/hudi/pull/8472#issuecomment-1537272950 According to the suggestion provided by @prashantwason , I did a test as follows: ```java WriteStatus status = new WriteStatus(true, 1.0); String partitionPath = HoodieTestD

[GitHub] [hudi] yihua commented on a diff in pull request #8557: [HUDI-5895] Remove bootstrap key generator configs

2023-05-06 Thread via GitHub
yihua commented on code in PR #8557: URL: https://github.com/apache/hudi/pull/8557#discussion_r1186771703 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala: ## @@ -411,6 +411,11 @@ object DataSourceWriteOptions { .markAdvanced

[GitHub] [hudi] hudi-bot commented on pull request #8107: [HUDI-5514][HUDI-5574][HUDI-5604][HUDI-5535] Adding auto generation of record keys support to Hudi/Spark

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8107: URL: https://github.com/apache/hudi/pull/8107#issuecomment-1537270553 ## CI report: * fd6092c26ef67021d43a4b7b663d744933a20e06 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1689

[GitHub] [hudi] hudi-bot commented on pull request #8107: [HUDI-5514][HUDI-5574][HUDI-5604][HUDI-5535] Adding auto generation of record keys support to Hudi/Spark

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8107: URL: https://github.com/apache/hudi/pull/8107#issuecomment-1537269501 ## CI report: * fd6092c26ef67021d43a4b7b663d744933a20e06 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1689

[GitHub] [hudi] hudi-bot commented on pull request #8557: [HUDI-5895] Remove bootstrap key generator configs

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8557: URL: https://github.com/apache/hudi/pull/8557#issuecomment-1537268636 ## CI report: * 2679efffa32d3f41bc7d80b9377b8267e5bdc3db Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1690

[GitHub] [hudi] clownxc opened a new pull request, #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

2023-05-06 Thread via GitHub
clownxc opened a new pull request, #8472: URL: https://github.com/apache/hudi/pull/8472 ### Change Logs WriteStatus stores the entire HoodieRecord. We can optimize it to store just the required info (record key, partition path, location). ### Impact Optimize `WriteStatus` t

[GitHub] [hudi] clownxc closed pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

2023-05-06 Thread via GitHub
clownxc closed pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord URL: https://github.com/apache/hudi/pull/8472

[GitHub] [hudi] hudi-bot commented on pull request #8646: [HUDI-6174] Fixing flaky tests in HoodieDeltastreamer

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8646: URL: https://github.com/apache/hudi/pull/8646#issuecomment-1537260822 ## CI report: * 227ea1a05110961c8348504be60bad103992bf96 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1689

[GitHub] [hudi] hudi-bot commented on pull request #8646: [HUDI-6174] Fixing flaky tests in HoodieDeltastreamer

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8646: URL: https://github.com/apache/hudi/pull/8646#issuecomment-1537259532 ## CI report: * 227ea1a05110961c8348504be60bad103992bf96 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1689

[GitHub] [hudi] hudi-bot commented on pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8388: URL: https://github.com/apache/hudi/pull/8388#issuecomment-1537252078 ## CI report: * b5d1633f7f621d17f14bab4044546568d2b90cd8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1690

[GitHub] [hudi] hudi-bot commented on pull request #8656: [HUDI-6174] Fixing flaky tests in HoodieDeltastreamer

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8656: URL: https://github.com/apache/hudi/pull/8656#issuecomment-1537248148 ## CI report: * d9c88d68959a5e440acd77dac09074b1452dbe44 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1689

[GitHub] [hudi] hudi-bot commented on pull request #8557: [HUDI-5895] Remove bootstrap key generator configs

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8557: URL: https://github.com/apache/hudi/pull/8557#issuecomment-1537239942 ## CI report: * 75d4c5b67703349573569677d88deb2b1eb647fd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1671

[GitHub] [hudi] hudi-bot commented on pull request #8557: [HUDI-5895] Remove bootstrap key generator configs

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8557: URL: https://github.com/apache/hudi/pull/8557#issuecomment-1537238922 ## CI report: * 75d4c5b67703349573569677d88deb2b1eb647fd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1671

[GitHub] [hudi] nsivabalan commented on a diff in pull request #8557: [HUDI-5895] Remove bootstrap key generator configs

2023-05-06 Thread via GitHub
nsivabalan commented on code in PR #8557: URL: https://github.com/apache/hudi/pull/8557#discussion_r1186754142 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala: ## @@ -411,6 +411,11 @@ object DataSourceWriteOptions { .markAdv

[GitHub] [hudi] hudi-bot commented on pull request #8657: [HUDI-6150] Support bucketing for each hive client

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8657: URL: https://github.com/apache/hudi/pull/8657#issuecomment-1537233178 ## CI report: * 1b0c06abe95feedd2f03f3507edce1cc4d7c3008 UNKNOWN * d486fba35f93c250625eeaaefbbfe4c076f5cb0d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] hudi-bot commented on pull request #8655: [MINOR] Add script to build bundle validation image with Spark 3.3.2

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8655: URL: https://github.com/apache/hudi/pull/8655#issuecomment-1537233170 ## CI report: * 984e496205596a0baa1210c2840ce9b4064acc98 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=168

[GitHub] [hudi] yihua commented on pull request #8557: [HUDI-5895] Remove bootstrap key generator configs

2023-05-06 Thread via GitHub
yihua commented on PR #8557: URL: https://github.com/apache/hudi/pull/8557#issuecomment-1537232463 > @yihua : there are some CI failures and GH action failures. can you take a look The test failures are fixed now.

[GitHub] [hudi] yihua commented on a diff in pull request #8557: [HUDI-5895] Remove bootstrap key generator configs

2023-05-06 Thread via GitHub
yihua commented on code in PR #8557: URL: https://github.com/apache/hudi/pull/8557#discussion_r1186752187 ## hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/cli/BootstrapExecutorUtils.java: ## @@ -263,10 +262,9 @@ private void initializeTable() throws IOException

[GitHub] [hudi] hudi-bot commented on pull request #8657: [HUDI-6150] Support bucketing for each hive client

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8657: URL: https://github.com/apache/hudi/pull/8657#issuecomment-1537232081 ## CI report: * 1b0c06abe95feedd2f03f3507edce1cc4d7c3008 UNKNOWN * e1849e32319fbdad43153c75d396447deedf381d Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-

[GitHub] [hudi] hudi-bot commented on pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8388: URL: https://github.com/apache/hudi/pull/8388#issuecomment-1537231983 ## CI report: * b647ef79567b2c25f567dec407f30139065d2fe3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1677

[GitHub] [hudi] yihua commented on a diff in pull request #8557: [HUDI-5895] Remove bootstrap key generator configs

2023-05-06 Thread via GitHub
yihua commented on code in PR #8557: URL: https://github.com/apache/hudi/pull/8557#discussion_r1186752072 ## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestHoodieSparkSqlWriter.scala: ## @@ -591,8 +591,7 @@ class TestHoodieSparkSqlWriter { HoodieBoo

[GitHub] [hudi] yihua commented on a diff in pull request #8557: [HUDI-5895] Remove bootstrap key generator configs

2023-05-06 Thread via GitHub
yihua commented on code in PR #8557: URL: https://github.com/apache/hudi/pull/8557#discussion_r1186752046 ## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestHoodieSparkSqlWriter.scala: ## @@ -591,8 +591,7 @@ class TestHoodieSparkSqlWriter { HoodieBoo

[GitHub] [hudi] yihua commented on a diff in pull request #8557: [HUDI-5895] Remove bootstrap key generator configs

2023-05-06 Thread via GitHub
yihua commented on code in PR #8557: URL: https://github.com/apache/hudi/pull/8557#discussion_r1186751959 ## hudi-examples/hudi-examples-spark/src/main/java/org/apache/hudi/examples/spark/HoodieSparkBootstrapExample.java: ## @@ -64,7 +64,6 @@ public static void main(String[] arg

[GitHub] [hudi] hudi-bot commented on pull request #8657: [HUDI-6150] Support bucketing for each hive client

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8657: URL: https://github.com/apache/hudi/pull/8657#issuecomment-1537231094 ## CI report: * 1b0c06abe95feedd2f03f3507edce1cc4d7c3008 UNKNOWN * e1849e32319fbdad43153c75d396447deedf381d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] hudi-bot commented on pull request #8388: [HUDI-5816] List all partitions as the fallback mechanism in Hive and Glue Sync

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8388: URL: https://github.com/apache/hudi/pull/8388#issuecomment-1537230992 ## CI report: * b647ef79567b2c25f567dec407f30139065d2fe3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1677

[GitHub] [hudi] hudi-bot commented on pull request #8657: [HUDI-6150] Support bucketing for each hive client

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8657: URL: https://github.com/apache/hudi/pull/8657#issuecomment-1537225084 ## CI report: * 1b0c06abe95feedd2f03f3507edce1cc4d7c3008 UNKNOWN * e1849e32319fbdad43153c75d396447deedf381d UNKNOWN Bot commands @hudi-bot supports the following

[GitHub] [hudi] hudi-bot commented on pull request #8656: [HUDI-6174] Fixing flaky tests in HoodieDeltastreamer

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8656: URL: https://github.com/apache/hudi/pull/8656#issuecomment-1537225072 ## CI report: * d9c88d68959a5e440acd77dac09074b1452dbe44 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1689

[GitHub] [hudi] hudi-bot commented on pull request #8655: [MINOR] Add script to build bundle validation image with Spark 3.3.2

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8655: URL: https://github.com/apache/hudi/pull/8655#issuecomment-1537225058 ## CI report: * 984e496205596a0baa1210c2840ce9b4064acc98 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1689

[GitHub] [hudi] hudi-bot commented on pull request #8657: [HUDI-6150] Support bucketing for each hive client

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8657: URL: https://github.com/apache/hudi/pull/8657#issuecomment-1537224014 ## CI report: * 1b0c06abe95feedd2f03f3507edce1cc4d7c3008 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] hudi-bot commented on pull request #8655: [MINOR] Add script to build bundle validation image with Spark 3.3.2

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8655: URL: https://github.com/apache/hudi/pull/8655#issuecomment-1537223997 ## CI report: * 984e496205596a0baa1210c2840ce9b4064acc98 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] hudi-bot commented on pull request #8656: [HUDI-6174] Fixing flaky tests in HoodieDeltastreamer

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8656: URL: https://github.com/apache/hudi/pull/8656#issuecomment-1537224005 ## CI report: * d9c88d68959a5e440acd77dac09074b1452dbe44 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[jira] [Updated] (HUDI-6150) Make hive sync to provide bucketing metadata when index=bucket

2023-05-06 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6150: - Labels: pull-request-available (was: ) > Make hive sync to provide bucketing metadata when index=

[GitHub] [hudi] parisni opened a new pull request, #8657: [HUDI-6150] Support bucketing for each hive client

2023-05-06 Thread via GitHub
parisni opened a new pull request, #8657: URL: https://github.com/apache/hudi/pull/8657 ### Change Logs This : - introduce a new hive bucketing spec to be propagated to each client - implement hms and glue - change implementation of hiveql - TODO? support sorting ##

[GitHub] [hudi] nsivabalan opened a new pull request, #8656: [HUDI-6174] Fixing flaky tests in HoodieDeltastreamer

2023-05-06 Thread via GitHub
nsivabalan opened a new pull request, #8656: URL: https://github.com/apache/hudi/pull/8656 ### Change Logs Fixing flaky cleaner and replace commit tests in TestHoodieDeltastreamer ### Impact Fixing flaky cleaner and replace commit tests in TestHoodieDeltastreamer #

[jira] [Assigned] (HUDI-6150) Make hive sync to provide bucketing metadata when index=bucket

2023-05-06 Thread nicolas paris (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nicolas paris reassigned HUDI-6150: --- Assignee: nicolas paris > Make hive sync to provide bucketing metadata when index=bucket > --

[GitHub] [hudi] yihua opened a new pull request, #8655: [MINOR] Add script to build bundle validation image with Spark 3.3.2

2023-05-06 Thread via GitHub
yihua opened a new pull request, #8655: URL: https://github.com/apache/hudi/pull/8655 ### Change Logs As above. ### Impact Able to test Hudi bundles on Spark 3.3.2. ### Risk level none ### Documentation Update N/A ### Contributor's check

[GitHub] [hudi] hudi-bot commented on pull request #8646: [HUDI-6174] Fixing flaky tests in HoodieDeltastreamer

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8646: URL: https://github.com/apache/hudi/pull/8646#issuecomment-1537213764 ## CI report: * 227ea1a05110961c8348504be60bad103992bf96 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1689

[GitHub] [hudi] hudi-bot commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8472: URL: https://github.com/apache/hudi/pull/8472#issuecomment-1537213624 ## CI report: * ff5d944ec780dbfb0d97eea643ad12420d1cca85 UNKNOWN * e79140252ba476a4fef89ba85caabc4cd98ce85b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] cxzl25 commented on a diff in pull request #5168: [HUDI-3729][SPARK] fixed the per regression by enable vectorizeReader for parquet file

2023-05-06 Thread via GitHub
cxzl25 commented on code in PR #5168: URL: https://github.com/apache/hudi/pull/5168#discussion_r1186728549 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/MergeOnReadSnapshotRelation.scala: ## @@ -56,6 +56,11 @@ class MergeOnReadSnapshotRelation(sqlCont

[GitHub] [hudi] CTTY commented on a diff in pull request #8190: [HUDI-5936] Fix serialization problem when FileStatus is not serializable

2023-05-06 Thread via GitHub
CTTY commented on code in PR #8190: URL: https://github.com/apache/hudi/pull/8190#discussion_r1186726288 ## hudi-common/src/main/java/org/apache/hudi/common/fs/HoodieSerializableFileStatus.java: ## @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under o

[GitHub] [hudi] hudi-bot commented on pull request #8529: [HUDI-6120] Filter base file when there is only one file slice fetched

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8529: URL: https://github.com/apache/hudi/pull/8529#issuecomment-1537185372 ## CI report: * 12dedd3628572545800a56dac4b97418610165be Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1689

[GitHub] [hudi] hudi-bot commented on pull request #8107: [HUDI-5514][HUDI-5574][HUDI-5604][HUDI-5535] Adding auto generation of record keys support to Hudi/Spark

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8107: URL: https://github.com/apache/hudi/pull/8107#issuecomment-1537182352 ## CI report: * fd6092c26ef67021d43a4b7b663d744933a20e06 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1689

[GitHub] [hudi] rohan-uptycs commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-05-06 Thread via GitHub
rohan-uptycs commented on code in PR #8503: URL: https://github.com/apache/hudi/pull/8503#discussion_r1186689133 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndex.java: ## @@ -154,6 +154,14 @@ public boolean requiresTagging(WriteOperationType ope

[GitHub] [hudi] rohan-uptycs commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-05-06 Thread via GitHub
rohan-uptycs commented on code in PR #8503: URL: https://github.com/apache/hudi/pull/8503#discussion_r1186717343 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndex.java: ## @@ -154,6 +154,14 @@ public boolean requiresTagging(WriteOperationType ope

[GitHub] [hudi] rohan-uptycs commented on a diff in pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-05-06 Thread via GitHub
rohan-uptycs commented on code in PR #8503: URL: https://github.com/apache/hudi/pull/8503#discussion_r1186717343 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndex.java: ## @@ -154,6 +154,14 @@ public boolean requiresTagging(WriteOperationType ope

[GitHub] [hudi] hudi-bot commented on pull request #8646: [HUDI-6174] Fixing flaky tests in HoodieDeltastreamer

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8646: URL: https://github.com/apache/hudi/pull/8646#issuecomment-1537174790 ## CI report: * 891bb4f7d03751370b34645c1cc2efca7f02da80 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1688

[GitHub] [hudi] hudi-bot commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8472: URL: https://github.com/apache/hudi/pull/8472#issuecomment-1537174682 ## CI report: * ff5d944ec780dbfb0d97eea643ad12420d1cca85 UNKNOWN * c69a04b7c23d381b6a4fe16c1fb016f8e1363794 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] hudi-bot commented on pull request #8646: [HUDI-6174] Fixing flaky tests in HoodieDeltastreamer

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8646: URL: https://github.com/apache/hudi/pull/8646#issuecomment-1537173436 ## CI report: * 891bb4f7d03751370b34645c1cc2efca7f02da80 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1688

[GitHub] [hudi] hudi-bot commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8472: URL: https://github.com/apache/hudi/pull/8472#issuecomment-1537173338 ## CI report: * ff5d944ec780dbfb0d97eea643ad12420d1cca85 UNKNOWN * c69a04b7c23d381b6a4fe16c1fb016f8e1363794 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] hudi-bot commented on pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8503: URL: https://github.com/apache/hudi/pull/8503#issuecomment-1537171893 ## CI report: * 0738d975df341763e384b9ac9bcad14b006c9c47 UNKNOWN * 9fb1a26adfb2dd569060d23ffa10617eb6d553be UNKNOWN * bacce6be4f42aea949eb5bbb04cb3f7179822524 Azure: [SUCCES

[GitHub] [hudi] hudi-bot commented on pull request #8650: Bump xalan from 2.7.2 to 2.7.3

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8650: URL: https://github.com/apache/hudi/pull/8650#issuecomment-1537160549 ## CI report: * 6878ff14acfafa573641e8d09f3f43f8f9618f36 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1689

[GitHub] [hudi] hudi-bot commented on pull request #8082: [HUDI-5868] Upgrade Spark to 3.3.2

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8082: URL: https://github.com/apache/hudi/pull/8082#issuecomment-1537160316 ## CI report: * f43a772d2efe7d19657b44d2ce8b92b8fcee390f UNKNOWN * b6a62994325c4b1110a265262a56da1a94e1f6e2 UNKNOWN * 9b8fd35638220cdc084d9d16b7e39b67025798aa Azure: [FAILUR

[GitHub] [hudi] codope commented on pull request #8397: [HUDI-6055] Fix input format for bootstrap tables

2023-05-06 Thread via GitHub
codope commented on PR #8397: URL: https://github.com/apache/hudi/pull/8397#issuecomment-1537149743 @yihua @xiarixiaoyao thanks for reviewing. As mentioned in the comments, the unnecessary instantiation is already removed and we don't need lazy listing in Hive file index impl for now. So, I

[GitHub] [hudi] codope closed pull request #8397: [HUDI-6055] Fix input format for bootstrap tables

2023-05-06 Thread via GitHub
codope closed pull request #8397: [HUDI-6055] Fix input format for bootstrap tables URL: https://github.com/apache/hudi/pull/8397

[GitHub] [hudi] codope commented on a diff in pull request #8397: [HUDI-6055] Fix input format for bootstrap tables

2023-05-06 Thread via GitHub
codope commented on code in PR #8397: URL: https://github.com/apache/hudi/pull/8397#discussion_r1186702216 ## hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/SchemaEvolutionContext.java: ## @@ -82,32 +81,25 @@ public class SchemaEvolutionContext { private final InputSpl

[GitHub] [hudi] codope commented on a diff in pull request #8397: [HUDI-6055] Fix input format for bootstrap tables

2023-05-06 Thread via GitHub
codope commented on code in PR #8397: URL: https://github.com/apache/hudi/pull/8397#discussion_r1186702204 ## hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HiveHoodieTableFileIndex.java: ## @@ -58,7 +58,7 @@ public HiveHoodieTableFileIndex(HoodieEngineContext engineContex

[GitHub] [hudi] c-f-cooper commented on issue #8652: [SUPPORT]Parquet is not a valid Parquet File

2023-05-06 Thread via GitHub
c-f-cooper commented on issue #8652: URL: https://github.com/apache/hudi/issues/8652#issuecomment-1537145600 > What is the status of instant `20230503171323465` on the timeline? Did it succeed or fail? Is there a possibility we can trigger a manual rollback for it? The status is request, i

[GitHub] [hudi] c-f-cooper commented on issue #8651: [SUPPORT]How to resolve small file?

2023-05-06 Thread via GitHub
c-f-cooper commented on issue #8651: URL: https://github.com/apache/hudi/issues/8651#issuecomment-1537144673 > @c-f-cooper Are you using Insert or Bulk Insert? Can you please share the clustering command and table configs you are using? We use COW + insert mode; besides the clustering con

[GitHub] [hudi] c-f-cooper commented on issue #8651: [SUPPORT]How to resolve small file?

2023-05-06 Thread via GitHub
c-f-cooper commented on issue #8651: URL: https://github.com/apache/hudi/issues/8651#issuecomment-1537144331 > What kind of clustering are you using, online or offline? By "no effect", do you mean that no bigger Parquet files are generated to replace the existing files? We use online async clu

[jira] [Commented] (HUDI-6144) [Spark][Flink]bucket index and then insert data in bulk, the correct file cannot be created

2023-05-06 Thread xi chaomin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17720215#comment-17720215 ] xi chaomin commented on HUDI-6144: -- Currently bucket index doesn't support bulk insert. T

[GitHub] [hudi] hudi-bot commented on pull request #8505: [HUDI-6106] Spark offline compaction/Clustering Job will do clean like Flink job

2023-05-06 Thread via GitHub
hudi-bot commented on PR #8505: URL: https://github.com/apache/hudi/pull/8505#issuecomment-1537140908 ## CI report: * f7c73e83812258b53b979afbd6d465e9066b801f UNKNOWN * 269fad02a5346121e823a15c9804e2e63eb16c30 UNKNOWN * 442430f680316bdfefc27c4aca9f7cd94e95373c UNKNOWN * e6
