[GitHub] [hudi] scxwhite commented on a diff in pull request #5030: [HUDI-3617] MOR compact improve

2022-09-08 Thread GitBox
scxwhite commented on code in PR #5030: URL: https://github.com/apache/hudi/pull/5030#discussion_r965590717 ## hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieMergedLogRecordScanner.java: ## @@ -123,25 +133,24 @@ public long getNumMergedRecordsInLog() { ret

[jira] [Updated] (HUDI-4820) ORC dependency conflicts with spark 3.1

2022-09-08 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-4820: - Labels: pull-request-available (was: ) > ORC dependency conflicts with spark 3.1 > --

[GitHub] [hudi] xicm opened a new pull request, #6642: [HUDI-4820] ORC dependency conflicts with spark 3.1

2022-09-08 Thread GitBox
xicm opened a new pull request, #6642: URL: https://github.com/apache/hudi/pull/6642 ### Change Logs The ORC version in Hudi is 1.6.0, while the version in Spark 3.1 is 1.5.12. I tried setting orc.version to 1.5 in the spark3.1 profile, but that causes other conflicts. So I copy and rename the or

[GitHub] [hudi] danny0405 commented on issue #6621: [SUPPORT]com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 36

2022-09-08 Thread GitBox
danny0405 commented on issue #6621: URL: https://github.com/apache/hudi/issues/6621#issuecomment-1241550571 @alexeykudinkin Are you interested in this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above t

[jira] [Updated] (HUDI-4820) ORC dependency conflicts with spark 3.1

2022-09-08 Thread xi chaomin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xi chaomin updated HUDI-4820: - Summary: ORC dependency conflicts with spark 3.1 (was: ORC dependency conflict with spark 3.1) > ORC dep

[GitHub] [hudi] hudi-bot commented on pull request #6489: [HUDI-4485] [cli] Bumped spring shell to 2.1.1. Updated the default …

2022-09-08 Thread GitBox
hudi-bot commented on PR #6489: URL: https://github.com/apache/hudi/pull/6489#issuecomment-1241545025 ## CI report: * 252b9f49e2c86c7aad7908e5887064b0d0b36932 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1125

[GitHub] [hudi] hudi-bot commented on pull request #6631: [HUDI-4810] Fixing Hudi bundles requiring log4j2 on the classpath

2022-09-08 Thread GitBox
hudi-bot commented on PR #6631: URL: https://github.com/apache/hudi/pull/6631#issuecomment-1241539499 ## CI report: * cf25a4e0d37980de4284afe841eced2f205b97a5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1126

[GitHub] [hudi] hudi-bot commented on pull request #6489: [HUDI-4485] [cli] Bumped spring shell to 2.1.1. Updated the default …

2022-09-08 Thread GitBox
hudi-bot commented on PR #6489: URL: https://github.com/apache/hudi/pull/6489#issuecomment-1241539131 ## CI report: * 252b9f49e2c86c7aad7908e5887064b0d0b36932 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1125

[GitHub] [hudi] paul8263 commented on pull request #6489: [HUDI-4485] [cli] Bumped spring shell to 2.1.1. Updated the default …

2022-09-08 Thread GitBox
paul8263 commented on PR #6489: URL: https://github.com/apache/hudi/pull/6489#issuecomment-1241538407 Hi @yihua and @codope, how can I rerun the CI? It seems that commenting with the bot command has no effect. The commit [3ae4fb8](https://github.com/apache/hudi/commit/3ae4fb8b

[GitHub] [hudi] danny0405 commented on issue #6621: [SUPPORT]com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 36

2022-09-08 Thread GitBox
danny0405 commented on issue #6621: URL: https://github.com/apache/hudi/issues/6621#issuecomment-1241538207 https://github.com/JohnSnowLabs/spark-nlp/issues/1123 Did you try configuring the serializer explicitly: `spark.serializer: org.apache.spark.serializer.KryoSerializer`?
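As a rough illustration of the suggestion in this comment, the sketch below renders the setting as `spark-submit --conf` flags. Only the `spark.serializer` key and the `KryoSerializer` class name come from the comment; the helper name `to_submit_args` and the idea of generating the flags from a dictionary are assumptions made for the example, not something the thread prescribes.

```python
# Hypothetical helper: turn Spark settings into spark-submit --conf flags.
# Only the spark.serializer key/value pair is taken from the comment above.
from typing import Dict, List

SPARK_CONF: Dict[str, str] = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
}


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Render each setting as a `--conf key=value` pair, in insertion order."""
    args: List[str] = []
    for key, value in conf.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    # Prints the command-line prefix; application arguments would follow it.
    print(" ".join(["spark-submit"] + to_submit_args(SPARK_CONF)))
```

The same key could equally be set in `spark-defaults.conf` or on the session builder; the command-line form is shown only because it needs no extra dependencies.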

[jira] [Updated] (HUDI-4820) ORC dependency conflict with spark 3.1

2022-09-08 Thread xi chaomin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xi chaomin updated HUDI-4820: - Description: When I set _*hoodie.table.base.file.format*_ to *ORC*, I get an exception: {code:java} java.lang.N

[jira] [Created] (HUDI-4820) ORC dependency conflict with spark 3.1

2022-09-08 Thread xi chaomin (Jira)
xi chaomin created HUDI-4820: Summary: ORC dependency conflict with spark 3.1 Key: HUDI-4820 URL: https://issues.apache.org/jira/browse/HUDI-4820 Project: Apache Hudi Issue Type: Bug

[GitHub] [hudi] paul8263 commented on pull request #6489: [HUDI-4485] [cli] Bumped spring shell to 2.1.1. Updated the default …

2022-09-08 Thread GitBox
paul8263 commented on PR #6489: URL: https://github.com/apache/hudi/pull/6489#issuecomment-1241525099 @hudi-bot run azure

[GitHub] [hudi] hudi-bot commented on pull request #6489: [HUDI-4485] [cli] Bumped spring shell to 2.1.1. Updated the default …

2022-09-08 Thread GitBox
hudi-bot commented on PR #6489: URL: https://github.com/apache/hudi/pull/6489#issuecomment-1241501161 ## CI report: * 252b9f49e2c86c7aad7908e5887064b0d0b36932 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1125

[GitHub] [hudi] hudi-bot commented on pull request #6550: [HUDI-4691] Cleaning up duplicated classes in Spark 3.3 module

2022-09-08 Thread GitBox
hudi-bot commented on PR #6550: URL: https://github.com/apache/hudi/pull/6550#issuecomment-1241498549 ## CI report: * 684ca9bdec8d75a27bf78ec09bf2ba31f67bdda4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1113

[GitHub] [hudi] hudi-bot commented on pull request #6489: [HUDI-4485] [cli] Bumped spring shell to 2.1.1. Updated the default …

2022-09-08 Thread GitBox
hudi-bot commented on PR #6489: URL: https://github.com/apache/hudi/pull/6489#issuecomment-1241498437 ## CI report: * 252b9f49e2c86c7aad7908e5887064b0d0b36932 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1125

[jira] [Created] (HUDI-4819) run_sync_tool.sh in hudi-hive-sync fails with classpath errors on release-0.12.0

2022-09-08 Thread Pramod Biligiri (Jira)
Pramod Biligiri created HUDI-4819: - Summary: run_sync_tool.sh in hudi-hive-sync fails with classpath errors on release-0.12.0 Key: HUDI-4819 URL: https://issues.apache.org/jira/browse/HUDI-4819 Projec

[GitHub] [hudi] paul8263 commented on pull request #6489: [HUDI-4485] [cli] Bumped spring shell to 2.1.1. Updated the default …

2022-09-08 Thread GitBox
paul8263 commented on PR #6489: URL: https://github.com/apache/hudi/pull/6489#issuecomment-1241480297 @hudi-bot run azure

[GitHub] [hudi] hudi-bot commented on pull request #6550: [HUDI-4691] Cleaning up duplicated classes in Spark 3.3 module

2022-09-08 Thread GitBox
hudi-bot commented on PR #6550: URL: https://github.com/apache/hudi/pull/6550#issuecomment-1241471920 ## CI report: * 684ca9bdec8d75a27bf78ec09bf2ba31f67bdda4 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1113

[GitHub] [hudi] danny0405 commented on a diff in pull request #6256: [RFC-51][HUDI-3478] Update RFC: CDC support

2022-09-08 Thread GitBox
danny0405 commented on code in PR #6256: URL: https://github.com/apache/hudi/pull/6256#discussion_r966599692 ## rfc/rfc-51/rfc-51.md: ## @@ -215,18 +245,31 @@ Note: - Only instants that are active can be queried in a CDC scenario. - `CDCReader` manages all the things on CDC,

[GitHub] [hudi] hudi-bot commented on pull request #6631: [HUDI-4810] Fixing Hudi bundles requiring log4j2 on the classpath

2022-09-08 Thread GitBox
hudi-bot commented on PR #6631: URL: https://github.com/apache/hudi/pull/6631#issuecomment-1241469308 ## CI report: * e8e8c4d8047b5985764f7534bd84e82763c3ad28 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1124

[GitHub] [hudi] danny0405 commented on a diff in pull request #6256: [RFC-51][HUDI-3478] Update RFC: CDC support

2022-09-08 Thread GitBox
danny0405 commented on code in PR #6256: URL: https://github.com/apache/hudi/pull/6256#discussion_r966598104 ## rfc/rfc-51/rfc-51.md: ## @@ -148,20 +152,46 @@ hudi_cdc_table/ Under a partition directory, the `.log` file with `CDCBlock` above will keep the changing data we ha

[GitHub] [hudi] hudi-bot commented on pull request #6631: [HUDI-4810] Fixing Hudi bundles requiring log4j2 on the classpath

2022-09-08 Thread GitBox
hudi-bot commented on PR #6631: URL: https://github.com/apache/hudi/pull/6631#issuecomment-1241466870 ## CI report: * e8e8c4d8047b5985764f7534bd84e82763c3ad28 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1124

[GitHub] [hudi] hudi-bot commented on pull request #6489: [HUDI-4485] [cli] Bumped spring shell to 2.1.1. Updated the default …

2022-09-08 Thread GitBox
hudi-bot commented on PR #6489: URL: https://github.com/apache/hudi/pull/6489#issuecomment-1241466655 ## CI report: * 252b9f49e2c86c7aad7908e5887064b0d0b36932 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1125

[GitHub] [hudi] hudi-bot commented on pull request #6489: [HUDI-4485] [cli] Bumped spring shell to 2.1.1. Updated the default …

2022-09-08 Thread GitBox
hudi-bot commented on PR #6489: URL: https://github.com/apache/hudi/pull/6489#issuecomment-1241464172 ## CI report: * 252b9f49e2c86c7aad7908e5887064b0d0b36932 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1125

[GitHub] [hudi] TJX2014 commented on pull request #6595: [HUDI-4777] Fix flink gen bucket index of mor table not consistent wi…

2022-09-08 Thread GitBox
TJX2014 commented on PR #6595: URL: https://github.com/apache/hudi/pull/6595#issuecomment-1241461060 > > but on the Flink side, I think deduplication should also be enabled by default for MOR tables; when duplicates are written to the log file, it is very hard for compaction to read, and it also makes the MOR table unstable

[GitHub] [hudi] TJX2014 commented on pull request #6595: [HUDI-4777] Fix flink gen bucket index of mor table not consistent wi…

2022-09-08 Thread GitBox
TJX2014 commented on PR #6595: URL: https://github.com/apache/hudi/pull/6595#issuecomment-1241460834 > Not exactly. Deduplicating the records in memory and then writing them to the log is elegant for MOR because the result is the same. As @danny0405 says, in the CDC situation we need to retain the original records,

[GitHub] [hudi] TJX2014 commented on pull request #6595: [HUDI-4777] Fix flink gen bucket index of mor table not consistent wi…

2022-09-08 Thread GitBox
TJX2014 commented on PR #6595: URL: https://github.com/apache/hudi/pull/6595#issuecomment-1241460494 > Not exactly. Deduplicating the records in memory and then writing them to the log is elegant for MOR because the result is the same. As @danny0405 says, in the CDC situation we need to retain the original

[GitHub] [hudi] danny0405 commented on a diff in pull request #6256: [RFC-51][HUDI-3478] Update RFC: CDC support

2022-09-08 Thread GitBox
danny0405 commented on code in PR #6256: URL: https://github.com/apache/hudi/pull/6256#discussion_r966591772 ## rfc/rfc-51/rfc-51.md: ## @@ -64,69 +65,72 @@ We follow the debezium output format: four columns as shown below Note: the illustration here ignores all the Hudi met

[GitHub] [hudi] paul8263 commented on pull request #6489: [HUDI-4485] [cli] Bumped spring shell to 2.1.1. Updated the default …

2022-09-08 Thread GitBox
paul8263 commented on PR #6489: URL: https://github.com/apache/hudi/pull/6489#issuecomment-1241457900 @hudi-bot run azure

[GitHub] [hudi] TJX2014 commented on a diff in pull request #6630: [HUDI-4808] Fix HoodieSimpleBucketIndex not consider bucket num in lo…

2022-09-08 Thread GitBox
TJX2014 commented on code in PR #6630: URL: https://github.com/apache/hudi/pull/6630#discussion_r966590804 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java: ## @@ -72,6 +73,26 @@ public static List getLatestBaseFilesForPartition(

[GitHub] [hudi] TJX2014 commented on a diff in pull request #6634: [HUDI-4813] Fix infer keygen not work in sparksql side issue

2022-09-08 Thread GitBox
TJX2014 commented on code in PR #6634: URL: https://github.com/apache/hudi/pull/6634#discussion_r966590414 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala: ## @@ -787,9 +787,13 @@ object DataSourceOptionsHelper { def inferKe

[GitHub] [hudi] the-other-tim-brown commented on a diff in pull request #6631: [HUDI-4810] Fixing Hudi bundles requiring log4j2 on the classpath

2022-09-08 Thread GitBox
the-other-tim-brown commented on code in PR #6631: URL: https://github.com/apache/hudi/pull/6631#discussion_r966589943 ## hudi-client/hudi-flink-client/pom.xml: ## @@ -35,7 +35,21 @@ - + + Review Comment: nitpick: indentation is off here

[jira] [Updated] (HUDI-4810) Fix Hudi bundles requiring log4j2 on the classpath

2022-09-08 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-4810: -- Status: Patch Available (was: In Progress) > Fix Hudi bundles requiring log4j2 on the classpath

[jira] [Updated] (HUDI-3391) presto and hive beeline fails to read MOR table w/ 2 or more array fields

2022-09-08 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-3391: -- Status: Patch Available (was: In Progress) > presto and hive beeline fails to read MOR table w/ 2 or mo

[jira] [Closed] (HUDI-4465) Optimizing file-listing path in MT

2022-09-08 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit closed HUDI-4465. - Resolution: Done > Optimizing file-listing path in MT > -- > >

[hudi] branch master updated: [HUDI-4465] Optimizing file-listing sequence of Metadata Table (#6016)

2022-09-08 Thread codope
This is an automated email from the ASF dual-hosted git repository. codope pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 4af60dcfba [HUDI-4465] Optimizing file-listing seq

[GitHub] [hudi] codope merged pull request #6016: [HUDI-4465] Optimizing file-listing sequence of Metadata Table

2022-09-08 Thread GitBox
codope merged PR #6016: URL: https://github.com/apache/hudi/pull/6016

[GitHub] [hudi] codope commented on a diff in pull request #6016: [HUDI-4465] Optimizing file-listing sequence of Metadata Table

2022-09-08 Thread GitBox
codope commented on code in PR #6016: URL: https://github.com/apache/hudi/pull/6016#discussion_r966582484 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/SimpleKeyGenerator.java: ## @@ -46,6 +47,12 @@ public SimpleKeyGenerator(TypedProperties props) {

[GitHub] [hudi] LinMingQiang commented on issue #6618: Caused by: org.apache.http.NoHttpResponseException: xxxxxx:34812 failed to respond[SUPPORT]

2022-09-08 Thread GitBox
LinMingQiang commented on issue #6618: URL: https://github.com/apache/hudi/issues/6618#issuecomment-1241441100 I have encountered this problem; this PR may solve it: https://github.com/apache/hudi/pull/6393

[GitHub] [hudi] Zhifeiyu commented on issue #6640: [SUPPORT] HUDI partition table duplicate data cow hudi 0.10.0 flink 1.13.1

2022-09-08 Thread GitBox
Zhifeiyu commented on issue #6640: URL: https://github.com/apache/hudi/issues/6640#issuecomment-1241440579 mark

[jira] [Updated] (HUDI-4818) Using CustomKeyGenerator fails w/ SparkHoodieTableFileIndex

2022-09-08 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-4818: - Labels: pull-request-available (was: ) > Using CustomKeyGenerator fails w/ SparkHoodieTableFileIn

[GitHub] [hudi] alexeykudinkin opened a new pull request, #6641: [WIP][HUDI-4818] Fixing SparkHoodieTableFileIndex handling of KeyGenerators changing the type of returned partition-value

2022-09-08 Thread GitBox
alexeykudinkin opened a new pull request, #6641: URL: https://github.com/apache/hudi/pull/6641 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any perfo

[GitHub] [hudi] hudi-bot commented on pull request #6489: [HUDI-4485] [cli] Bumped spring shell to 2.1.1. Updated the default …

2022-09-08 Thread GitBox
hudi-bot commented on PR #6489: URL: https://github.com/apache/hudi/pull/6489#issuecomment-1241435958 ## CI report: * 252b9f49e2c86c7aad7908e5887064b0d0b36932 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1125

[jira] [Created] (HUDI-4818) Using CustomKeyGenerator fails w/ SparkHoodieTableFileIndex

2022-09-08 Thread Alexey Kudinkin (Jira)
Alexey Kudinkin created HUDI-4818: - Summary: Using CustomKeyGenerator fails w/ SparkHoodieTableFileIndex Key: HUDI-4818 URL: https://issues.apache.org/jira/browse/HUDI-4818 Project: Apache Hudi

[jira] [Updated] (HUDI-4818) Using CustomKeyGenerator fails w/ SparkHoodieTableFileIndex

2022-09-08 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-4818: -- Story Points: 4 (was: 2) > Using CustomKeyGenerator fails w/ SparkHoodieTableFileIndex > --

[GitHub] [hudi] zwj0110 opened a new issue, #6640: [SUPPORT] HUDI partition table duplicate data cow

2022-09-08 Thread GitBox
zwj0110 opened a new issue, #6640: URL: https://github.com/apache/hudi/issues/6640 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? - Join the mailing list to engage in conversations and get faster support at dev-subscr..

[GitHub] [hudi] wqwl611 commented on pull request #6636: add new index RANGE_BUCKET , when primary key is auto-increment like most mysql table

2022-09-08 Thread GitBox
wqwl611 commented on PR #6636: URL: https://github.com/apache/hudi/pull/6636#issuecomment-1241432142 > Nice feature, can we log a JIRA issue and change the commit title to "[HUDI-${JIRA_ID}] ${your title}" @danny0405 Yes, thanks.

[GitHub] [hudi] wqwl611 commented on pull request #6636: add new index RANGE_BUCKET , when primary key is auto-increment like most mysql table

2022-09-08 Thread GitBox
wqwl611 commented on PR #6636: URL: https://github.com/apache/hudi/pull/6636#issuecomment-1241430662 > Nice feature, can we log a JIRA issue and change the commit title to "[HUDI-${JIRA_ID}] ${your title}" yes

[GitHub] [hudi] YuweiXiao commented on pull request #6632: [HUDI-4753] more accurate record size estimation for log writing and spillable map

2022-09-08 Thread GitBox
YuweiXiao commented on PR #6632: URL: https://github.com/apache/hudi/pull/6632#issuecomment-1241421818 @yihua Hey Yihua, could you help review this PR?

[GitHub] [hudi] YuweiXiao commented on pull request #6629: [HUDI-4807] Use base table instant for metadata table initialization

2022-09-08 Thread GitBox
YuweiXiao commented on PR #6629: URL: https://github.com/apache/hudi/pull/6629#issuecomment-1241421204 > @YuweiXiao Can you please add more details in the PR description? It would be great if you add a test as well. Have added some details. Feel like it is not easy to add a test for i

[GitHub] [hudi] danny0405 commented on pull request #6595: [HUDI-4777] Fix flink gen bucket index of mor table not consistent wi…

2022-09-08 Thread GitBox
danny0405 commented on PR #6595: URL: https://github.com/apache/hudi/pull/6595#issuecomment-1241417330 > > > > I will give a PR to fix the Spark side too, but on the Flink side, I think deduplication should also be enabled by default for MOR tables; when duplicates are written to the log file, very h

[GitHub] [hudi] danny0405 commented on a diff in pull request #6634: [HUDI-4813] Fix infer keygen not work in sparksql side issue

2022-09-08 Thread GitBox
danny0405 commented on code in PR #6634: URL: https://github.com/apache/hudi/pull/6634#discussion_r966556852 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala: ## @@ -787,9 +787,13 @@ object DataSourceOptionsHelper { def infer

[jira] [Commented] (HUDI-4811) Fix the checkstyle of hudi flink

2022-09-08 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17602058#comment-17602058 ] Danny Chen commented on HUDI-4811: -- Fixed via master branch: 13eb892081fc4ddd5e1592ef8698

[jira] [Resolved] (HUDI-4811) Fix the checkstyle of hudi flink

2022-09-08 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen resolved HUDI-4811. -- > Fix the checkstyle of hudi flink > > > Key: HUDI-4811 >

[jira] [Updated] (HUDI-4811) Fix the checkstyle of hudi flink

2022-09-08 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-4811: - Fix Version/s: 0.12.1 > Fix the checkstyle of hudi flink > > >

[hudi] branch master updated (e1da06fa70 -> 13eb892081)

2022-09-08 Thread danny0405
danny0405 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from e1da06fa70 [MINOR] Typo fix for kryo in flink-bundle (#6639) add 13eb892081 [HUDI-4811] Fix the checkstyle of hu

[GitHub] [hudi] danny0405 merged pull request #6633: [HUDI-4811] Fix the checkstyle of hudi flink

2022-09-08 Thread GitBox
danny0405 merged PR #6633: URL: https://github.com/apache/hudi/pull/6633

[GitHub] [hudi] hudi-bot commented on pull request #6489: [HUDI-4485] [cli] Bumped spring shell to 2.1.1. Updated the default …

2022-09-08 Thread GitBox
hudi-bot commented on PR #6489: URL: https://github.com/apache/hudi/pull/6489#issuecomment-1241399541 ## CI report: * 3ae4fb8b374e12b1097a86d56e5996b7dc0ac79f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1121

[GitHub] [hudi] danny0405 commented on pull request #6636: add new index RANGE_BUCKET , when primary key is auto-increment like most mysql table

2022-09-08 Thread GitBox
danny0405 commented on PR #6636: URL: https://github.com/apache/hudi/pull/6636#issuecomment-1241399179 Nice feature, can we log a JIRA issue and change the commit title to "[HUDI-${JIRA_ID}] ${your title}"
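The title convention requested here can be checked mechanically. The following is a minimal sketch, assuming that a title starts with one or more bracketed tags of the kinds seen elsewhere in this listing (`[HUDI-<number>]`, `[MINOR]`, `[WIP]`, `[RFC-<number>]`); the regular expression and the function name `has_tagged_title` are illustrative and are not the project's actual validation rule.

```python
# Illustrative check for the "[HUDI-${JIRA_ID}] ${your title}" convention.
# The accepted tag set is an assumption based on titles in this listing.
import re

TITLE_RE = re.compile(r"^(\[(HUDI-\d+|MINOR|WIP|RFC-\d+)\])+\s+\S.*$")


def has_tagged_title(title: str) -> bool:
    """Return True if the title begins with at least one recognised tag."""
    return TITLE_RE.match(title) is not None


if __name__ == "__main__":
    for title in (
        "[HUDI-4811] Fix the checkstyle of hudi flink",
        "[MINOR] Typo fix for kryo in flink-bundle",
        "add new index RANGE_BUCKET , when primary key is auto-increment",
    ):
        print(has_tagged_title(title), title)
```

A stricter variant could require the `[HUDI-<number>]` tag specifically, which is what the comment asks of this pull request.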

[hudi] branch master updated: [MINOR] Typo fix for kryo in flink-bundle (#6639)

2022-09-08 Thread danny0405
danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new e1da06fa70 [MINOR] Typo fix for kryo in flink-b

[GitHub] [hudi] danny0405 merged pull request #6639: [MINOR] Typo fix for kryo in flink-bundle

2022-09-08 Thread GitBox
danny0405 merged PR #6639: URL: https://github.com/apache/hudi/pull/6639

[GitHub] [hudi] hudi-bot commented on pull request #6489: [HUDI-4485] [cli] Bumped spring shell to 2.1.1. Updated the default …

2022-09-08 Thread GitBox
hudi-bot commented on PR #6489: URL: https://github.com/apache/hudi/pull/6489#issuecomment-1241397077 ## CI report: * 3ae4fb8b374e12b1097a86d56e5996b7dc0ac79f Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1121

[GitHub] [hudi] dongkelun commented on a diff in pull request #5478: [HUDI-3998] Fix getCommitsSinceLastCleaning failed when async cleaning

2022-09-08 Thread GitBox
dongkelun commented on code in PR #5478: URL: https://github.com/apache/hudi/pull/5478#discussion_r966535607 ## hudi-timeline-service/src/main/java/org/apache/hudi/timeline/service/RequestHandler.java: ## @@ -539,4 +543,19 @@ public void handle(@NotNull Context context) throws

[GitHub] [hudi] hudi-bot commented on pull request #6639: [MINOR] Typo fix for kryo in flink-bundle

2022-09-08 Thread GitBox
hudi-bot commented on PR #6639: URL: https://github.com/apache/hudi/pull/6639#issuecomment-1241323712 ## CI report: * e83471bf24d848fbb3c8ec16decf1bcbe0d5449a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1125

[jira] [Updated] (HUDI-4817) Markers are not deleted after bootstrap operation

2022-09-08 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-4817: Sprint: 2022/09/05 > Markers are not deleted after bootstrap operation > ---

[jira] [Updated] (HUDI-4817) Markers are not deleted after bootstrap operation

2022-09-08 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-4817: Fix Version/s: 0.12.1 > Markers are not deleted after bootstrap operation >

[jira] [Updated] (HUDI-4817) Markers are not deleted after bootstrap operation

2022-09-08 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-4817: Priority: Critical (was: Major) > Markers are not deleted after bootstrap operation > -

[jira] [Updated] (HUDI-4817) Markers are not deleted after bootstrap operation

2022-09-08 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-4817: Affects Version/s: 0.12.0 > Markers are not deleted after bootstrap operation >

[jira] [Assigned] (HUDI-4817) Markers are not deleted after bootstrap operation

2022-09-08 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-4817: --- Assignee: Ethan Guo > Markers are not deleted after bootstrap operation > ---

[jira] [Created] (HUDI-4817) Markers are not deleted after bootstrap operation

2022-09-08 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-4817: --- Summary: Markers are not deleted after bootstrap operation Key: HUDI-4817 URL: https://issues.apache.org/jira/browse/HUDI-4817 Project: Apache Hudi Issue Type: Bug

[jira] [Updated] (HUDI-4817) Markers are not deleted after bootstrap operation

2022-09-08 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-4817: Labels: bootstrap (was: ) > Markers are not deleted after bootstrap operation > ---

[jira] [Updated] (HUDI-4817) Markers are not deleted after bootstrap operation

2022-09-08 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-4817: Component/s: bootstrap > Markers are not deleted after bootstrap operation > ---

[GitHub] [hudi] hudi-bot commented on pull request #6637: Fix AWSDmsAvroPayload#getInsertValue,combineAndGetUpdateValue to invo…

2022-09-08 Thread GitBox
hudi-bot commented on PR #6637: URL: https://github.com/apache/hudi/pull/6637#issuecomment-1241264591 ## CI report: * 5b7d712d175b64de73ce924bbdb95962ebb790fe Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1125

[jira] [Updated] (HUDI-4816) Update asf-site docs for GlobalDeleteKeyGenerator

2022-09-08 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-4816: Description: The GlobalDeleteKeyGenerator should be used with a global index to delete records based on the

[jira] [Updated] (HUDI-4816) Update asf-site docs for GlobalDeleteKeyGenerator

2022-09-08 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-4816: Component/s: docs > Update asf-site docs for GlobalDeleteKeyGenerator >

[jira] [Assigned] (HUDI-4816) Update asf-site docs for GlobalDeleteKeyGenerator

2022-09-08 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-4816: --- Assignee: Bhavani Sudha > Update asf-site docs for GlobalDeleteKeyGenerator > ---

[jira] [Created] (HUDI-4816) Update asf-site docs for GlobalDeleteKeyGenerator

2022-09-08 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-4816: --- Summary: Update asf-site docs for GlobalDeleteKeyGenerator Key: HUDI-4816 URL: https://issues.apache.org/jira/browse/HUDI-4816 Project: Apache Hudi Issue Type: Improve

[jira] [Updated] (HUDI-4816) Update asf-site docs for GlobalDeleteKeyGenerator

2022-09-08 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-4816: Fix Version/s: 0.12.1 > Update asf-site docs for GlobalDeleteKeyGenerator >

[GitHub] [hudi] umehrot2 commented on pull request #6637: Fix AWSDmsAvroPayload#getInsertValue,combineAndGetUpdateValue to invo…

2022-09-08 Thread GitBox
umehrot2 commented on PR #6637: URL: https://github.com/apache/hudi/pull/6637#issuecomment-1241218981 Fix LGTM. However, we should not be adding this whole end-to-end test in `TestCOWDataSource` and `TestMORDataSource`. These tests are there to test overall datasource related functionali

[GitHub] [hudi] hudi-bot commented on pull request #5478: [HUDI-3998] Fix getCommitsSinceLastCleaning failed when async cleaning

2022-09-08 Thread GitBox
hudi-bot commented on PR #5478: URL: https://github.com/apache/hudi/pull/5478#issuecomment-1241218459 ## CI report: * 57cd61a9a02c62becbf0b763d322d0f70e68b588 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1125

[GitHub] [hudi] hudi-bot commented on pull request #6639: [MINOR] Typo fix for kryo in flink-bundle

2022-09-08 Thread GitBox
hudi-bot commented on PR #6639: URL: https://github.com/apache/hudi/pull/6639#issuecomment-1241210693 ## CI report: * e83471bf24d848fbb3c8ec16decf1bcbe0d5449a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1125

[GitHub] [hudi] hudi-bot commented on pull request #6636: add new index RANGE_BUCKET , when primary key is auto-increment like most mysql table

2022-09-08 Thread GitBox
hudi-bot commented on PR #6636: URL: https://github.com/apache/hudi/pull/6636#issuecomment-1241210655 ## CI report: * b837b813fb706508b1fccc0924f839275e9373c3 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1125

[GitHub] [hudi] hudi-bot commented on pull request #6639: [MINOR] Typo fix for kryo in flink-bundle

2022-09-08 Thread GitBox
hudi-bot commented on PR #6639: URL: https://github.com/apache/hudi/pull/6639#issuecomment-1241205871 ## CI report: * e83471bf24d848fbb3c8ec16decf1bcbe0d5449a UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] xccui opened a new pull request, #6639: [MINOR] Typo fix for kryo in flink-bundle

2022-09-08 Thread GitBox
xccui opened a new pull request, #6639: URL: https://github.com/apache/hudi/pull/6639 ### Change Logs There's a typo for the `Kryo-shaded` item in flink-bundle's pom file which causes the lib not to be packaged into the bundle file. Not sure if we should include it or just remove it.

[GitHub] [hudi] bhasudha opened a new pull request, #6638: [DO NOT MERGE] [DOCS] Add tags to blog pages

2022-09-08 Thread GitBox
bhasudha opened a new pull request, #6638: URL: https://github.com/apache/hudi/pull/6638 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any performance

[GitHub] [hudi] xushiyan commented on a diff in pull request #5320: [HUDI-3861] update tblp 'path' when rename table

2022-09-08 Thread GitBox
xushiyan commented on code in PR #5320: URL: https://github.com/apache/hudi/pull/5320#discussion_r966357792 ## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestAlterTable.scala: ## @@ -194,9 +194,14 @@ class TestAlterTable extends HoodieSparkSqlTestB

[jira] [Updated] (HUDI-2733) Adding Thrift support in HiveSyncTool

2022-09-08 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2733: - Sprint: Cont' improve - 2021/01/24, Cont' improve - 2021/01/31, Cont' improve - 2022/02/07, Cont' impro

[jira] [Updated] (HUDI-2733) Adding Thrift support in HiveSyncTool

2022-09-08 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2733: - Sprint: Cont' improve - 2021/01/24, Cont' improve - 2021/01/31, Cont' improve - 2022/02/07, Cont' impro

[jira] [Updated] (HUDI-4585) Optimize query performance on Presto Hudi connector

2022-09-08 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-4585: - Reviewers: Sagar Sumit (was: Raymond Xu, Sagar Sumit) > Optimize query performance on Presto Hudi connec

[GitHub] [hudi] xushiyan commented on a diff in pull request #6550: [HUDI-4691] Cleaning up duplicated classes in Spark 3.3 module

2022-09-08 Thread GitBox
xushiyan commented on code in PR #6550: URL: https://github.com/apache/hudi/pull/6550#discussion_r966341103 ## pom.xml: ## @@ -377,9 +377,17 @@ org.sl4fj:slf4j-jcl log4j:log4j ch.qos.logback:logback-classic +

[GitHub] [hudi] xushiyan commented on a diff in pull request #6550: [HUDI-4691] Cleaning up duplicated classes in Spark 3.3 module

2022-09-08 Thread GitBox
xushiyan commented on code in PR #6550: URL: https://github.com/apache/hudi/pull/6550#discussion_r966335560 ## hudi-utilities/pom.xml: ## @@ -139,14 +125,22 @@ + + + + org.apache.hudi - hudi-spark-common_${scala.binary.version} +

[GitHub] [hudi] hudi-bot commented on pull request #6196: [HUDI-4071] Enable schema reconciliation by default

2022-09-08 Thread GitBox
hudi-bot commented on PR #6196: URL: https://github.com/apache/hudi/pull/6196#issuecomment-1241126360 ## CI report: * 1dfb9ffa267bce2c73bdc10e285a3ab2d3e15939 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1125

[GitHub] [hudi] xushiyan commented on a diff in pull request #6550: [HUDI-4691] Cleaning up duplicated classes in Spark 3.3 module

2022-09-08 Thread GitBox
xushiyan commented on code in PR #6550: URL: https://github.com/apache/hudi/pull/6550#discussion_r966311837 ## hudi-spark-datasource/hudi-spark3.2plus-common/pom.xml: ## @@ -0,0 +1,234 @@ + + +http://maven.apache.org/POM/4.0.0"; + xmlns:xsi="http://www.w3.org/2001/XMLSch

[GitHub] [hudi] xushiyan commented on pull request #6535: [HUDI-4193] change protoc version so it compiles on m1 mac

2022-09-08 Thread GitBox
xushiyan commented on PR #6535: URL: https://github.com/apache/hudi/pull/6535#issuecomment-1241092456 > I don't think anything needs to be added to the README. It has the activation tag and checks if the os is mac and the processor type is aarch64 and if those are true then it activates the

[GitHub] [hudi] hudi-bot commented on pull request #6637: Fix AWSDmsAvroPayload#getInsertValue,combineAndGetUpdateValue to invo…

2022-09-08 Thread GitBox
hudi-bot commented on PR #6637: URL: https://github.com/apache/hudi/pull/6637#issuecomment-1241064229 ## CI report: * 5b7d712d175b64de73ce924bbdb95962ebb790fe Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1125

[GitHub] [hudi] yihua closed issue #6623: [SUPPORT] java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener with HBase Index

2022-09-08 Thread GitBox
yihua closed issue #6623: [SUPPORT] java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener with HBase Index URL: https://github.com/apache/hudi/issues/6623

[GitHub] [hudi] yihua commented on issue #6623: [SUPPORT] java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.client.ClusterStatusListener$MulticastListener with HBase Index

2022-09-08 Thread GitBox
yihua commented on issue #6623: URL: https://github.com/apache/hudi/issues/6623#issuecomment-1241060251 @praveenkmr Great to hear that. For the upgrade to OSS Hudi 0.12.0 (the latest release), using hudi-spark-bundle should be sufficient as OSS Hudi 0.12.0 bundle jars work out-of-the

[GitHub] [hudi] hudi-bot commented on pull request #6637: Fix AWSDmsAvroPayload#getInsertValue,combineAndGetUpdateValue to invo…

2022-09-08 Thread GitBox
hudi-bot commented on PR #6637: URL: https://github.com/apache/hudi/pull/6637#issuecomment-1241058661 ## CI report: * 5b7d712d175b64de73ce924bbdb95962ebb790fe UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] fengjian428 commented on a diff in pull request #4676: [HUDI-3304] support partial update on mor table

2022-09-08 Thread GitBox
fengjian428 commented on code in PR #4676: URL: https://github.com/apache/hudi/pull/4676#discussion_r966271696 ## hudi-common/src/main/java/org/apache/hudi/common/model/PartialUpdateAvroPayload.java: ## @@ -0,0 +1,196 @@ +/* + * Licensed to the Apache Software Foundation (ASF) u
