Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]

2024-03-22 Thread via GitHub
hudi-bot commented on PR #10876: URL: https://github.com/apache/hudi/pull/10876#issuecomment-2016360644 ## CI report: * 2c83cfaf2bdaef6b5075989992aeeff8052461ed UNKNOWN * a84507191a942c5d8c98610958ca48f47188bc48 Azure:

Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]

2024-03-22 Thread via GitHub
hudi-bot commented on PR #10876: URL: https://github.com/apache/hudi/pull/10876#issuecomment-2016357819 ## CI report: * 2c83cfaf2bdaef6b5075989992aeeff8052461ed UNKNOWN * a84507191a942c5d8c98610958ca48f47188bc48 Azure:

Re: [PR] [HUDI-7499] Support FirstValueAvroPayload for Hudi [hudi]

2024-03-22 Thread via GitHub
xuzifu666 commented on code in PR #10857: URL: https://github.com/apache/hudi/pull/10857#discussion_r1535173701 ## hudi-common/src/main/java/org/apache/hudi/common/model/FirstValueAvroPayload.java: ## @@ -27,6 +27,52 @@ import java.util.Properties; +/** + * Payload clazz

Re: [PR] [HUDI-7499] Support FirstValueAvroPayload for Hudi [hudi]

2024-03-22 Thread via GitHub
hudi-bot commented on PR #10857: URL: https://github.com/apache/hudi/pull/10857#issuecomment-2016356059 ## CI report: * bc681dfb2f4c025ca79041e509288d7480ba0f74 UNKNOWN * cb3cb97badef6e0138400641124a8320b52a2235 UNKNOWN * 30c7ff3052d7f434a6abb174b8e7b0101d2385c5 UNKNOWN *

Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]

2024-03-22 Thread via GitHub
bhat-vinay commented on code in PR #10876: URL: https://github.com/apache/hudi/pull/10876#discussion_r1536561023 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java: ## @@ -480,6 +480,20 @@ public class HoodieWriteConfig extends

Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]

2024-03-22 Thread via GitHub
bhat-vinay commented on code in PR #10876: URL: https://github.com/apache/hudi/pull/10876#discussion_r1536560898 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java: ## @@ -411,4 +427,90 @@ public Partitioner

Re: [PR] [HUDI-7466] Add tests to AWSGlueCatalogSyncClient [hudi]

2024-03-22 Thread via GitHub
hudi-bot commented on PR #10897: URL: https://github.com/apache/hudi/pull/10897#issuecomment-2016340988 ## CI report: * 7f8cf6295186dc445620267c4729a4c3e5cb40b7 Azure:

Re: [PR] [HUDI-7499] Support FirstValueAvroPayload for Hudi [hudi]

2024-03-22 Thread via GitHub
hudi-bot commented on PR #10857: URL: https://github.com/apache/hudi/pull/10857#issuecomment-2016329181 ## CI report: * bc681dfb2f4c025ca79041e509288d7480ba0f74 UNKNOWN * cb3cb97badef6e0138400641124a8320b52a2235 UNKNOWN * 30c7ff3052d7f434a6abb174b8e7b0101d2385c5 UNKNOWN *

Re: [PR] [HUDI-7499] Support FirstValueAvroPayload for Hudi [hudi]

2024-03-22 Thread via GitHub
hudi-bot commented on PR #10857: URL: https://github.com/apache/hudi/pull/10857#issuecomment-2016327618 ## CI report: * bc681dfb2f4c025ca79041e509288d7480ba0f74 UNKNOWN * cb3cb97badef6e0138400641124a8320b52a2235 UNKNOWN * 30c7ff3052d7f434a6abb174b8e7b0101d2385c5 UNKNOWN *

Re: [PR] [HUDI-7466] Add tests to AWSGlueCatalogSyncClient [hudi]

2024-03-22 Thread via GitHub
hudi-bot commented on PR #10897: URL: https://github.com/apache/hudi/pull/10897#issuecomment-2016309136 ## CI report: * d9ef1abc4e454bed98a56d6780f2dd04508ebf0b Azure:

Re: [PR] [HUDI-7510] Loosen the compaction scheduling and rollback check for MDT [hudi]

2024-03-22 Thread via GitHub
danny0405 commented on code in PR #10874: URL: https://github.com/apache/hudi/pull/10874#discussion_r1536506365 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ## @@ -1410,35 +1388,19 @@ protected void

[jira] [Created] (HUDI-7533) Remove the restriction for MDT compaction scheduling under log compaction scope

2024-03-22 Thread Danny Chen (Jira)
Danny Chen created HUDI-7533: Summary: Remove the restriction for MDT compaction scheduling under log compaction scope Key: HUDI-7533 URL: https://issues.apache.org/jira/browse/HUDI-7533 Project: Apache

Re: [PR] [HUDI-7466] Add tests to AWSGlueCatalogSyncClient [hudi]

2024-03-22 Thread via GitHub
hudi-bot commented on PR #10897: URL: https://github.com/apache/hudi/pull/10897#issuecomment-2016266052 ## CI report: * d9ef1abc4e454bed98a56d6780f2dd04508ebf0b Azure:

Re: [PR] [HUDI-7510] Loosen the compaction scheduling and rollback check for MDT [hudi]

2024-03-22 Thread via GitHub
danny0405 commented on code in PR #10874: URL: https://github.com/apache/hudi/pull/10874#discussion_r1536489142 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ## @@ -1130,22 +1128,6 @@ public void

Re: [PR] [HUDI-7466] Add tests to AWSGlueCatalogSyncClient [hudi]

2024-03-22 Thread via GitHub
hudi-bot commented on PR #10897: URL: https://github.com/apache/hudi/pull/10897#issuecomment-2016259956 ## CI report: * d9ef1abc4e454bed98a56d6780f2dd04508ebf0b Azure:

Re: [I] [SUPPORT] Flink-Hudi - Upsert into the same Hudi table via two different Flink pipelines (stream and batch) [hudi]

2024-03-22 Thread via GitHub
danny0405 commented on issue #10914: URL: https://github.com/apache/hudi/issues/10914#issuecomment-2016256740 You can use bulk_insert for historical data and regular upsert for incremental streaming ingestion. Note that when you choose the `flink_state` index instead of the `bucket` index, the

(hudi) branch master updated: [HUDI-7530] Refactoring of handleUpdateInternal in CommitActionExecutors and HoodieTables (#10908)

2024-03-22 Thread danny0405
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new a8e9db446c3 [HUDI-7530] Refactoring of

Re: [PR] [HUDI-7530] Refactoring of handleUpdateInternal in CommitActionExecutors and HoodieTables [hudi]

2024-03-22 Thread via GitHub
danny0405 merged PR #10908: URL: https://github.com/apache/hudi/pull/10908 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[jira] [Closed] (HUDI-7530) Refactoring of handleUpdateInternal in CommitActionExecutors and HoodieTables

2024-03-22 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen closed HUDI-7530. Resolution: Fixed Fixed via master branch: a8e9db446c362a364a49749fab795be31fc33afb > Refactoring of

[jira] [Updated] (HUDI-7530) Refactoring of handleUpdateInternal in CommitActionExecutors and HoodieTables

2024-03-22 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-7530: - Fix Version/s: 1.0.0 > Refactoring of handleUpdateInternal in CommitActionExecutors and HoodieTables >

(hudi) branch master updated: [MINOR] Refactored `@Before*` and `@After*` in `HoodieDeltaStreamerTestBase` (#10912)

2024-03-22 Thread danny0405
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 6be7205a1e3 [MINOR] Refactored `@Before*` and

Re: [PR] [MINOR] Refactored `@Before*` and `@After*` in `HoodieDeltaStreamerTestBase` [hudi]

2024-03-22 Thread via GitHub
danny0405 merged PR #10912: URL: https://github.com/apache/hudi/pull/10912 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

Re: [PR] [HUDI-7487] Fixed test with in-memory index by proper heap clearing [hudi]

2024-03-22 Thread via GitHub
danny0405 commented on PR #10910: URL: https://github.com/apache/hudi/pull/10910#issuecomment-2016238508 So any test case with the memory bucket index should add the clear logic. Can we improve that and avoid future mistakes, if possible? We can tackle that in another issue. -- This is an

[jira] [Updated] (HUDI-7487) Investigate flaky test in MERGE INTO

2024-03-22 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-7487: - Fix Version/s: 0.15.0 > Investigate flaky test in MERGE INTO > > >

[jira] [Closed] (HUDI-7487) Investigate flaky test in MERGE INTO

2024-03-22 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen closed HUDI-7487. Resolution: Fixed Fixed via master branch: 47151f653d85202ab5b28c0e770779f05b5a59f7 > Investigate flaky

[jira] [Updated] (HUDI-7487) Investigate flaky test in MERGE INTO

2024-03-22 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-7487: - Fix Version/s: 1.0.0 > Investigate flaky test in MERGE INTO > > >

(hudi) branch master updated: [HUDI-7487] Fixed test with in-memory index by proper heap clearing (#10910)

2024-03-22 Thread danny0405
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 47151f653d8 [HUDI-7487] Fixed test with

Re: [PR] [HUDI-7487] Fixed test with in-memory index by proper heap clearing [hudi]

2024-03-22 Thread via GitHub
danny0405 merged PR #10910: URL: https://github.com/apache/hudi/pull/10910 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[jira] [Closed] (HUDI-7529) Resolve hotspots in stream read

2024-03-22 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen closed HUDI-7529. Resolution: Fixed Fixed via master branch: 5a21a1dd260f2048909df451cbe2e9d0c9fd4ef9 > Resolve hotspots in

Re: [PR] [HUDI-7466] Add tests to AWSGlueCatalogSyncClient [hudi]

2024-03-22 Thread via GitHub
hudi-bot commented on PR #10897: URL: https://github.com/apache/hudi/pull/10897#issuecomment-2016224918 ## CI report: * f17fb57835df837e72a1ea93fef5dce4128b8ca9 Azure:

(hudi) branch master updated (135db099afc -> 5a21a1dd260)

2024-03-22 Thread danny0405
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from 135db099afc [MINOR] Remove redundant fileId from HoodieAppendHandle (#10901) add 5a21a1dd260 [HUDI-7529]

Re: [PR] [HUDI-7529] Resolve hotspots in stream read [hudi]

2024-03-22 Thread via GitHub
danny0405 merged PR #10911: URL: https://github.com/apache/hudi/pull/10911 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

Re: [I] [SUPPORT] The parquet files for the MOR table have been generated, but the RO table in Hive still cannot query the latest data in the parquet files. [hudi]

2024-03-22 Thread via GitHub
danny0405 commented on issue #10907: URL: https://github.com/apache/hudi/issues/10907#issuecomment-2016219162 Is the compaction triggered normally? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] [HUDI-7466] Add tests to AWSGlueCatalogSyncClient [hudi]

2024-03-22 Thread via GitHub
hudi-bot commented on PR #10897: URL: https://github.com/apache/hudi/pull/10897#issuecomment-2016219284 ## CI report: * f17fb57835df837e72a1ea93fef5dce4128b8ca9 Azure:

Re: [PR] [HUDI-7532] Include only compaction instants for lastCompaction in getDeltaCommitsSinceLatestCompaction [hudi]

2024-03-22 Thread via GitHub
hudi-bot commented on PR #10915: URL: https://github.com/apache/hudi/pull/10915#issuecomment-2016212906 ## CI report: * acfe81fa3814c77cd04a39eb2bbddb9960bdc437 Azure:

Re: [PR] [HUDI-7532] Include only compaction instants for lastCompaction in getDeltaCommitsSinceLatestCompaction [hudi]

2024-03-22 Thread via GitHub
hudi-bot commented on PR #10915: URL: https://github.com/apache/hudi/pull/10915#issuecomment-2016112557 ## CI report: * acfe81fa3814c77cd04a39eb2bbddb9960bdc437 Azure:

Re: [PR] [HUDI-7532] Include only compaction instants for lastCompaction in getDeltaCommitsSinceLatestCompaction [hudi]

2024-03-22 Thread via GitHub
hudi-bot commented on PR #10915: URL: https://github.com/apache/hudi/pull/10915#issuecomment-2016092933 ## CI report: * acfe81fa3814c77cd04a39eb2bbddb9960bdc437 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run

Re: [PR] [HUDI-7518] Fix HoodieMetadataPayload merging logic around repeated deletes [hudi]

2024-03-22 Thread via GitHub
hudi-bot commented on PR #10913: URL: https://github.com/apache/hudi/pull/10913#issuecomment-2016030909 ## CI report: * 101b295c6c344d8fe13f56752e20b531da26ea49 UNKNOWN * e162c911b8baa8b184a377244be3a555b4578862 Azure:

[PR] [HUDI-7532] Fixing schedule compaction bug [hudi]

2024-03-22 Thread via GitHub
nsivabalan opened a new pull request, #10915: URL: https://github.com/apache/hudi/pull/10915 ### Change Logs Fixing schedule compaction bug. ### Impact _Describe any public API or user-facing feature change or any performance impact._ ### Risk level (write none,

[jira] [Updated] (HUDI-7532) Fix schedule compact to only consider DCs after last compaction commit

2024-03-22 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7532: - Labels: pull-request-available (was: ) > Fix schedule compact to only consider DCs after last

[jira] [Created] (HUDI-7532) Fix schedule compact to only consider DCs after last compaction commit

2024-03-22 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-7532: - Summary: Fix schedule compact to only consider DCs after last compaction commit Key: HUDI-7532 URL: https://issues.apache.org/jira/browse/HUDI-7532

[I] [SUPPORT] Flink-Hudi - Upsert into the same Hudi table via two different Flink pipelines (stream and batch) [hudi]

2024-03-22 Thread via GitHub
ChiehFu opened a new issue, #10914: URL: https://github.com/apache/hudi/issues/10914 **Describe the problem you faced** Hi, My team wants to build Flink pipelines to generate financial report and save the report results into a Hudi COW table. The data sources for the

Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]

2024-03-22 Thread via GitHub
vinothchandar commented on code in PR #10876: URL: https://github.com/apache/hudi/pull/10876#discussion_r1536245569 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java: ## @@ -411,4 +427,90 @@ public Partitioner

Re: [PR] [HUDI-7518] Fix HoodieMetadataPayload merging logic around repeated deletes [hudi]

2024-03-22 Thread via GitHub
hudi-bot commented on PR #10913: URL: https://github.com/apache/hudi/pull/10913#issuecomment-2015960700 ## CI report: * 101b295c6c344d8fe13f56752e20b531da26ea49 UNKNOWN * e162c911b8baa8b184a377244be3a555b4578862 Azure:

Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]

2024-03-22 Thread via GitHub
vinothchandar commented on code in PR #10876: URL: https://github.com/apache/hudi/pull/10876#discussion_r1536245244 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java: ## @@ -230,6 +236,10 @@ protected

Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]

2024-03-22 Thread via GitHub
vinothchandar commented on code in PR #10876: URL: https://github.com/apache/hudi/pull/10876#discussion_r1536245082 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java: ## @@ -480,6 +480,20 @@ public class HoodieWriteConfig extends

Re: [PR] [HUDI-7518] Fix HoodieMetadataPayload merging logic around repeated deletes [hudi]

2024-03-22 Thread via GitHub
hudi-bot commented on PR #10913: URL: https://github.com/apache/hudi/pull/10913#issuecomment-2015952928 ## CI report: * 101b295c6c344d8fe13f56752e20b531da26ea49 UNKNOWN * e162c911b8baa8b184a377244be3a555b4578862 UNKNOWN Bot commands @hudi-bot supports the

Re: [PR] [HUDI-7518] Fix HoodieMetadataPayload merging logic around repeated deletes [hudi]

2024-03-22 Thread via GitHub
hudi-bot commented on PR #10913: URL: https://github.com/apache/hudi/pull/10913#issuecomment-2015942381 ## CI report: * 101b295c6c344d8fe13f56752e20b531da26ea49 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run

[jira] [Updated] (HUDI-7518) Fix HoodieMetadataPayload merging logic around repeated deletes

2024-03-22 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7518: - Labels: pull-request-available (was: ) > Fix HoodieMetadataPayload merging logic around repeated

[PR] [HUDI-7518] Fix HoodieMetadataPayload merging logic around repeated deletes [hudi]

2024-03-22 Thread via GitHub
yihua opened a new pull request, #10913: URL: https://github.com/apache/hudi/pull/10913 ### Change Logs When there are repeated duplicate deletes to the partition file list in `files` partition of the MDT, the current HoodieMetadataPayload merging logic drops such "deletion",

Re: [I] [SUPPORT] Requesting Support for insert_overwrite in Delta Streamer [hudi]

2024-03-22 Thread via GitHub
soumilshah1995 commented on issue #10896: URL: https://github.com/apache/hudi/issues/10896#issuecomment-2015740027 It's here: https://github.com/soumilshah1995/DeltaHudiTransformations -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[jira] [Updated] (HUDI-7531) Consider pending clustering when scheduling a new clustering plan

2024-03-22 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7531: Description: See [https://github.com/apache/hudi/pull/9755#discussion_r1535961867]   > Consider pending

[jira] [Updated] (HUDI-7531) Consider pending clustering when scheduling a new clustering plan

2024-03-22 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7531: Fix Version/s: 0.15.0 1.0.0 > Consider pending clustering when scheduling a new

[jira] [Created] (HUDI-7531) Consider pending clustering when scheduling a new clustering plan

2024-03-22 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-7531: --- Summary: Consider pending clustering when scheduling a new clustering plan Key: HUDI-7531 URL: https://issues.apache.org/jira/browse/HUDI-7531 Project: Apache Hudi

[jira] [Updated] (HUDI-7531) Consider pending clustering when scheduling a new clustering plan

2024-03-22 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7531: Priority: Blocker (was: Major) > Consider pending clustering when scheduling a new clustering plan >

[jira] [Assigned] (HUDI-7531) Consider pending clustering when scheduling a new clustering plan

2024-03-22 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-7531: --- Assignee: Ethan Guo > Consider pending clustering when scheduling a new clustering plan >

Re: [PR] [HUDI-6882] Differentiate between replacecommits in cluster planning [hudi]

2024-03-22 Thread via GitHub
yihua commented on code in PR #9755: URL: https://github.com/apache/hudi/pull/9755#discussion_r1535961867 ## hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java: ## @@ -490,6 +493,19 @@ public Option getFirstNonSavepointCommit() { }

Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]

2024-03-22 Thread via GitHub
hudi-bot commented on PR #10876: URL: https://github.com/apache/hudi/pull/10876#issuecomment-2015437643 ## CI report: * 2c83cfaf2bdaef6b5075989992aeeff8052461ed UNKNOWN * a84507191a942c5d8c98610958ca48f47188bc48 Azure:

Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]

2024-03-22 Thread via GitHub
hudi-bot commented on PR #10876: URL: https://github.com/apache/hudi/pull/10876#issuecomment-2015350367 ## CI report: * b802619f011c1d9ef5b334ecf67ab7df74964e08 Azure:

Re: [PR] [HUDI-7504] replace expensive existence check with spark options [hudi]

2024-03-22 Thread via GitHub
bhat-vinay commented on code in PR #10865: URL: https://github.com/apache/hudi/pull/10865#discussion_r1535773675 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/S3EventsHoodieIncrSource.java: ## @@ -112,10 +110,15 @@ public S3EventsHoodieIncrSource(

Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]

2024-03-22 Thread via GitHub
hudi-bot commented on PR #10876: URL: https://github.com/apache/hudi/pull/10876#issuecomment-2015337228 ## CI report: * b802619f011c1d9ef5b334ecf67ab7df74964e08 Azure:

Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]

2024-03-22 Thread via GitHub
bhat-vinay commented on PR #10876: URL: https://github.com/apache/hudi/pull/10876#issuecomment-2015327684 > IIUC this adds additional shuffle and a new job? I'd like to understand how we think this impacts the current insert DAG. Yet to review the new partitioner, will do once I hear back

Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]

2024-03-22 Thread via GitHub
bhat-vinay commented on code in PR #10876: URL: https://github.com/apache/hudi/pull/10876#discussion_r1535756962 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java: ## @@ -411,4 +427,90 @@ public Partitioner

Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]

2024-03-22 Thread via GitHub
bhat-vinay commented on code in PR #10876: URL: https://github.com/apache/hudi/pull/10876#discussion_r1535754607 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java: ## @@ -411,4 +427,90 @@ public Partitioner

Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]

2024-03-22 Thread via GitHub
bhat-vinay commented on code in PR #10876: URL: https://github.com/apache/hudi/pull/10876#discussion_r1535751978 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java: ## @@ -411,4 +427,90 @@ public Partitioner

Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]

2024-03-22 Thread via GitHub
bhat-vinay commented on code in PR #10876: URL: https://github.com/apache/hudi/pull/10876#discussion_r1535749601 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java: ## @@ -411,4 +427,90 @@ public Partitioner

Re: [PR] [HUDI-7517] Add ability to reset the checkpoint for kafka source [hudi]

2024-03-22 Thread via GitHub
nsivabalan commented on PR #10890: URL: https://github.com/apache/hudi/pull/10890#issuecomment-2015304585 I get it now, thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]

2024-03-22 Thread via GitHub
bhat-vinay commented on code in PR #10876: URL: https://github.com/apache/hudi/pull/10876#discussion_r1535706564 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java: ## @@ -394,6 +404,12 @@ public Partitioner

Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]

2024-03-22 Thread via GitHub
bhat-vinay commented on code in PR #10876: URL: https://github.com/apache/hudi/pull/10876#discussion_r1535704794 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java: ## @@ -230,6 +236,10 @@ protected Partitioner

Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]

2024-03-22 Thread via GitHub
bhat-vinay commented on code in PR #10876: URL: https://github.com/apache/hudi/pull/10876#discussion_r1535694790 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java: ## @@ -480,6 +480,20 @@ public class HoodieWriteConfig extends

Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]

2024-03-22 Thread via GitHub
bhat-vinay commented on code in PR #10876: URL: https://github.com/apache/hudi/pull/10876#discussion_r1535695254 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java: ## @@ -480,6 +480,20 @@ public class HoodieWriteConfig extends

Re: [PR] [HUDI-7530] Refactoring of handleUpdateInternal in CommitActionExecutors and HoodieTables [hudi]

2024-03-22 Thread via GitHub
hudi-bot commented on PR #10908: URL: https://github.com/apache/hudi/pull/10908#issuecomment-2015205362 ## CI report: * 235e61f59980bc67f9069ef6ef42987dcb3e8da8 Azure:

Re: [PR] [HUDI-7504] replace expensive existence check with spark options [hudi]

2024-03-22 Thread via GitHub
yihua commented on code in PR #10865: URL: https://github.com/apache/hudi/pull/10865#discussion_r1535633958 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/S3EventsHoodieIncrSource.java: ## @@ -112,10 +110,15 @@ public S3EventsHoodieIncrSource(

Re: [I] [SUPPORT] Duplicate data in base file of MOR table [hudi]

2024-03-22 Thread via GitHub
ad1happy2go commented on issue #10882: URL: https://github.com/apache/hudi/issues/10882#issuecomment-2015093418 @wqwl611 I tried the same configuration but was unable to reproduce this. As you mentioned, you are also seeing this in very few cases, so we need to verify this in your environment. In

Re: [I] [SUPPORT] Requesting Support for insert_overwrite in Delta Streamer [hudi]

2024-03-22 Thread via GitHub
ad1happy2go commented on issue #10896: URL: https://github.com/apache/hudi/issues/10896#issuecomment-2015088337 Thanks @soumilshah1995 for the details. Can you share the full Hudi Streamer command/code that you were using? -- This is an automated message from the Apache Git Service. To

Re: [PR] [MINOR] Refactored `@Before*` and `@After*` in `HoodieDeltaStreamerTestBase` [hudi]

2024-03-22 Thread via GitHub
hudi-bot commented on PR #10912: URL: https://github.com/apache/hudi/pull/10912#issuecomment-2014993176 ## CI report: * acd11d1d5147ab43e2a17724ea8441d6529ab4c8 Azure:

Re: [PR] [HUDI-7530] Refactoring of handleUpdateInternal in CommitActionExecutors and HoodieTables [hudi]

2024-03-22 Thread via GitHub
hudi-bot commented on PR #10908: URL: https://github.com/apache/hudi/pull/10908#issuecomment-2014971979 ## CI report: * 45cc0dc16d5fa21c5a85e08435abc719a7a66947 Azure:

Re: [PR] [HUDI-7530] Refactoring of handleUpdateInternal in CommitActionExecutors and HoodieTables [hudi]

2024-03-22 Thread via GitHub
hudi-bot commented on PR #10908: URL: https://github.com/apache/hudi/pull/10908#issuecomment-2014960658 ## CI report: * 45cc0dc16d5fa21c5a85e08435abc719a7a66947 Azure:

Re: [PR] [HUDI-7512] sort input records for insert operation [hudi]

2024-03-22 Thread via GitHub
bhat-vinay commented on code in PR #10876: URL: https://github.com/apache/hudi/pull/10876#discussion_r1535473655 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java: ## @@ -480,6 +480,20 @@ public class HoodieWriteConfig extends

(hudi) branch asf-site updated: [DOCS] Add more users in the powered-by page (#10854)

2024-03-22 Thread bhavanisudha
This is an automated email from the ASF dual-hosted git repository. bhavanisudha pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 13a76bcf4b0 [DOCS] Add more users in

Re: [PR] [DOCS] Add more users in the powered-by page [hudi]

2024-03-22 Thread via GitHub
bhasudha merged PR #10854: URL: https://github.com/apache/hudi/pull/10854 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

Re: [PR] [HUDI-7487] Fixed test with in-memory index by proper heap clearing [hudi]

2024-03-22 Thread via GitHub
geserdugarov commented on PR #10910: URL: https://github.com/apache/hudi/pull/10910#issuecomment-2014912363 @yihua, hi. Could you please check the test fix you previously disabled? -- This is an automated message from the Apache Git Service. To respond to the message, please log on

Re: [I] must specify a primary key when creating a hudi table [hudi]

2024-03-22 Thread via GitHub
codope closed issue #10774: must specify a primary key when creating a hudi table URL: https://github.com/apache/hudi/issues/10774 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [I] must specify a primary key when creating a hudi table [hudi]

2024-03-22 Thread via GitHub
ad1happy2go commented on issue #10774: URL: https://github.com/apache/hudi/issues/10774#issuecomment-2014889203 @qinlz-1 Closing this issue. Please reopen in case of any other issues. Thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please

Re: [PR] [MINOR] Refactored `@Before*` and `@After*` in `HoodieDeltaStreamerTestBase` [hudi]

2024-03-22 Thread via GitHub
hudi-bot commented on PR #10912: URL: https://github.com/apache/hudi/pull/10912#issuecomment-2014884445 ## CI report: * acd11d1d5147ab43e2a17724ea8441d6529ab4c8 Azure:

Re: [PR] [HUDI-7487] Fixed test with in-memory index by proper heap clearing [hudi]

2024-03-22 Thread via GitHub
hudi-bot commented on PR #10910: URL: https://github.com/apache/hudi/pull/10910#issuecomment-2014884381 ## CI report: * 74fed271c28a2e4bbb804b7643e0d9d91448f5ac Azure:

Re: [PR] [MINOR] Refactored `@Before*` and `@After*` in `HoodieDeltaStreamerTestBase` [hudi]

2024-03-22 Thread via GitHub
hudi-bot commented on PR #10912: URL: https://github.com/apache/hudi/pull/10912#issuecomment-2014874257 ## CI report: * acd11d1d5147ab43e2a17724ea8441d6529ab4c8 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run

Re: [PR] [HUDI-7530] Refactoring of handleUpdateInternal in CommitActionExecutors and HoodieTables [hudi]

2024-03-22 Thread via GitHub
hudi-bot commented on PR #10908: URL: https://github.com/apache/hudi/pull/10908#issuecomment-2014863256 ## CI report: * 45cc0dc16d5fa21c5a85e08435abc719a7a66947 Azure:

[PR] [MINOR] Refactored `@Before*` and `@After*` in `HoodieDeltaStreamerTestBase` [hudi]

2024-03-22 Thread via GitHub
geserdugarov opened a new pull request, #10912: URL: https://github.com/apache/hudi/pull/10912 ### Change Logs We have two `@BeforeEach` in `HoodieDeltaStreamerTestBase` without any guarantees of execution ordering. The reason is the section with all `@Before*` and `@After*` is

Re: [I] Nested object support in Hudi Table using Flink [hudi]

2024-03-22 Thread via GitHub
waytoharish commented on issue #10895: URL: https://github.com/apache/hudi/issues/10895#issuecomment-2014848588 Thanks for your time @ad1happy2go @danny0405 Here is the error I am getting after the code change: 14:38:06,411 INFO

Re: [I] Nested object support in Hudi Table using Flink [hudi]

2024-03-22 Thread via GitHub
ad1happy2go commented on issue #10895: URL: https://github.com/apache/hudi/issues/10895#issuecomment-2014803349 @waytoharish As discussed on the call, can you provide the latest code and the exception we were getting? -- This is an automated message from the Apache Git Service. To respond to

Re: [I] [BUG] Failure Encountered When Reading Hudi with Flink in Batch Runtime Mode and FlinkOptions.READ_AS_STREAMING=false [hudi]

2024-03-22 Thread via GitHub
ad1happy2go commented on issue #10576: URL: https://github.com/apache/hudi/issues/10576#issuecomment-2014799423 @ailinzhou Can you provide the script/commands you are using? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [PR] [HUDI-7530] Refactoring of handleUpdateInternal in CommitActionExecutors and HoodieTables [hudi]

2024-03-22 Thread via GitHub
hudi-bot commented on PR #10908: URL: https://github.com/apache/hudi/pull/10908#issuecomment-2014792016 ## CI report: * 45cc0dc16d5fa21c5a85e08435abc719a7a66947 Azure:

Re: [PR] [HUDI-7530] Refactoring of handleUpdateInternal in CommitActionExecutors and HoodieTables [hudi]

2024-03-22 Thread via GitHub
wombatu-kun commented on PR #10908: URL: https://github.com/apache/hudi/pull/10908#issuecomment-2014781292 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [HUDI-7529] Resolve hotspots in stream read [hudi]

2024-03-22 Thread via GitHub
hudi-bot commented on PR #10911: URL: https://github.com/apache/hudi/pull/10911#issuecomment-2014780735 ## CI report: * de737208bc927a012e4510d5d9667b8fcf594210 Azure:

Re: [PR] [HUDI-7487] Fixed test with in-memory index by proper heap clearing [hudi]

2024-03-22 Thread via GitHub
hudi-bot commented on PR #10910: URL: https://github.com/apache/hudi/pull/10910#issuecomment-2014766484 ## CI report: * 0b8d7409dca663b4d4ff7b8ff50fa0c5aa583aaa Azure:

Re: [PR] [HUDI-7528] Fixing RowCustomColumnsSortPartitioner to use repartition instead of coalesce [hudi]

2024-03-22 Thread via GitHub
hudi-bot commented on PR #10909: URL: https://github.com/apache/hudi/pull/10909#issuecomment-2014766403 ## CI report: * b5ebcf8de8abc367918e5ab570be4bcd52b33208 Azure:

Re: [I] [BUG] Failure Encountered When Reading Hudi with Flink in Batch Runtime Mode and FlinkOptions.READ_AS_STREAMING=false [hudi]

2024-03-22 Thread via GitHub
ailinzhou commented on issue #10576: URL: https://github.com/apache/hudi/issues/10576#issuecomment-2014700220 > @ailinzhou Are you still facing this issue? Yes, unfortunately, I'm still experiencing the problem. -- This is an automated message from the Apache Git Service. To

Re: [PR] [HUDI-7528] Fixing RowCustomColumnsSortPartitioner to use repartition instead of coalesce [hudi]

2024-03-22 Thread via GitHub
hudi-bot commented on PR #10909: URL: https://github.com/apache/hudi/pull/10909#issuecomment-2014693276 ## CI report: * 5f6135593aab6329b060ea1ee30388eb22c0dc97 Azure:

Re: [PR] [HUDI-7487] Fixed test with in-memory index by proper heap clearing [hudi]

2024-03-22 Thread via GitHub
hudi-bot commented on PR #10910: URL: https://github.com/apache/hudi/pull/10910#issuecomment-2014693385 ## CI report: * 0b8d7409dca663b4d4ff7b8ff50fa0c5aa583aaa Azure:
