[hudi] branch dependabot/maven/xalan-xalan-2.7.3 created (now 6878ff14acf)

2023-05-05 Thread github-bot
This is an automated email from the ASF dual-hosted git repository. github-bot pushed a change to branch dependabot/maven/xalan-xalan-2.7.3 in repository https://gitbox.apache.org/repos/asf/hudi.git at 6878ff14acf Bump xalan from 2.7.2 to 2.7.3 No new revisions were added by this update.

[GitHub] [hudi] dependabot[bot] opened a new pull request, #8650: Bump xalan from 2.7.2 to 2.7.3

2023-05-05 Thread via GitHub
dependabot[bot] opened a new pull request, #8650: URL: https://github.com/apache/hudi/pull/8650 Bumps xalan from 2.7.2 to 2.7.3.

[jira] [Updated] (HUDI-6027) Unnecessary scala-maven-plugin causes build issue with JDK17

2023-05-05 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6027: - Labels: pull-request-available (was: ) > Unnecessary scala-maven-plugin causes build issue with J

[hudi] branch master updated (f2f0a316627 -> ad1b1474418)

2023-05-05 Thread yihua
yihua pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from f2f0a316627 [HUDI-6179] Add description to the DeltaStreamer config group (#8639) add ad1b1474418 [HUDI-6027] Remov

[GitHub] [hudi] yihua merged pull request #8336: [HUDI-6027] Remove unnecessary scala-maven-plugin

2023-05-05 Thread via GitHub
yihua merged PR #8336: URL: https://github.com/apache/hudi/pull/8336 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

[hudi] branch master updated (3dcd7573fa2 -> f2f0a316627)

2023-05-05 Thread yihua
yihua pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from 3dcd7573fa2 [HUDI-6184] Improve the test on incremental queries (#8648) add f2f0a316627 [HUDI-6179] Add description

[GitHub] [hudi] yihua merged pull request #8639: [HUDI-6179] Add description to the DeltaStreamer config group

2023-05-05 Thread via GitHub
yihua merged PR #8639: URL: https://github.com/apache/hudi/pull/8639

[GitHub] [hudi] yihua commented on pull request #8639: [HUDI-6179] Add description to the DeltaStreamer config group

2023-05-05 Thread via GitHub
yihua commented on PR #8639: URL: https://github.com/apache/hudi/pull/8639#issuecomment-1537069996 CI failures are due to flaky tests. Merging this PR.

[GitHub] [hudi] big-doudou commented on issue #8647: [SUPPORT] Too many logs in the ExternalSpillableMap

2023-05-05 Thread via GitHub
big-doudou commented on issue #8647: URL: https://github.com/apache/hudi/issues/8647#issuecomment-1537067764 > Thanks, I have fired a fix: #8649 I want to submit this pr and get Contributor but you are too fast

[GitHub] [hudi] hudi-bot commented on pull request #8649: [HUDI-6185] Too many logs in the ExternalSpillableMap

2023-05-05 Thread via GitHub
hudi-bot commented on PR #8649: URL: https://github.com/apache/hudi/pull/8649#issuecomment-1537064752 ## CI report: * 3fbafb674339bb1e296fe8827996a84cf16fff19 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=168

[jira] [Updated] (HUDI-6174) Fix flaky test testCleanerDeleteReplacedDataWithArchive

2023-05-05 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6174: Status: In Progress (was: Open) > Fix flaky test testCleanerDeleteReplacedDataWithArchive > ---

[jira] [Updated] (HUDI-6174) Fix flaky test testCleanerDeleteReplacedDataWithArchive

2023-05-05 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6174: Fix Version/s: 0.13.1 (was: 0.14.0) > Fix flaky test testCleanerDeleteReplacedDataWit

[GitHub] [hudi] yihua merged pull request #8648: [HUDI-6184] Improve the test on incremental queries

2023-05-05 Thread via GitHub
yihua merged PR #8648: URL: https://github.com/apache/hudi/pull/8648

[hudi] branch master updated: [HUDI-6184] Improve the test on incremental queries (#8648)

2023-05-05 Thread yihua
yihua pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 3dcd7573fa2 [HUDI-6184] Improve the test on increme

[GitHub] [hudi] yihua commented on pull request #8648: [HUDI-6184] Improve the test on incremental queries

2023-05-05 Thread via GitHub
yihua commented on PR #8648: URL: https://github.com/apache/hudi/pull/8648#issuecomment-1537064074 The CI failure is due to other failed tests. Merging this PR.

[GitHub] [hudi] hudi-bot commented on pull request #8648: [HUDI-6184] Improve the test on incremental queries

2023-05-05 Thread via GitHub
hudi-bot commented on PR #8648: URL: https://github.com/apache/hudi/pull/8648#issuecomment-1537063288 ## CI report: * 95f0f61b8a46f221c9276facf41ef266b43dd062 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1687

[GitHub] [hudi] hudi-bot commented on pull request #8649: [HUDI-6185] Too many logs in the ExternalSpillableMap

2023-05-05 Thread via GitHub
hudi-bot commented on PR #8649: URL: https://github.com/apache/hudi/pull/8649#issuecomment-1537063301 ## CI report: * 3fbafb674339bb1e296fe8827996a84cf16fff19 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1688

[GitHub] [hudi] zhangyue19921010 commented on pull request #8648: [HUDI-6184] Improve the test on incremental queries

2023-05-05 Thread via GitHub
zhangyue19921010 commented on PR #8648: URL: https://github.com/apache/hudi/pull/8648#issuecomment-1537062164 @hudi-bot run azure

[GitHub] [hudi] hudi-bot commented on pull request #8648: [HUDI-6184] Improve the test on incremental queries

2023-05-05 Thread via GitHub
hudi-bot commented on PR #8648: URL: https://github.com/apache/hudi/pull/8648#issuecomment-1537061984 ## CI report: * 95f0f61b8a46f221c9276facf41ef266b43dd062 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1687

[GitHub] [hudi] danny0405 commented on a diff in pull request #8505: [HUDI-6106] Spark offline compaction/Clustering Job will do clean like Flink job

2023-05-05 Thread via GitHub
danny0405 commented on code in PR #8505: URL: https://github.com/apache/hudi/pull/8505#discussion_r1186641929 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCompactor.java: ## @@ -295,4 +301,11 @@ private String getSchemaFromLatestInstant() throws Exception {

[GitHub] [hudi] danny0405 commented on a diff in pull request #8505: [HUDI-6106] Spark offline compaction/Clustering Job will do clean like Flink job

2023-05-05 Thread via GitHub
danny0405 commented on code in PR #8505: URL: https://github.com/apache/hudi/pull/8505#discussion_r1186641821 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieCompactor.java: ## @@ -112,6 +112,9 @@ public static class Config implements Serializable { spli

[GitHub] [hudi] danny0405 closed issue #8647: [SUPPORT] Too many logs in the ExternalSpillableMap

2023-05-05 Thread via GitHub
danny0405 closed issue #8647: [SUPPORT] Too many logs in the ExternalSpillableMap URL: https://github.com/apache/hudi/issues/8647

[GitHub] [hudi] danny0405 commented on a diff in pull request #8649: [HUDI-6185] Too many logs in the ExternalSpillableMap

2023-05-05 Thread via GitHub
danny0405 commented on code in PR #8649: URL: https://github.com/apache/hudi/pull/8649#discussion_r1186640829 ## hudi-common/src/main/java/org/apache/hudi/common/util/collection/ExternalSpillableMap.java: ## @@ -202,28 +202,20 @@ public R get(Object key) { @Override publ

[GitHub] [hudi] hudi-bot commented on pull request #8649: [HUDI-6185] Too many logs in the ExternalSpillableMap

2023-05-05 Thread via GitHub
hudi-bot commented on PR #8649: URL: https://github.com/apache/hudi/pull/8649#issuecomment-1537054542 ## CI report: * 3fbafb674339bb1e296fe8827996a84cf16fff19 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1688

[GitHub] [hudi] hudi-bot commented on pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-05-05 Thread via GitHub
hudi-bot commented on PR #8503: URL: https://github.com/apache/hudi/pull/8503#issuecomment-1537054457 ## CI report: * 0738d975df341763e384b9ac9bcad14b006c9c47 UNKNOWN * d8898781229ccb0359032b784b18be5d257a0ede Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-

[GitHub] [hudi] hudi-bot commented on pull request #8649: [HUDI-6185] Too many logs in the ExternalSpillableMap

2023-05-05 Thread via GitHub
hudi-bot commented on PR #8649: URL: https://github.com/apache/hudi/pull/8649#issuecomment-1537053060 ## CI report: * 3fbafb674339bb1e296fe8827996a84cf16fff19 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] hudi-bot commented on pull request #8503: [HUDI-6047] Clustering operation on consistent hashing index resulting in duplicate data

2023-05-05 Thread via GitHub
hudi-bot commented on PR #8503: URL: https://github.com/apache/hudi/pull/8503#issuecomment-1537052968 ## CI report: * 0738d975df341763e384b9ac9bcad14b006c9c47 UNKNOWN * d8898781229ccb0359032b784b18be5d257a0ede Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-

[GitHub] [hudi] hudi-bot commented on pull request #8452: [HUDI-6077] Add more partition push down filters

2023-05-05 Thread via GitHub
hudi-bot commented on PR #8452: URL: https://github.com/apache/hudi/pull/8452#issuecomment-1537050542 ## CI report: * 6526a12287cc85865da640d23a9266d887e82eba Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1686

[GitHub] [hudi] jfrylings-twilio commented on issue #8325: [SUPPORT] spark read hudi error: Unable to instantiate HFileBootstrapIndex

2023-05-05 Thread via GitHub
jfrylings-twilio commented on issue #8325: URL: https://github.com/apache/hudi/issues/8325#issuecomment-1537044024 Setting `fs.s3a.connection.maximum=200` fixed the issue for me. The issue looks like it was caused by recent input touching more partitions than usual.

[GitHub] [hudi] danny0405 commented on issue #8647: [SUPPORT] Too many logs in the ExternalSpillableMap

2023-05-05 Thread via GitHub
danny0405 commented on issue #8647: URL: https://github.com/apache/hudi/issues/8647#issuecomment-1537043808 Thanks, I have fired a fix: https://github.com/apache/hudi/pull/8649

[jira] [Updated] (HUDI-6185) Too many logs in the ExternalSpillableMap

2023-05-05 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6185: - Labels: pull-request-available (was: ) > Too many logs in the ExternalSpillableMap >

[GitHub] [hudi] danny0405 opened a new pull request, #8649: [HUDI-6185] Too many logs in the ExternalSpillableMap

2023-05-05 Thread via GitHub
danny0405 opened a new pull request, #8649: URL: https://github.com/apache/hudi/pull/8649 ### Change Logs Introduced by https://github.com/apache/hudi/pull/6632. The current code logs for each record when the map had been spilled. It caused deluge of loggings. ### Impact
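
The problem described in #8647 and #8649 is a log line emitted per record (or per 100 records) once the map has spilled. One way to avoid that pattern is to log only on the transition into spilling. The sketch below is a simplified, hypothetical Python illustration, not Hudi's Java `ExternalSpillableMap` or the actual fix in #8649: the class name, the fixed per-entry size estimate, and the dict standing in for the on-disk store are all assumptions.

```python
import logging

logger = logging.getLogger("spillable_map_sketch")


class SpillableMapSketch:
    """Hypothetical map that spills entries once an in-memory budget is used up.

    Not Hudi's ExternalSpillableMap: the "disk" is just a second dict and the
    size estimate is a fixed per-entry cost supplied by the caller.
    """

    def __init__(self, max_in_memory_bytes, estimated_entry_bytes):
        self.max_in_memory_bytes = max_in_memory_bytes
        self.estimated_entry_bytes = estimated_entry_bytes
        self.in_memory = {}
        self.spilled = {}  # stand-in for an on-disk store
        self._spill_logged = False

    def put(self, key, value):
        used = len(self.in_memory) * self.estimated_entry_bytes
        fits = used + self.estimated_entry_bytes <= self.max_in_memory_bytes
        if key in self.in_memory or fits:
            self.in_memory[key] = value
            return
        # Log the switch to spilling once, instead of on every later put.
        if not self._spill_logged:
            logger.info("In-memory budget of %d bytes reached; spilling further entries",
                        self.max_in_memory_bytes)
            self._spill_logged = True
        self.spilled[key] = value

    def get(self, key):
        if key in self.in_memory:
            return self.in_memory[key]
        return self.spilled.get(key)
```

With a 100-byte budget and 10 bytes per entry, inserting 1,000 records keeps 10 in memory, spills 990, and produces a single log line rather than one per spilled record.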

[GitHub] [hudi] hudi-bot commented on pull request #8648: [HUDI-6184] Improve the test on incremental queries

2023-05-05 Thread via GitHub
hudi-bot commented on PR #8648: URL: https://github.com/apache/hudi/pull/8648#issuecomment-1537041933 ## CI report: * 95f0f61b8a46f221c9276facf41ef266b43dd062 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1687

[GitHub] [hudi] hudi-bot commented on pull request #8648: [HUDI-6184] Improve the test on incremental queries

2023-05-05 Thread via GitHub
hudi-bot commented on PR #8648: URL: https://github.com/apache/hudi/pull/8648#issuecomment-1537040386 ## CI report: * 95f0f61b8a46f221c9276facf41ef266b43dd062 UNKNOWN

[jira] [Created] (HUDI-6185) Too many logs in the ExternalSpillableMap

2023-05-05 Thread Danny Chen (Jira)
Danny Chen created HUDI-6185: Summary: Too many logs in the ExternalSpillableMap Key: HUDI-6185 URL: https://issues.apache.org/jira/browse/HUDI-6185 Project: Apache Hudi Issue Type: Bug

[GitHub] [hudi] danny0405 commented on pull request #8529: [HUDI-6120] Filter base file when there is only one file slice fetched

2023-05-05 Thread via GitHub
danny0405 commented on PR #8529: URL: https://github.com/apache/hudi/pull/8529#issuecomment-1537036707 > @danny0405 Hi danny, I've added the notion as you said, should I revert the changed code and the unit test? I don't think the change is necessary, maybe we can just left the docum

[jira] [Updated] (HUDI-6184) Improve the test on incremental queries

2023-05-05 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6184: - Labels: pull-request-available (was: ) > Improve the test on incremental queries > --

[GitHub] [hudi] yihua opened a new pull request, #8648: [HUDI-6184] Improve the test on incremental queries

2023-05-05 Thread via GitHub
yihua opened a new pull request, #8648: URL: https://github.com/apache/hudi/pull/8648 ### Change Logs The test `TestIncrementalReadWithFullTableScan#testFailEarlyForIncrViewQueryForNonExistingFiles` can fail due to changes in archival behavior because of hard-coded parameters. This

[jira] [Updated] (HUDI-6184) Improve the test on incremental queries

2023-05-05 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6184: Description: The test `TestIncrementalReadWithFullTableScan#testFailEarlyForIncrViewQueryForNonExistingFiles

[jira] [Created] (HUDI-6184) Improve the test on incremental queries

2023-05-05 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-6184: --- Summary: Improve the test on incremental queries Key: HUDI-6184 URL: https://issues.apache.org/jira/browse/HUDI-6184 Project: Apache Hudi Issue Type: Improvement

[jira] [Updated] (HUDI-6184) Improve the test on incremental queries

2023-05-05 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6184: Fix Version/s: 0.13.1 > Improve the test on incremental queries > --- >

[jira] [Updated] (HUDI-6184) Improve the test on incremental queries

2023-05-05 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6184: Epic Link: HUDI-4302 Story Points: 1 Priority: Critical (was: Major) > Improve the test on i

[jira] [Updated] (HUDI-6184) Improve the test on incremental queries

2023-05-05 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6184: Component/s: tests-ci > Improve the test on incremental queries > --- >

[jira] [Assigned] (HUDI-6184) Improve the test on incremental queries

2023-05-05 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-6184: --- Assignee: Ethan Guo > Improve the test on incremental queries > -

[GitHub] [hudi] big-doudou commented on issue #8647: [SUPPORT] Too many logs in the ExternalSpillableMap

2023-05-05 Thread via GitHub
big-doudou commented on issue #8647: URL: https://github.com/apache/hudi/issues/8647#issuecomment-1537030089 https://github.com/apache/hudi/blob/83d4fe15b4f50a58e919390217b71bb95e952085/hudi-common/src/main/java/org/apache/hudi/common/util/collection/ExternalSpillableMap.java#L209 sho

[GitHub] [hudi] big-doudou opened a new issue, #8647: [SUPPORT] Too many logs in the ExternalSpillableMap

2023-05-05 Thread via GitHub
big-doudou opened a new issue, #8647: URL: https://github.com/apache/hudi/issues/8647 **Describe the problem you faced** I am using flink sink hudi. As the picture shows: A line of estimated size log will be printed every time 100 pieces of data are processed. This will cause other

[GitHub] [hudi] waitingF commented on a diff in pull request #8378: [HUDI-6031] fix bug: checkpoint lost after changing cow to mor

2023-05-05 Thread via GitHub
waitingF commented on code in PR #8378: URL: https://github.com/apache/hudi/pull/8378#discussion_r1186596557 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java: ## @@ -320,11 +320,9 @@ public void refreshTimeline() throws IOException {

[GitHub] [hudi] danny0405 commented on pull request #8082: [HUDI-5868] Upgrade Spark to 3.3.2

2023-05-05 Thread via GitHub
danny0405 commented on PR #8082: URL: https://github.com/apache/hudi/pull/8082#issuecomment-1536989748 > @xiarixiaoyao > > Atleast from the java ci perspective when disabling this vectorized reader config for spark 3.3.2, all the tests are passing. > > cc @danny0405 @yihua

[GitHub] [hudi] hudi-bot commented on pull request #8452: [HUDI-6077] Add more partition push down filters

2023-05-05 Thread via GitHub
hudi-bot commented on PR #8452: URL: https://github.com/apache/hudi/pull/8452#issuecomment-1536985260 ## CI report: * 6526a12287cc85865da640d23a9266d887e82eba Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1686

[GitHub] [hudi] boneanxs commented on pull request #8452: [HUDI-6077] Add more partition push down filters

2023-05-05 Thread via GitHub
boneanxs commented on PR #8452: URL: https://github.com/apache/hudi/pull/8452#issuecomment-1536984114 @hudi-bot run azure

[GitHub] [hudi] nsivabalan commented on a diff in pull request #8526: [HUDI-6116] Optimize log block reading by removing seeks to check corrupted blocks.

2023-05-05 Thread via GitHub
nsivabalan commented on code in PR #8526: URL: https://github.com/apache/hudi/pull/8526#discussion_r1186576344 ## hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java: ## @@ -152,98 +153,107 @@ private void addShutDownHook() { // TODO : convert

[GitHub] [hudi] hudi-bot commented on pull request #8107: [HUDI-5514][HUDI-5574][HUDI-5604][HUDI-5535] Adding auto generation of record keys support to Hudi/Spark

2023-05-05 Thread via GitHub
hudi-bot commented on PR #8107: URL: https://github.com/apache/hudi/pull/8107#issuecomment-1536956006 ## CI report: * b5b72b6043cc2a110d175b8f2b69b9e38902359d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1687

[GitHub] [hudi] nsivabalan commented on a diff in pull request #8609: [HUDI-6154] Introduced retry while reading hoodie.properties to deal with parallel updates.

2023-05-05 Thread via GitHub
nsivabalan commented on code in PR #8609: URL: https://github.com/apache/hudi/pull/8609#discussion_r1186572588 ## hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java: ## @@ -323,22 +326,43 @@ public HoodieTableConfig() { super(); } - private
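
#8609 retries reads of `hoodie.properties` because another writer may be updating the file at the same time. The diff above is truncated, so the following is only a hypothetical Python sketch of the general retry pattern, not the `HoodieTableConfig` change. The function names, the simplified parser, the required-key check used to detect a partially written file, and the exponential backoff are all assumptions.

```python
import tempfile
import time
from pathlib import Path


def parse_properties(text):
    """Simplified key=value parser (no ':' separators or escapes, unlike java.util.Properties)."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed line: {line!r}")
        props[key.strip()] = value.strip()
    return props


def read_properties_with_retry(path, required_keys, max_attempts=3, backoff_s=0.1):
    """Re-read a properties file that a concurrent writer may be rewriting.

    An attempt fails if the file is missing, malformed, or lacks a required key
    (assumed here to indicate a half-written file); the last error is re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(max_attempts):
        try:
            props = parse_properties(Path(path).read_text(encoding="utf-8"))
            missing = [k for k in required_keys if k not in props]
            if missing:
                raise ValueError(f"missing keys: {missing}")
            return props
        except (OSError, ValueError) as err:  # FileNotFoundError is an OSError
            last_error = err
            if attempt + 1 < max_attempts:
                time.sleep(backoff_s * (2 ** attempt))  # exponential backoff
    raise last_error


with tempfile.TemporaryDirectory() as tmp:
    props_path = Path(tmp) / "hoodie.properties"
    props_path.write_text("# table config\nhoodie.table.name=trips\n", encoding="utf-8")
    props = read_properties_with_retry(props_path, ["hoodie.table.name"], backoff_s=0)
print(props["hoodie.table.name"])  # trips
```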

[GitHub] [hudi] nsivabalan commented on a diff in pull request #8605: [HUDI-6152] Fixed the check for older timestamps with second granularity during index tagLocation.

2023-05-05 Thread via GitHub
nsivabalan commented on code in PR #8605: URL: https://github.com/apache/hudi/pull/8605#discussion_r1186572034 ## hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieIndex.java: ## @@ -587,6 +594,43 @@ public void testSimpleGlobalIndexTagLoca

[GitHub] [hudi] clownxc commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

2023-05-05 Thread via GitHub
clownxc commented on PR #8472: URL: https://github.com/apache/hudi/pull/8472#issuecomment-1536940199 > @clownxc If I understand correctly, the memory savings are coming from dropping the "data" part of the HoodieRecord? I noticed that HoodieRecord has only 2 additional members - sealed (boo
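
The reviewer's question quoted in #8472 is whether the memory savings come from dropping the "data" part of `HoodieRecord`. As a hypothetical illustration of that idea (not the PR's `IndexItem`, whose fields are not visible in this excerpt), a write status can retain only the key and location needed for index updates instead of the full record. All class and field names below are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass(frozen=True)
class FullRecord:
    """Hypothetical stand-in for a written record: key, partition, and payload."""
    record_key: str
    partition_path: str
    data: Any


@dataclass(frozen=True)
class IndexItemSketch:
    """Only what an index update needs; the payload ("data") is dropped."""
    record_key: str
    partition_path: str
    file_id: str
    instant_time: str


@dataclass
class WriteStatusSketch:
    file_id: str
    instant_time: str
    written: List[IndexItemSketch] = field(default_factory=list)

    def mark_success(self, record: FullRecord) -> None:
        # Keep key and location rather than the full record, so the payload
        # is no longer referenced once it has been written.
        self.written.append(IndexItemSketch(record.record_key, record.partition_path,
                                            self.file_id, self.instant_time))
```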

[GitHub] [hudi] clownxc commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

2023-05-05 Thread via GitHub
clownxc commented on PR #8472: URL: https://github.com/apache/hudi/pull/8472#issuecomment-1536939863 > this interesting optimization this interesting optimization was reported by @nsivabalan and has not been implemented for a long time

[GitHub] [hudi] clownxc commented on a diff in pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

2023-05-05 Thread via GitHub
clownxc commented on code in PR #8472: URL: https://github.com/apache/hudi/pull/8472#discussion_r1186567773 ## hudi-common/src/main/java/org/apache/hudi/common/model/IndexItem.java: ## @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mor

[GitHub] [hudi] clownxc commented on a diff in pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

2023-05-05 Thread via GitHub
clownxc commented on code in PR #8472: URL: https://github.com/apache/hudi/pull/8472#discussion_r1186567052 ## hudi-common/src/main/java/org/apache/hudi/common/model/IndexItem.java: ## @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mor

[GitHub] [hudi] clownxc commented on a diff in pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

2023-05-05 Thread via GitHub
clownxc commented on code in PR #8472: URL: https://github.com/apache/hudi/pull/8472#discussion_r1186567008 ## hudi-common/src/main/java/org/apache/hudi/common/model/IndexItem.java: ## @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mor

[GitHub] [hudi] clownxc commented on a diff in pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

2023-05-05 Thread via GitHub
clownxc commented on code in PR #8472: URL: https://github.com/apache/hudi/pull/8472#discussion_r1186566954 ## hudi-common/src/main/java/org/apache/hudi/common/model/IndexItem.java: ## @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or mor

[GitHub] [hudi] clownxc commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

2023-05-05 Thread via GitHub
clownxc commented on PR #8472: URL: https://github.com/apache/hudi/pull/8472#issuecomment-1536932897 > @clownxc If I understand correctly, the memory savings are coming from dropping the "data" part of the HoodieRecord? I noticed that HoodieRecord has only 2 additional members -

[GitHub] [hudi] hudi-bot commented on pull request #8107: [HUDI-5514][HUDI-5574][HUDI-5604][HUDI-5535] Adding auto generation of record keys support to Hudi/Spark

2023-05-05 Thread via GitHub
hudi-bot commented on PR #8107: URL: https://github.com/apache/hudi/pull/8107#issuecomment-1536894048 ## CI report: * ce3d13748cd488df7e055392f2e9db4ac2bfc18b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1686

[GitHub] [hudi] hudi-bot commented on pull request #8107: [HUDI-5514][HUDI-5574][HUDI-5604][HUDI-5535] Adding auto generation of record keys support to Hudi/Spark

2023-05-05 Thread via GitHub
hudi-bot commented on PR #8107: URL: https://github.com/apache/hudi/pull/8107#issuecomment-1536886929 ## CI report: * ce3d13748cd488df7e055392f2e9db4ac2bfc18b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1686

[GitHub] [hudi] nsivabalan commented on pull request #8107: [HUDI-5514][HUDI-5574][HUDI-5604][HUDI-5535] Adding auto generation of record keys support to Hudi/Spark

2023-05-05 Thread via GitHub
nsivabalan commented on PR #8107: URL: https://github.com/apache/hudi/pull/8107#issuecomment-1536870362 @danny0405 : all feedback has been addressed

[GitHub] [hudi] hudi-bot commented on pull request #8646: [HUDI-6174] Fixing flaky tests in HoodieDeltastreamer

2023-05-05 Thread via GitHub
hudi-bot commented on PR #8646: URL: https://github.com/apache/hudi/pull/8646#issuecomment-1536831815 ## CI report: * 992fe5cf02f6e79bb3a153ec6dd5a8607079095c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1687

[hudi] branch master updated: [HUDI-6168] Add ability to parse partition value into row for S3 and GCS sources (#8629)

2023-05-05 Thread sivabalan
sivabalan pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 24557703869 [HUDI-6168] Add ability to parse pa

[GitHub] [hudi] nsivabalan commented on pull request #8629: [HUDI-6168] Add ability to parse partition value into row for S3 and GCS sources

2023-05-05 Thread via GitHub
nsivabalan commented on PR #8629: URL: https://github.com/apache/hudi/pull/8629#issuecomment-1536822131 going ahead w/ landing. the test failure is a known flaky one. https://github.com/apache/hudi/pull/8646

[GitHub] [hudi] nsivabalan merged pull request #8629: [HUDI-6168] Add ability to parse partition value into row for S3 and GCS sources

2023-05-05 Thread via GitHub
nsivabalan merged PR #8629: URL: https://github.com/apache/hudi/pull/8629

[GitHub] [hudi] yihua commented on a diff in pull request #8646: [HUDI-6174] Fixing flaky tests in HoodieDeltastreamer

2023-05-05 Thread via GitHub
yihua commented on code in PR #8646: URL: https://github.com/apache/hudi/pull/8646#discussion_r1186505895 ## hudi-utilities/src/test/java/org/apache/hudi/utilities/deltastreamer/TestHoodieDeltaStreamer.java: ## @@ -1109,15 +1109,31 @@ public void testCleanerDeleteReplacedDataWi

[GitHub] [hudi] hudi-bot commented on pull request #8643: [HUDI-6180] Use ConfigProperty for Timestamp keygen configs

2023-05-05 Thread via GitHub
hudi-bot commented on PR #8643: URL: https://github.com/apache/hudi/pull/8643#issuecomment-1536778429 ## CI report: * e31f461490c8f02fbda86c7308c57090cd930d37 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1687

[GitHub] [hudi] nfarah86 commented on issue #8625: [SUPPORT] Hudi partial updates not working with JSON inferred dataframe

2023-05-05 Thread via GitHub
nfarah86 commented on issue #8625: URL: https://github.com/apache/hudi/issues/8625#issuecomment-1536766582 emr 6 is on 0.12.2 (screenshot: https://user-images.githubusercontent.com/5392555/236566359-0dd72465-644b-4b9d-9ca3-ec1df24616e8.png)

[GitHub] [hudi] nfarah86 commented on issue #8625: [SUPPORT] Hudi partial updates not working with JSON inferred dataframe

2023-05-05 Thread via GitHub
nfarah86 commented on issue #8625: URL: https://github.com/apache/hudi/issues/8625#issuecomment-1536764700 partial updates I thought were supported in 0.13.0? https://issues.apache.org/jira/browse/HUDI-3304

[GitHub] [hudi] hudi-bot commented on pull request #8646: [HUDI-6174] Fixing flaky tests in HoodieDeltastreamer

2023-05-05 Thread via GitHub
hudi-bot commented on PR #8646: URL: https://github.com/apache/hudi/pull/8646#issuecomment-1536735529 ## CI report: * 992fe5cf02f6e79bb3a153ec6dd5a8607079095c Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1687

[GitHub] [hudi] yihua commented on a diff in pull request #8378: [HUDI-6031] fix bug: checkpoint lost after changing cow to mor

2023-05-05 Thread via GitHub
yihua commented on code in PR #8378: URL: https://github.com/apache/hudi/pull/8378#discussion_r1186469663 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java: ## @@ -320,11 +320,9 @@ public void refreshTimeline() throws IOException {

[GitHub] [hudi] hudi-bot commented on pull request #8646: [HUDI-6174] Fixing flaky tests in HoodieDeltastreamer

2023-05-05 Thread via GitHub
hudi-bot commented on PR #8646: URL: https://github.com/apache/hudi/pull/8646#issuecomment-1536726037 ## CI report: * 992fe5cf02f6e79bb3a153ec6dd5a8607079095c UNKNOWN

[GitHub] [hudi] hudi-bot commented on pull request #8641: [HUDI-5980] Add tests to guard against repeated dag trigger using spark event listeners

2023-05-05 Thread via GitHub
hudi-bot commented on PR #8641: URL: https://github.com/apache/hudi/pull/8641#issuecomment-1536718071 ## CI report: * a10d43668a09f81c0754f3f080ca56486d00974a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1687

[jira] [Updated] (HUDI-6174) Fix flaky test testCleanerDeleteReplacedDataWithArchive

2023-05-05 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6174: - Labels: pull-request-available (was: ) > Fix flaky test testCleanerDeleteReplacedDataWithArchive

[GitHub] [hudi] nsivabalan opened a new pull request, #8646: [HUDI-6174] Fixing flaky tests in HoodieDeltastreamer

2023-05-05 Thread via GitHub
nsivabalan opened a new pull request, #8646: URL: https://github.com/apache/hudi/pull/8646 ### Change Logs Fixing flaky cleaner and replace commit tests in TestHoodieDeltastreamer ### Impact Fixing flaky cleaner and replace commit tests in TestHoodieDeltastreamer #

[GitHub] [hudi] nsivabalan commented on pull request #8641: [HUDI-5980] Add tests to guard against repeated dag trigger using spark event listeners

2023-05-05 Thread via GitHub
nsivabalan commented on PR #8641: URL: https://github.com/apache/hudi/pull/8641#issuecomment-1536704873 Please fix the PR description w/ proper details.

[GitHub] [hudi] amrishlal opened a new pull request, #8645: [WIP] Table size stats uility.

2023-05-05 Thread via GitHub
amrishlal opened a new pull request, #8645: URL: https://github.com/apache/hudi/pull/8645 ### Change Logs Calculate and output file size stats of data files that were modified in the half-open interval [start date (--start-date parameter), end date (--end-date parameter)). --num-days

[GitHub] [hudi] nsivabalan commented on a diff in pull request #8641: [HUDI-5980] Add tests to guard against repeated dag trigger using spark event listeners

2023-05-05 Thread via GitHub
nsivabalan commented on code in PR #8641: URL: https://github.com/apache/hudi/pull/8641#discussion_r1186449125 ## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestDagExecutionDataSource.scala: ## @@ -0,0 +1,231 @@ +/* + * Licensed to the Apache Soft

[GitHub] [hudi] hudi-bot commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

2023-05-05 Thread via GitHub
hudi-bot commented on PR #8472: URL: https://github.com/apache/hudi/pull/8472#issuecomment-1536673539 ## CI report: * ff5d944ec780dbfb0d97eea643ad12420d1cca85 UNKNOWN * c69a04b7c23d381b6a4fe16c1fb016f8e1363794 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] d4r3topk opened a new issue, #8644: [SUPPORT] Data loss while ingesting multiple hudi tables via one glue/spark job with clustering and metadata properties

2023-05-05 Thread via GitHub
d4r3topk opened a new issue, #8644: URL: https://github.com/apache/hudi/issues/8644 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? - Join the mailing list to engage in conversations and get faster support at dev-subscr.

[GitHub] [hudi] hudi-bot commented on pull request #8643: [HUDI-6180] Use ConfigProperty for Timestamp keygen configs

2023-05-05 Thread via GitHub
hudi-bot commented on PR #8643: URL: https://github.com/apache/hudi/pull/8643#issuecomment-1536607746 ## CI report: * e31f461490c8f02fbda86c7308c57090cd930d37 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1687

[GitHub] [hudi] hudi-bot commented on pull request #8505: [HUDI-6106] Spark offline compaction/Clustering Job will do clean like Flink job

2023-05-05 Thread via GitHub
hudi-bot commented on PR #8505: URL: https://github.com/apache/hudi/pull/8505#issuecomment-1536607278 ## CI report: * f7c73e83812258b53b979afbd6d465e9066b801f UNKNOWN * 269fad02a5346121e823a15c9804e2e63eb16c30 UNKNOWN * 442430f680316bdfefc27c4aca9f7cd94e95373c UNKNOWN * e6

[jira] [Updated] (HUDI-6180) Use ConfigProperty for Timestamp keygen configs

2023-05-05 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6180: - Labels: pull-request-available (was: ) > Use ConfigProperty for Timestamp keygen configs > --

[GitHub] [hudi] hudi-bot commented on pull request #8643: [HUDI-6180] Use ConfigProperty for Timestamp keygen configs

2023-05-05 Thread via GitHub
hudi-bot commented on PR #8643: URL: https://github.com/apache/hudi/pull/8643#issuecomment-1536600209 ## CI report: * e31f461490c8f02fbda86c7308c57090cd930d37 UNKNOWN

[GitHub] [hudi] prashantwason commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

2023-05-05 Thread via GitHub
prashantwason commented on PR #8472: URL: https://github.com/apache/hudi/pull/8472#issuecomment-1536586226 @clownxc If I understand correctly, the memory savings are coming from dropping the "data" part of the HoodieRecord? I noticed that HoodieRecord has only 2 additional members - sealed

[GitHub] [hudi] yihua opened a new pull request, #8643: [HUDI-6180] Use ConfigProperty for Timestamp keygen configs

2023-05-05 Thread via GitHub
yihua opened a new pull request, #8643: URL: https://github.com/apache/hudi/pull/8643 ### Change Logs This PR refactors the configs for the timestamp-based key generator (`TimestampBasedKeyGenerator`) to use `ConfigProperty` so that these configs will show up in the `Configurations`

[GitHub] [hudi] prashantwason commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

2023-05-05 Thread via GitHub
prashantwason commented on PR #8472: URL: https://github.com/apache/hudi/pull/8472#issuecomment-1536572654 > @prashantwason @nbalajee @suryaprasanna would this break you all in anyway? Do we need the record data anywhere for successful writes? record index implementation requires the

[GitHub] [hudi] yihua commented on a diff in pull request #8590: [HUDI-3545] [UBER] Make HoodieAvroWriteSupport class configurable

2023-05-05 Thread via GitHub
yihua commented on code in PR #8590: URL: https://github.com/apache/hudi/pull/8590#discussion_r1186316626 ## hudi-common/src/main/java/org/apache/hudi/common/config/HoodieStorageConfig.java: ## @@ -170,6 +170,12 @@ public class HoodieStorageConfig extends HoodieConfig { .

[GitHub] [hudi] rahil-c commented on pull request #8082: [HUDI-5868] Upgrade Spark to 3.3.2

2023-05-05 Thread via GitHub
rahil-c commented on PR #8082: URL: https://github.com/apache/hudi/pull/8082#issuecomment-1536481643 @xiarixiaoyao At least from the Java CI perspective, when disabling this vectorized reader config for Spark 3.3.2, all the tests are passing. cc @danny0405 @yihua

[GitHub] [hudi] hudi-bot commented on pull request #8642: [MINOR] Fix some typos and delete unused parameter

2023-05-05 Thread via GitHub
hudi-bot commented on PR #8642: URL: https://github.com/apache/hudi/pull/8642#issuecomment-1536470758 ## CI report: * 1f0ee57e9a9388a3b347b6ad4a73e764532fa7cb Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1686

[GitHub] [hudi] hudi-bot commented on pull request #8641: [HUDI-5980] Add tests to guard against repeated dag trigger using spark event listeners

2023-05-05 Thread via GitHub
hudi-bot commented on PR #8641: URL: https://github.com/apache/hudi/pull/8641#issuecomment-1536409520 ## CI report: * 406a97b655333cadeaa13b19905708df915692c6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1686

[GitHub] [hudi] hudi-bot commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

2023-05-05 Thread via GitHub
hudi-bot commented on PR #8472: URL: https://github.com/apache/hudi/pull/8472#issuecomment-1536408666 ## CI report: * ff5d944ec780dbfb0d97eea643ad12420d1cca85 UNKNOWN * 0f0011b61776e6f9a9b08481f8ad809e67e44d41 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] hudi-bot commented on pull request #8641: [HUDI-5980] Add tests to guard against repeated dag trigger using spark event listeners

2023-05-05 Thread via GitHub
hudi-bot commented on PR #8641: URL: https://github.com/apache/hudi/pull/8641#issuecomment-1536398329 ## CI report: * 406a97b655333cadeaa13b19905708df915692c6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1686

[GitHub] [hudi] hudi-bot commented on pull request #8472: [HUDI-5298] Optimize WriteStatus storing HoodieRecord

2023-05-05 Thread via GitHub
hudi-bot commented on PR #8472: URL: https://github.com/apache/hudi/pull/8472#issuecomment-1536397415 ## CI report: * ff5d944ec780dbfb0d97eea643ad12420d1cca85 UNKNOWN * 0f0011b61776e6f9a9b08481f8ad809e67e44d41 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] hudi-bot commented on pull request #8596: [MINOR] Use try with resource to close stream

2023-05-05 Thread via GitHub
hudi-bot commented on PR #8596: URL: https://github.com/apache/hudi/pull/8596#issuecomment-1536385975 ## CI report: * e1816d660458d429f275ac0fd5d17f3eba5c1423 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1686

[GitHub] [hudi] hudi-bot commented on pull request #8452: [HUDI-6077] Add more partition push down filters

2023-05-05 Thread via GitHub
hudi-bot commented on PR #8452: URL: https://github.com/apache/hudi/pull/8452#issuecomment-1536385188 ## CI report: * 6526a12287cc85865da640d23a9266d887e82eba Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1686
