[GitHub] [hudi] YuweiXiao commented on pull request #6680: [HUDI-4812] Lazy fetching partition path & file slice for HoodieFileIndex

2022-11-06 Thread GitBox
YuweiXiao commented on PR #6680: URL: https://github.com/apache/hudi/pull/6680#issuecomment-1305215450 @alexeykudinkin Hey Alex, the test is added and ut is fixed. Please take another look. By the way, I see the lazy listing mode is eager by default, is it intended? -- This is an

[GitHub] [hudi] hudi-bot commented on pull request #7139: [HUDI-5160] Spark df saveAsTable failed with CTAS

2022-11-06 Thread GitBox
hudi-bot commented on PR #7139: URL: https://github.com/apache/hudi/pull/7139#issuecomment-1305214505 ## CI report: * 20c87dffbd4645d3126916c4b556ba07d905c913 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7074: [HUDI-5101] Adding spark-structured streaming test support via spark-submit job

2022-11-06 Thread GitBox
hudi-bot commented on PR #7074: URL: https://github.com/apache/hudi/pull/7074#issuecomment-1305214352 ## CI report: * 51dd40cb8d37331fb7737d707958e26ea57ac5b2 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7037: [HUDI-5078] Fixing determination of table service for metadata calls

2022-11-06 Thread GitBox
hudi-bot commented on PR #7037: URL: https://github.com/apache/hudi/pull/7037#issuecomment-1305214210 ## CI report: * b2d84de42e2818bd6b12f53c5564caa767ec2ca0 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #6284: [HUDI-4526] Improve spillableMapBasePath disk directory is full

2022-11-06 Thread GitBox
hudi-bot commented on PR #6284: URL: https://github.com/apache/hudi/pull/6284#issuecomment-1305213429 ## CI report: * ced81ffe44bdb817d55ab972fff0517975931fb2 Azure:

[hudi] branch master updated (7a1a6837e0 -> 547a2b014e)

2022-11-06 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository. xushiyan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from 7a1a6837e0 [HUDI-5088]Fix bug:Failed to synchronize the hive metadata of the Flink table (#7056) add 547a2b014e

[GitHub] [hudi] xushiyan merged pull request #7112: [MINOR] Removing spark2 scala12 combinations from readme

2022-11-06 Thread GitBox
xushiyan merged PR #7112: URL: https://github.com/apache/hudi/pull/7112 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[jira] [Created] (HUDI-5170) Fixed the issue that the Repair Tool cannot clean log files correctly.

2022-11-06 Thread YangXuan (Jira)
YangXuan created HUDI-5170: -- Summary: Fixed the issue that the Repair Tool cannot clean log files correctly. Key: HUDI-5170 URL: https://issues.apache.org/jira/browse/HUDI-5170 Project: Apache Hudi

[GitHub] [hudi] hudi-bot commented on pull request #7017: [HUDI-5066] Support flink hoodie source metaclient cache

2022-11-06 Thread GitBox
hudi-bot commented on PR #7017: URL: https://github.com/apache/hudi/pull/7017#issuecomment-1305209285 ## CI report: * 836ce4252e35c00d246ea4e1097d4ce5bdfc4a91 Azure:

[GitHub] [hudi] nsivabalan commented on a diff in pull request #7035: [HUDI-5075] Adding support to rollback residual clustering after disabling clustering

2022-11-06 Thread GitBox
nsivabalan commented on code in PR #7035: URL: https://github.com/apache/hudi/pull/7035#discussion_r1015087999 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java: ## @@ -1335,10 +1335,14 @@ public boolean isClusteringEnabled() {

[jira] [Assigned] (HUDI-5169) Re-attempt failed rollback (compaction and clustering) and get it to completion

2022-11-06 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-5169: - Assignee: sivabalan narayanan > Re-attempt failed rollback (compaction and

[jira] [Updated] (HUDI-5169) Re-attempt failed rollback (compaction and clustering) and get it to completion

2022-11-06 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5169: -- Priority: Critical (was: Major) > Re-attempt failed rollback (compaction and

[jira] [Updated] (HUDI-5169) Re-attempt failed rollback (compaction and clustering) and get it to completion

2022-11-06 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5169: -- Story Points: 2 > Re-attempt failed rollback (compaction and clustering) and get it to

[jira] [Updated] (HUDI-5169) Re-attempt failed rollback (compaction and clustering) and get it to completion

2022-11-06 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5169: -- Fix Version/s: 0.12.2 > Re-attempt failed rollback (compaction and clustering) and get

[jira] [Created] (HUDI-5169) Re-attempt failed rollback (compaction and clustering) and get it to completion

2022-11-06 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-5169: - Summary: Re-attempt failed rollback (compaction and clustering) and get it to completion Key: HUDI-5169 URL: https://issues.apache.org/jira/browse/HUDI-5169

[GitHub] [hudi] nsivabalan commented on pull request #7037: [HUDI-5078] Fixing determination of table service for metadata calls

2022-11-06 Thread GitBox
nsivabalan commented on PR #7037: URL: https://github.com/apache/hudi/pull/7037#issuecomment-1305198613 @xushiyan : feel free to take another look. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [hudi] nsivabalan commented on a diff in pull request #7037: [HUDI-5078] Fixing determination of table service for metadata calls

2022-11-06 Thread GitBox
nsivabalan commented on code in PR #7037: URL: https://github.com/apache/hudi/pull/7037#discussion_r1015084533 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/BaseActionExecutor.java: ## @@ -58,7 +59,7 @@ public

[GitHub] [hudi] nsivabalan commented on a diff in pull request #7037: [HUDI-5078] Fixing determination of table service for metadata calls

2022-11-06 Thread GitBox
nsivabalan commented on code in PR #7037: URL: https://github.com/apache/hudi/pull/7037#discussion_r1015081511 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ## @@ -656,7 +659,9 @@ private List

[GitHub] [hudi] nsivabalan commented on a diff in pull request #7038: [HUDI-5079] Optimizing rdd.isEmpty calls in DeltaSync

2022-11-06 Thread GitBox
nsivabalan commented on code in PR #7038: URL: https://github.com/apache/hudi/pull/7038#discussion_r1015079871 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java: ## @@ -499,7 +499,7 @@ private Pair>> fetchFromSourc return new

[GitHub] [hudi] nsivabalan commented on pull request #7074: [HUDI-5101] Adding spark-structured streaming test support via spark-submit job

2022-11-06 Thread GitBox
nsivabalan commented on PR #7074: URL: https://github.com/apache/hudi/pull/7074#issuecomment-1305189512 addressed comments. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [hudi] nsivabalan commented on pull request #7111: [DOCS][MINOR] Adding useful maven commands to website

2022-11-06 Thread GitBox
nsivabalan commented on PR #7111: URL: https://github.com/apache/hudi/pull/7111#issuecomment-1305185470 @codope : feel free to land. addressed all comments -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[hudi] branch asf-site updated: [DOCS] Add faq - how to reduce table versions created by hudi in metastore (#7108)

2022-11-06 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository. sivabalan pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new e8d7999054 [DOCS] Add faq - how to reduce

[GitHub] [hudi] nsivabalan merged pull request #7108: [DOCS] add FAQ on how to reduce table versions created in metastore by hudi

2022-11-06 Thread GitBox
nsivabalan merged PR #7108: URL: https://github.com/apache/hudi/pull/7108 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[GitHub] [hudi] nsivabalan commented on pull request #7082: Turn metadata table enabled for readers

2022-11-06 Thread GitBox
nsivabalan commented on PR #7082: URL: https://github.com/apache/hudi/pull/7082#issuecomment-1305159908 We have a last few instability issues we are looking to fix wrt the metadata table. Maybe once we iron those out, we can look to enable it on the reader side. For now, would prefer to defer

[GitHub] [hudi] hudi-bot commented on pull request #7144: [HUDI-5164] improve delete files for drop table and truncate table

2022-11-06 Thread GitBox
hudi-bot commented on PR #7144: URL: https://github.com/apache/hudi/pull/7144#issuecomment-1305157778 ## CI report: * cc79684672fce0971fd4313c9e6274d636064afd Azure:

[GitHub] [hudi] nsivabalan closed pull request #7109: [MINOR] Adding tags to assist in filtering tests from maven command line

2022-11-06 Thread GitBox
nsivabalan closed pull request #7109: [MINOR] Adding tags to assist in filtering tests from maven command line URL: https://github.com/apache/hudi/pull/7109 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [hudi] hudi-bot commented on pull request #7144: [HUDI-5164] improve delete files for drop table and truncate table

2022-11-06 Thread GitBox
hudi-bot commented on PR #7144: URL: https://github.com/apache/hudi/pull/7144#issuecomment-1305154544 ## CI report: * cc79684672fce0971fd4313c9e6274d636064afd Azure:

[GitHub] [hudi] nsivabalan commented on issue #6811: [SUPPORT] Slow upsert performance

2022-11-06 Thread GitBox
nsivabalan commented on issue #6811: URL: https://github.com/apache/hudi/issues/6811#issuecomment-1305153782 Few pointers: - for failed executors, do check if they are failing due to OOMs. if yes, you may need to tune your spark memory configs. - I see you have set

[GitHub] [hudi] hudi-bot commented on pull request #7144: [HUDI-5164] improve delete files for drop table and truncate table

2022-11-06 Thread GitBox
hudi-bot commented on PR #7144: URL: https://github.com/apache/hudi/pull/7144#issuecomment-1305151207 ## CI report: * cc79684672fce0971fd4313c9e6274d636064afd Azure:

[GitHub] [hudi] nsivabalan commented on issue #6226: [SUPPORT] OCC locks with data on S3 and DynamoDB fails to acquire

2022-11-06 Thread GitBox
nsivabalan commented on issue #6226: URL: https://github.com/apache/hudi/issues/6226#issuecomment-1305150161 @atharvai : do you have any updates for us. or if you got the issue resolved, let us know how did you go about resolving it. so that it could help others in the community -- This

[GitHub] [hudi] hudi-bot commented on pull request #6680: [HUDI-4812] Lazy fetching partition path & file slice for HoodieFileIndex

2022-11-06 Thread GitBox
hudi-bot commented on PR #6680: URL: https://github.com/apache/hudi/pull/6680#issuecomment-1305147415 ## CI report: * c3aba0dc3e2f7c2c6240d3aa5bc279cf8f359153 Azure:

[GitHub] [hudi] nsivabalan commented on issue #7064: [SUPPORT] Data ingestion from csv file i.e. CsvDFSSource is working for FilebasedSchemaProvider but not working if schema is provided with SchemaRe

2022-11-06 Thread GitBox
nsivabalan commented on issue #7064: URL: https://github.com/apache/hudi/issues/7064#issuecomment-1305146563 @ROOBALJINDAL : gentle ping -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [hudi] hudi-bot commented on pull request #5165: [HUDI-3742] Enable parquet enableVectorizedReader for spark inc query to improve peformance

2022-11-06 Thread GitBox
hudi-bot commented on PR #5165: URL: https://github.com/apache/hudi/pull/5165#issuecomment-1305146577 ## CI report: * 4beb69b67b9b8d39beb46e94782629f39d4faca2 Azure:

[GitHub] [hudi] nsivabalan commented on issue #7137: [SUPPORT] The stream read supports the second level delay.

2022-11-06 Thread GitBox
nsivabalan commented on issue #7137: URL: https://github.com/apache/hudi/issues/7137#issuecomment-1305146238 sorry, I am not sure whats your ask here. can you throw some more light please. -- This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [hudi] nsivabalan commented on issue #7141: [SUPPORT] Question on Bootstrapped hudi table

2022-11-06 Thread GitBox
nsivabalan commented on issue #7141: URL: https://github.com/apache/hudi/issues/7141#issuecomment-1305145196 looks like the behavior you see is expected. @yihua : can you take this up please. -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [hudi] pratyakshsharma commented on a diff in pull request #4910: [HUDI-2560][RFC-33] Support full Schema evolution for Spark

2022-11-06 Thread GitBox
pratyakshsharma commented on code in PR #4910: URL: https://github.com/apache/hudi/pull/4910#discussion_r1015042878 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/HoodieMergeHelper.java: ## @@ -78,12 +90,41 @@ public void

[GitHub] [hudi] nsivabalan commented on issue #7081: [SUPPORT] optimistic_concurrency_control

2022-11-06 Thread GitBox
nsivabalan commented on issue #7081: URL: https://github.com/apache/hudi/issues/7081#issuecomment-1305144034 @mandnhdaiyudfaio : do you have any updates for us. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [hudi] nsivabalan closed issue #7136: [SUPPORT] Support automatic sorting of data by primary key

2022-11-06 Thread GitBox
nsivabalan closed issue #7136: [SUPPORT] Support automatic sorting of data by primary key URL: https://github.com/apache/hudi/issues/7136 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [hudi] nsivabalan commented on issue #7136: [SUPPORT] Support automatic sorting of data by primary key

2022-11-06 Thread GitBox
nsivabalan commented on issue #7136: URL: https://github.com/apache/hudi/issues/7136#issuecomment-1305143443 Incidentally I just happened to start a WIP patch towards this end https://github.com/apache/hudi/pull/7146 :) Feel free to add comments to jira or the patch. -- This

[GitHub] [hudi] nsivabalan commented on issue #5717: [SUPPORT] Hudi 0.10.1 Reconcile schema not working

2022-11-06 Thread GitBox
nsivabalan commented on issue #5717: URL: https://github.com/apache/hudi/issues/5717#issuecomment-1305142246 I see. If you would like to understand any details on hudi, or need help w/ any Poc or onboarding let us know. CC @bhasudha -- This is an automated message from the Apache Git

[GitHub] [hudi] YuweiXiao commented on a diff in pull request #5581: [HUDI-53] Implementation of a native DFS based index based on the metadata table.

2022-11-06 Thread GitBox
YuweiXiao commented on code in PR #5581: URL: https://github.com/apache/hudi/pull/5581#discussion_r878654024 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/index/SparkMetadataTableRecordIndex.java: ## @@ -0,0 +1,158 @@ +/* + * Licensed to the Apache Software

[GitHub] [hudi] nsivabalan commented on issue #6925: [SUPPORT]Table in Dynamo DB is not getting created during concurrent writes to table

2022-11-06 Thread GitBox
nsivabalan commented on issue #6925: URL: https://github.com/apache/hudi/issues/6925#issuecomment-1305139824  -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[jira] [Updated] (HUDI-4462) Flink Sink cannot report metrics

2022-11-06 Thread Zhaojing Yu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaojing Yu updated HUDI-4462: -- Epic Link: HUDI-5168 > Flink Sink cannot report metrics > - > >

[jira] [Updated] (HUDI-4719) metrics for flink

2022-11-06 Thread Zhaojing Yu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaojing Yu updated HUDI-4719: -- Epic Link: HUDI-5168 > metrics for flink > - > > Key: HUDI-4719 >

[jira] [Created] (HUDI-5168) Flink metrics integration

2022-11-06 Thread Zhaojing Yu (Jira)
Zhaojing Yu created HUDI-5168: - Summary: Flink metrics integration Key: HUDI-5168 URL: https://issues.apache.org/jira/browse/HUDI-5168 Project: Apache Hudi Issue Type: Epic Components:

[jira] [Updated] (HUDI-5168) Flink metrics integration

2022-11-06 Thread Zhaojing Yu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhaojing Yu updated HUDI-5168: -- Component/s: flink-sql > Flink metrics integration > - > > Key:

[GitHub] [hudi] chenshzh commented on a diff in pull request #7017: [HUDI-5066] Support flink hoodie source metaclient cache

2022-11-06 Thread GitBox
chenshzh commented on code in PR #7017: URL: https://github.com/apache/hudi/pull/7017#discussion_r1015033559 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableSource.java: ## @@ -162,7 +163,7 @@ public HoodieTableSource( this.limit = limit

[GitHub] [hudi] hudi-bot commented on pull request #6419: [HUDI-2057] CTAS Generate An External Table When Create Managed Table

2022-11-06 Thread GitBox
hudi-bot commented on PR #6419: URL: https://github.com/apache/hudi/pull/6419#issuecomment-1305098792 ## CI report: * f1abf8a964ef85c744f73de667ff6f3e594c8fd8 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #6419: [HUDI-2057] CTAS Generate An External Table When Create Managed Table

2022-11-06 Thread GitBox
hudi-bot commented on PR #6419: URL: https://github.com/apache/hudi/pull/6419#issuecomment-1305094536 ## CI report: * f1abf8a964ef85c744f73de667ff6f3e594c8fd8 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7128: [HUDI-5153] Fix the write token name resolution of cdc log file

2022-11-06 Thread GitBox
hudi-bot commented on PR #7128: URL: https://github.com/apache/hudi/pull/7128#issuecomment-1305051150 ## CI report: * 97673d45bac8d7198f89ee7084ba09f167c32259 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7017: [HUDI-5066] Support flink hoodie source metaclient cache

2022-11-06 Thread GitBox
hudi-bot commented on PR #7017: URL: https://github.com/apache/hudi/pull/7017#issuecomment-1305050913 ## CI report: * 78c0d932bdc06ab1b28d408d3b04d0a37bc160a7 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #6680: [HUDI-4812] Lazy fetching partition path & file slice for HoodieFileIndex

2022-11-06 Thread GitBox
hudi-bot commented on PR #6680: URL: https://github.com/apache/hudi/pull/6680#issuecomment-1305050588 ## CI report: * 13b8d331e0dcf5ac1f958b91770e2024bce04c43 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7128: [HUDI-5153] Fix the write token name resolution of cdc log file

2022-11-06 Thread GitBox
hudi-bot commented on PR #7128: URL: https://github.com/apache/hudi/pull/7128#issuecomment-1305047370 ## CI report: * 88af560e2e8bdc9c2bbb7b6830c2ae69a792d944 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7017: [HUDI-5066] Support flink hoodie source metaclient cache

2022-11-06 Thread GitBox
hudi-bot commented on PR #7017: URL: https://github.com/apache/hudi/pull/7017#issuecomment-1305047229 ## CI report: * 78c0d932bdc06ab1b28d408d3b04d0a37bc160a7 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #6680: [HUDI-4812] Lazy fetching partition path & file slice for HoodieFileIndex

2022-11-06 Thread GitBox
hudi-bot commented on PR #6680: URL: https://github.com/apache/hudi/pull/6680#issuecomment-1305047029 ## CI report: * fe18f26f8ab63a28c98b23c6940b6135b6155d91 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7139: [HUDI-5160] Spark df saveAsTable failed with CTAS

2022-11-06 Thread GitBox
hudi-bot commented on PR #7139: URL: https://github.com/apache/hudi/pull/7139#issuecomment-1305045039 ## CI report: * 20c87dffbd4645d3126916c4b556ba07d905c913 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7128: [HUDI-5153] Fix the write token name resolution of cdc log file

2022-11-06 Thread GitBox
hudi-bot commented on PR #7128: URL: https://github.com/apache/hudi/pull/7128#issuecomment-1305045011 ## CI report: * 88af560e2e8bdc9c2bbb7b6830c2ae69a792d944 Azure:

[GitHub] [hudi] xiarixiaoyao commented on pull request #4910: [HUDI-2560][RFC-33] Support full Schema evolution for Spark

2022-11-06 Thread GitBox
xiarixiaoyao commented on PR #4910: URL: https://github.com/apache/hudi/pull/4910#issuecomment-1305043103 > @xiarixiaoyao As ref-33 said, `Partition evolution is not included in this design, Partition evolution will come soon after schema evolution.`, several months passed, I want to know

[GitHub] [hudi] xiarixiaoyao commented on a diff in pull request #4910: [HUDI-2560][RFC-33] Support full Schema evolution for Spark

2022-11-06 Thread GitBox
xiarixiaoyao commented on code in PR #4910: URL: https://github.com/apache/hudi/pull/4910#discussion_r1014979569 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/HoodieMergeHelper.java: ## @@ -78,12 +90,41 @@ public void runMerge(HoodieTable>,

[GitHub] [hudi] xiarixiaoyao commented on a diff in pull request #4910: [HUDI-2560][RFC-33] Support full Schema evolution for Spark

2022-11-06 Thread GitBox
xiarixiaoyao commented on code in PR #4910: URL: https://github.com/apache/hudi/pull/4910#discussion_r1014978749 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/HoodieMergeHelper.java: ## @@ -78,12 +90,41 @@ public void runMerge(HoodieTable>,

[jira] [Commented] (HUDI-5018) Make user-provided copyOnWriteRecordSizeEstimate first precedence

2022-11-06 Thread xi chaomin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17629576#comment-17629576 ] xi chaomin commented on HUDI-5018: -- Hi [~xushiyan] , I can do this improvement, but I have a question,

[GitHub] [hudi] xiarixiaoyao commented on a diff in pull request #4910: [HUDI-2560][RFC-33] Support full Schema evolution for Spark

2022-11-06 Thread GitBox
xiarixiaoyao commented on code in PR #4910: URL: https://github.com/apache/hudi/pull/4910#discussion_r1014970711 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/HoodieMergeHelper.java: ## @@ -78,12 +90,41 @@ public void runMerge(HoodieTable>,

[GitHub] [hudi] trushev commented on a diff in pull request #7151: [MINOR] Performance improvement of flink ITs with reused miniCluster

2022-11-06 Thread GitBox
trushev commented on code in PR #7151: URL: https://github.com/apache/hudi/pull/7151#discussion_r1014970150 ## hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/utils/AbstractHoodieTestBase.java: ## @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software

[GitHub] [hudi] fengjian428 commented on issue #7145: [SUPPORT] `org.apache.avro.SchemaParseException: Can't redefine: array` when an Array containing a Struct is the only field in a Struct

2022-11-06 Thread GitBox
fengjian428 commented on issue #7145: URL: https://github.com/apache/hudi/issues/7145#issuecomment-1305023159 @lewyh on the deltastreamer side can add a dummy field by changing the schema provider config maybe we can add a schema-validate function to resolve this automatically. --

[GitHub] [hudi] hudi-bot commented on pull request #7128: [HUDI-5153] Fix the write token name resolution of cdc log file

2022-11-06 Thread GitBox
hudi-bot commented on PR #7128: URL: https://github.com/apache/hudi/pull/7128#issuecomment-1305012249 ## CI report: * 88af560e2e8bdc9c2bbb7b6830c2ae69a792d944 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #5165: [HUDI-3742] Enable parquet enableVectorizedReader for spark inc query to improve peformance

2022-11-06 Thread GitBox
hudi-bot commented on PR #5165: URL: https://github.com/apache/hudi/pull/5165#issuecomment-1305011480 ## CI report: * 21de15aaa4278c711981afaf5aed16d1293e41f7 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #6680: [HUDI-4812] Lazy fetching partition path & file slice for HoodieFileIndex

2022-11-06 Thread GitBox
hudi-bot commented on PR #6680: URL: https://github.com/apache/hudi/pull/6680#issuecomment-1305009832 ## CI report: * fe18f26f8ab63a28c98b23c6940b6135b6155d91 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #5165: [HUDI-3742] Enable parquet enableVectorizedReader for spark inc query to improve peformance

2022-11-06 Thread GitBox
hudi-bot commented on PR #5165: URL: https://github.com/apache/hudi/pull/5165#issuecomment-1305009340 ## CI report: * 21de15aaa4278c711981afaf5aed16d1293e41f7 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #6680: [HUDI-4812] Lazy fetching partition path & file slice for HoodieFileIndex

2022-11-06 Thread GitBox
hudi-bot commented on PR #6680: URL: https://github.com/apache/hudi/pull/6680#issuecomment-1305007619 ## CI report: * fe18f26f8ab63a28c98b23c6940b6135b6155d91 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7151: [MINOR] Performance improvement of flink ITs with reused miniCluster

2022-11-06 Thread GitBox
hudi-bot commented on PR #7151: URL: https://github.com/apache/hudi/pull/7151#issuecomment-1305005824 ## CI report: * fb4b5616278cdd662ccee6add7bbb7b0684554ac Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7139: [HUDI-5160] Spark df saveAsTable failed with CTAS

2022-11-06 Thread GitBox
hudi-bot commented on PR #7139: URL: https://github.com/apache/hudi/pull/7139#issuecomment-1305005786 ## CI report: * d035b4ddb54f75802565366d9e0ea57e557549c4 Azure:

[GitHub] [hudi] xiarixiaoyao commented on pull request #5165: [HUDI-3742] Enable parquet enableVectorizedReader for spark inc query to improve peformance

2022-11-06 Thread GitBox
xiarixiaoyao commented on PR #5165: URL: https://github.com/apache/hudi/pull/5165#issuecomment-1305005289 @alexeykudinkin @nsivabalan rebase the codes, Add parameters to control this behavior, which is enabled by default -- This is an automated message from the Apache Git Service.

[GitHub] [hudi] hudi-bot commented on pull request #5165: [HUDI-3742] Enable parquet enableVectorizedReader for spark inc query to improve peformance

2022-11-06 Thread GitBox
hudi-bot commented on PR #5165: URL: https://github.com/apache/hudi/pull/5165#issuecomment-1305004798 ## CI report: * 21de15aaa4278c711981afaf5aed16d1293e41f7 Azure:

[GitHub] [hudi] xiarixiaoyao commented on pull request #5165: [HUDI-3742] Enable parquet enableVectorizedReader for spark inc query to improve peformance

2022-11-06 Thread GitBox
xiarixiaoyao commented on PR #5165: URL: https://github.com/apache/hudi/pull/5165#issuecomment-1305004653 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [hudi] danny0405 commented on a diff in pull request #7151: [MINOR] Performance improvement of flink ITs with reused miniCluster

2022-11-06 Thread GitBox
danny0405 commented on code in PR #7151: URL: https://github.com/apache/hudi/pull/7151#discussion_r1014954719 ## hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/utils/AbstractHoodieTestBase.java: ## @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software

[GitHub] [hudi] YannByron commented on a diff in pull request #7144: [HUDI-5164] improve delete files for drop table and truncate table

2022-11-06 Thread GitBox
YannByron commented on code in PR #7144: URL: https://github.com/apache/hudi/pull/7144#discussion_r1014952640 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/command/DropHoodieTableCommand.scala: ## @@ -69,35 +72,41 @@ case class

[jira] [Commented] (HUDI-5088) Failed to synchronize the hive metadata of the Flink table

2022-11-06 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17629565#comment-17629565 ] Danny Chen commented on HUDI-5088: -- Fixed via master branch: 7a1a6837e0c7be2cb401fbe6be872064feae >

[jira] [Resolved] (HUDI-5088) Failed to synchronize the hive metadata of the Flink table

2022-11-06 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen resolved HUDI-5088. -- > Failed to synchronize the hive metadata of the Flink table >

[hudi] branch master updated: [HUDI-5088]Fix bug:Failed to synchronize the hive metadata of the Flink table (#7056)

2022-11-06 Thread danny0405
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 7a1a6837e0 [HUDI-5088]Fix bug:Failed to

[GitHub] [hudi] danny0405 merged pull request #7056: [HUDI-5088]Fix bug:Failed to synchronize the hive metadata of the Flink table

2022-11-06 Thread GitBox
danny0405 merged PR #7056: URL: https://github.com/apache/hudi/pull/7056 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[GitHub] [hudi] hudi-bot commented on pull request #7139: [HUDI-5160] Spark df saveAsTable failed with CTAS

2022-11-06 Thread GitBox
hudi-bot commented on PR #7139: URL: https://github.com/apache/hudi/pull/7139#issuecomment-1304973813 ## CI report: * d035b4ddb54f75802565366d9e0ea57e557549c4 Azure:

[GitHub] [hudi] waywtdcc commented on issue #7136: [SUPPORT] Support automatic sorting of data by primary key

2022-11-06 Thread GitBox
waywtdcc commented on issue #7136: URL: https://github.com/apache/hudi/issues/7136#issuecomment-1304949829 My idea is that when bucket index writing is enabled, each bucket is sorted according to the primary key, so the merging efficiency is high; In addition, reading can also use

[GitHub] [hudi] waywtdcc commented on issue #7136: [SUPPORT] Support automatic sorting of data by primary key

2022-11-06 Thread GitBox
waywtdcc commented on issue #7136: URL: https://github.com/apache/hudi/issues/7136#issuecomment-1304949306 > Do you want a sort within partition or just sort in file? If partition sort, I think cluster should be a good solution. For primary key sorted in file, I developed this feature and

[GitHub] [hudi] hudi-bot commented on pull request #7151: [MINOR] Performance improvement of flink ITs with reused miniCluster

2022-11-06 Thread GitBox
hudi-bot commented on PR #7151: URL: https://github.com/apache/hudi/pull/7151#issuecomment-1304934889 ## CI report: * d411caeef03acb1671c718cbf5370fbbb982ea47 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7151: [MINOR] Performance improvement of flink ITs with reused miniCluster

2022-11-06 Thread GitBox
hudi-bot commented on PR #7151: URL: https://github.com/apache/hudi/pull/7151#issuecomment-1304932917 ## CI report: * d411caeef03acb1671c718cbf5370fbbb982ea47 Azure:

[GitHub] [hudi] hussein-awala commented on a diff in pull request #5113: [HUDI-3625] [RFC-60] Optimized storage layout for Cloud Object Stores

2022-11-06 Thread GitBox
hussein-awala commented on code in PR #5113: URL: https://github.com/apache/hudi/pull/5113#discussion_r1014907839 ## rfc/rfc-60/rfc-60.md: ## @@ -0,0 +1,225 @@ + + +# RFC-60: Federated Storage Layer + +## Proposers +- @umehrot2 + +## Approvers +- @vinoth +- @shivnarayan + +##

[GitHub] [hudi] eshu opened a new issue, #7154: [SUPPORT] Hudi 0.12.2 release (Unknown versionCode:5)

2022-11-06 Thread GitBox
eshu opened a new issue, #7154: URL: https://github.com/apache/hudi/issues/7154 I tried to migrate to Hudi 0.12.1 on Glue, and some jobs broke because of the issue https://issues.apache.org/jira/browse/HUDI-4971 But some of them completed successfully. It means the dataset format

[GitHub] [hudi] HEPBO3AH commented on issue #7062: [SUPPORT] Appeding to files during UPSERT causes executors to die due to memory issues.

2022-11-06 Thread GitBox
HEPBO3AH commented on issue #7062: URL: https://github.com/apache/hudi/issues/7062#issuecomment-1304913263 Hello! Thank you for the reply. > hey @HEPBO3AH : do you mean to say that, even after our fix https://github.com/apache/hudi/pull/6864, your avg record size estimate is wrong

[GitHub] [hudi] hudi-bot commented on pull request #7041: [HUDI-5053] Create clean complete commit when there is none to clean in order to leverage incremental cleaning

2022-11-06 Thread GitBox
hudi-bot commented on PR #7041: URL: https://github.com/apache/hudi/pull/7041#issuecomment-1304912867 ## CI report: * 36e1e84c05112409ae4d1d4e8b2f13eadc69237b Azure:

[GitHub] [hudi] nsivabalan commented on a diff in pull request #6284: [HUDI-4526] Improve spillableMapBasePath disk directory is full

2022-11-06 Thread GitBox
nsivabalan commented on code in PR #6284: URL: https://github.com/apache/hudi/pull/6284#discussion_r1014904091 ## hudi-common/src/main/java/org/apache/hudi/common/util/FileIOUtils.java: ## @@ -204,4 +204,40 @@ public static Option readDataFromPath(FileSystem fileSystem,

[GitHub] [hudi] hussein-awala commented on pull request #7041: [HUDI-5053] Create clean complete commit when there is none to clean in order to leverage incremental cleaning

2022-11-06 Thread GitBox
hussein-awala commented on PR #7041: URL: https://github.com/apache/hudi/pull/7041#issuecomment-1304909796 Hey @nsivabalan, can you review the PR please? Feel free to change the name of the new configuration if you have a better suggestion. -- This is an automated message from the

[GitHub] [hudi] hudi-bot commented on pull request #7041: [HUDI-5053] Create clean complete commit when there is none to clean in order to leverage incremental cleaning

2022-11-06 Thread GitBox
hudi-bot commented on PR #7041: URL: https://github.com/apache/hudi/pull/7041#issuecomment-1304882995 ## CI report: * 1177f60208a2d97c0fce5b8c9aca309a0494cd18 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7041: [HUDI-5053] Create clean complete commit when there is none to clean in order to leverage incremental cleaning

2022-11-06 Thread GitBox
hudi-bot commented on PR #7041: URL: https://github.com/apache/hudi/pull/7041#issuecomment-1304881559 ## CI report: * d18514b58b5fa732f55bd3ad6212484b7766f50e Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7041: [HUDI-5053] Create clean complete commit when there is none to clean in order to leverage incremental cleaning

2022-11-06 Thread GitBox
hudi-bot commented on PR #7041: URL: https://github.com/apache/hudi/pull/7041#issuecomment-1304867810 ## CI report: * d18514b58b5fa732f55bd3ad6212484b7766f50e Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7041: [HUDI-5053] Create clean complete commit when there is none to clean in order to leverage incremental cleaning

2022-11-06 Thread GitBox
hudi-bot commented on PR #7041: URL: https://github.com/apache/hudi/pull/7041#issuecomment-1304866940 ## CI report: * a78cf3612346492a082bf211940c46c9e82752a1 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7153: [HUDI-5167] Reducing total test run time: reducing tests for virtual keys

2022-11-06 Thread GitBox
hudi-bot commented on PR #7153: URL: https://github.com/apache/hudi/pull/7153#issuecomment-1304865905 ## CI report: * 74fee0cc4ca61be2c0ce8328d91fbfddcb7ba725 Azure:

[GitHub] [hudi] gtwuser commented on issue #6925: [SUPPORT]Table in Dynamo DB is not getting created during concurrent writes to table

2022-11-06 Thread GitBox
gtwuser commented on issue #6925: URL: https://github.com/apache/hudi/issues/6925#issuecomment-1304855171 After scanning through the existing git issues, I was able to fix it. Ref: [#5451](https://github.com/apache/hudi/issues/5451) Thanks to all. -- This is an automated message from

[GitHub] [hudi] gtwuser closed issue #6925: [SUPPORT]Table in Dynamo DB is not getting created during concurrent writes to table

2022-11-06 Thread GitBox
gtwuser closed issue #6925: [SUPPORT]Table in Dynamo DB is not getting created during concurrent writes to table URL: https://github.com/apache/hudi/issues/6925 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [hudi] hudi-bot commented on pull request #7041: [HUDI-5053] Create clean complete commit when there is none to clean in order to leverage incremental cleaning

2022-11-06 Thread GitBox
hudi-bot commented on PR #7041: URL: https://github.com/apache/hudi/pull/7041#issuecomment-1304851516 ## CI report: * a78cf3612346492a082bf211940c46c9e82752a1 Azure:
