[GitHub] [hudi] zhuanshenbsj1 opened a new pull request, #9585: [HUDI-6809] Optimizing the judgment of generating clustering plans

2023-08-30 Thread via GitHub
zhuanshenbsj1 opened a new pull request, #9585: URL: https://github.com/apache/hudi/pull/9585 ### Change Logs We currently use Flink to generate clustering plans online, and then Spark to execute them offline. What we expect is to generate a clustering plan every four

[jira] [Created] (HUDI-6809) Optimizing the judgment of generating clustering plans

2023-08-30 Thread zhuanshenbsj1 (Jira)
zhuanshenbsj1 created HUDI-6809: --- Summary: Optimizing the judgment of generating clustering plans Key: HUDI-6809 URL: https://issues.apache.org/jira/browse/HUDI-6809 Project: Apache Hudi Issue

[GitHub] [hudi] zhuanshenbsj1 opened a new pull request, #9584: [HUDI-6808] SkipCompaction Config should not affect the stream read of the cow table

2023-08-30 Thread via GitHub
zhuanshenbsj1 opened a new pull request, #9584: URL: https://github.com/apache/hudi/pull/9584 ### Change Logs The same action type (commit) is used after the completion of a MOR compaction and a COW commit. If skipcompaction is configured, it will cause the stream read to skip the normal

[GitHub] [hudi] viverlxl closed issue #9575: [SUPPORT] sparkSql create table sync metastore Failed

2023-08-30 Thread via GitHub
viverlxl closed issue #9575: [SUPPORT] sparkSql create table sync metastore Failed URL: https://github.com/apache/hudi/issues/9575 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[jira] [Created] (HUDI-6808) SkipCompaction Config should not affect the stream read of the cow table

2023-08-30 Thread zhuanshenbsj1 (Jira)
zhuanshenbsj1 created HUDI-6808: --- Summary: SkipCompaction Config should not affect the stream read of the cow table Key: HUDI-6808 URL: https://issues.apache.org/jira/browse/HUDI-6808 Project: Apache

[GitHub] [hudi] codope commented on a diff in pull request #9565: [HUDI-6725] Support efficient completion time queries on the timeline

2023-08-30 Thread via GitHub
codope commented on code in PR #9565: URL: https://github.com/apache/hudi/pull/9565#discussion_r1311099826 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/timeline/CompletionTimeQueryView.java: ## @@ -0,0 +1,149 @@ +/* + * Licensed to the Apache Software

[GitHub] [hudi] hudi-bot commented on pull request #9572: [WIP][HUDI-6702]Utilize merger to replace insertValue api

2023-08-30 Thread via GitHub
hudi-bot commented on PR #9572: URL: https://github.com/apache/hudi/pull/9572#issuecomment-1700369972 ## CI report: * ad05887b523496f59ac8b6e976183d6c325ed94d UNKNOWN * cf848446b9c837be3c1c2fdc7930b26f920a0754 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9571: Enabling comprehensive schema evolution in delta streamer code

2023-08-30 Thread via GitHub
hudi-bot commented on PR #9571: URL: https://github.com/apache/hudi/pull/9571#issuecomment-1700369921 ## CI report: * 3af6011d72b294b0995d52be40a6d91e6eff9a1b Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9583: [Test]update operator name for compact test class

2023-08-30 Thread via GitHub
hudi-bot commented on PR #9583: URL: https://github.com/apache/hudi/pull/9583#issuecomment-1700342790 ## CI report: * 4f265efa3f9a216c511abf94c065700e74b21679 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9583: [Test]update operator name for compact test class

2023-08-30 Thread via GitHub
hudi-bot commented on PR #9583: URL: https://github.com/apache/hudi/pull/9583#issuecomment-1700337856 ## CI report: * 4f265efa3f9a216c511abf94c065700e74b21679 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] hudi-bot commented on pull request #9577: [HUDI-6805] Print detailed error message in clustering

2023-08-30 Thread via GitHub
hudi-bot commented on PR #9577: URL: https://github.com/apache/hudi/pull/9577#issuecomment-1700337802 ## CI report: * 9d1b03d93f9f5bfab485a89e4b3de9aa9cca4f17 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9577: [HUDI-6805] Print detailed error message in clustering

2023-08-30 Thread via GitHub
hudi-bot commented on PR #9577: URL: https://github.com/apache/hudi/pull/9577#issuecomment-1700332724 ## CI report: * 9d1b03d93f9f5bfab485a89e4b3de9aa9cca4f17 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9564: [HUDI-6712] Add Parquet file metadata loader

2023-08-30 Thread via GitHub
hudi-bot commented on PR #9564: URL: https://github.com/apache/hudi/pull/9564#issuecomment-1700332631 ## CI report: * f6a01d87b32aebfce310375b8925f4802acca686 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9482: [HUDI-6728] Update BigQuery manifest sync to support schema evolution

2023-08-30 Thread via GitHub
hudi-bot commented on PR #9482: URL: https://github.com/apache/hudi/pull/9482#issuecomment-1700332485 ## CI report: * 39166302aadd51524e017f92a883e960e07a37a4 Azure:

[GitHub] [hudi] voonhous commented on issue #9536: Duplicate Row in Same Partition using Global Bloom Index

2023-08-30 Thread via GitHub
voonhous commented on issue #9536: URL: https://github.com/apache/hudi/issues/9536#issuecomment-1700326761 FWIU, this is a sporadic thing that OP is not able to reproduce anymore. Might be related to this issue: https://github.com/apache/hudi/pull/9035 One way to determine if

[GitHub] [hudi] hehuiyuan opened a new pull request, #9583: update operator name for compact test class

2023-08-30 Thread via GitHub
hehuiyuan opened a new pull request, #9583: URL: https://github.com/apache/hudi/pull/9583 ### Change Logs Update some incorrect operator names and uids

[jira] [Closed] (HUDI-3727) Add metrics for async indexer

2023-08-30 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit closed HUDI-3727. - Resolution: Done > Add metrics for async indexer > - > > Key:

[GitHub] [hudi] aajisaka commented on issue #8160: [SUPPORT] Schema evolution wrt to datatype promotion isnt working. org.apache.avro.AvroRuntimeException: cannot support rewrite value for schema type

2023-08-30 Thread via GitHub
aajisaka commented on issue #8160: URL: https://github.com/apache/hudi/issues/8160#issuecomment-1700309154 @ad1happy2go I think we can just close this issue given there's no response from the requester.

[GitHub] [hudi] hudi-bot commented on pull request #9582: [MINOR]Fix hbase index config improper use

2023-08-30 Thread via GitHub
hudi-bot commented on PR #9582: URL: https://github.com/apache/hudi/pull/9582#issuecomment-1700308019 ## CI report: * cc93db8d3ad775d3fd244d07ad17786596377e55 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9482: [HUDI-6728] Update BigQuery manifest sync to support schema evolution

2023-08-30 Thread via GitHub
hudi-bot commented on PR #9482: URL: https://github.com/apache/hudi/pull/9482#issuecomment-1700307775 ## CI report: * 39166302aadd51524e017f92a883e960e07a37a4 Azure:

[GitHub] [hudi] aajisaka commented on a diff in pull request #9577: [HUDI-6805] Print detailed error message in clustering

2023-08-30 Thread via GitHub
aajisaka commented on code in PR #9577: URL: https://github.com/apache/hudi/pull/9577#discussion_r1311043084 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/io/storage/row/HoodieRowCreateHandle.java: ## @@ -241,6 +242,9 @@ public WriteStatus close() throws

[GitHub] [hudi] hudi-bot commented on pull request #9582: [MINOR]Fix hbase index config improper use

2023-08-30 Thread via GitHub
hudi-bot commented on PR #9582: URL: https://github.com/apache/hudi/pull/9582#issuecomment-1700303079 ## CI report: * cc93db8d3ad775d3fd244d07ad17786596377e55 UNKNOWN

[GitHub] [hudi] yihua commented on issue #9430: [SUPPORT] Problem when refactor a custom payload to new API defined in RFC-46

2023-08-30 Thread via GitHub
yihua commented on issue #9430: URL: https://github.com/apache/hudi/issues/9430#issuecomment-1700302192 Great! @beyond1920 what's your Slack or Wechat handle? Since @linliu-code is fixing this and making it production ready, three of us should sync and see how we can divide the work here.

[GitHub] [hudi] hudi-bot commented on pull request #9581: [HUDI-6795] Implement writing record_positions to log blocks for updates and deletes

2023-08-30 Thread via GitHub
hudi-bot commented on PR #9581: URL: https://github.com/apache/hudi/pull/9581#issuecomment-1700297752 ## CI report: * 1208189ffb60441f9544933a2446ad194509c391 Azure:

[GitHub] [hudi] flashJd opened a new pull request, #9582: [MINOR]Fix hbase index config improper use

2023-08-30 Thread via GitHub
flashJd opened a new pull request, #9582: URL: https://github.com/apache/hudi/pull/9582 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact N/A ### Risk level (write none, low medium or high below) N/A

[GitHub] [hudi] nsivabalan commented on a diff in pull request #9482: [HUDI-6728] Update BigQuery manifest sync to support schema evolution

2023-08-30 Thread via GitHub
nsivabalan commented on code in PR #9482: URL: https://github.com/apache/hudi/pull/9482#discussion_r1311022781 ## hudi-gcp/src/main/java/org/apache/hudi/gcp/bigquery/BigQuerySyncTool.java: ## @@ -52,34 +57,55 @@ public class BigQuerySyncTool extends HoodieSyncTool {

[GitHub] [hudi] FishMAN002 commented on issue #9506: [SUPPORT] ctas error in spark3.1.1 & hudi 0.13.1

2023-08-30 Thread via GitHub
FishMAN002 commented on issue #9506: URL: https://github.com/apache/hudi/issues/9506#issuecomment-1700277934 > @ad1happy2go Are you suggesting that I try this command: ``` /usr/local/opt/apache-maven-3.8.5/bin/mvn clean package -DskipTests -Dspark3.1 -Dflink1.14 -Dscala-2.12

[jira] [Updated] (HUDI-6780) Replace classnames by modes/enums in table properties

2023-08-30 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-6780: -- Status: In Progress (was: Open) > Replace classnames by modes/enums in table properties >

[jira] [Updated] (HUDI-6779) Audit current hoodie.properties

2023-08-30 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-6779: -- Status: In Progress (was: Open) > Audit current hoodie.properties > --- >

[jira] [Updated] (HUDI-6776) Unify commit metadata content in json for completed and avro for pending commits

2023-08-30 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-6776: -- Status: In Progress (was: Open) > Unify commit metadata content in json for completed and avro for

[GitHub] [hudi] hudi-bot commented on pull request #9581: [HUDI-6795] Implement writing record_positions to log blocks for updates and deletes

2023-08-30 Thread via GitHub
hudi-bot commented on PR #9581: URL: https://github.com/apache/hudi/pull/9581#issuecomment-1700264208 ## CI report: * 1208189ffb60441f9544933a2446ad194509c391 UNKNOWN

[GitHub] [hudi] stream2000 commented on a diff in pull request #9515: [HUDI-2141] Support flink compaction metrics

2023-08-30 Thread via GitHub
stream2000 commented on code in PR #9515: URL: https://github.com/apache/hudi/pull/9515#discussion_r1311016199 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/metrics/FlinkWriteMetrics.java: ## @@ -0,0 +1,130 @@ +/* + * Licensed to the Apache Software

[GitHub] [hudi] hudi-bot commented on pull request #9538: [HUDI-6738] - Apply object filter before checkpoint batching in GcsEventsHoodieIncrSource

2023-08-30 Thread via GitHub
hudi-bot commented on PR #9538: URL: https://github.com/apache/hudi/pull/9538#issuecomment-1700249267 ## CI report: * d8d12bf0d3d2c24b0f03be4faf4c293c70db9ecd Azure:

[jira] [Updated] (HUDI-6795) Implement generation of record_positions for updates and deletes on write path

2023-08-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6795: - Labels: pull-request-available (was: ) > Implement generation of record_positions for updates

[GitHub] [hudi] yihua opened a new pull request, #9581: [HUDI-6795] Implement writing record_positions to log blocks for updates and deletes

2023-08-30 Thread via GitHub
yihua opened a new pull request, #9581: URL: https://github.com/apache/hudi/pull/9581 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any performance

[GitHub] [hudi] empcl commented on a diff in pull request #9580: automatically create a database when using the flink catalog dfs mode

2023-08-30 Thread via GitHub
empcl commented on code in PR #9580: URL: https://github.com/apache/hudi/pull/9580#discussion_r1311005106 ## pom.xml: ## @@ -1714,6 +1714,11 @@ + + nexus-aliyun + Nexus aliyun Review Comment: Sorry, it was too late at the time and I didn't pay

[GitHub] [hudi] danny0405 commented on a diff in pull request #9580: automatically create a database when using the flink catalog dfs mode

2023-08-30 Thread via GitHub
danny0405 commented on code in PR #9580: URL: https://github.com/apache/hudi/pull/9580#discussion_r1311004467 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieCatalog.java: ## @@ -125,6 +125,15 @@ public void open() throws CatalogException {

[GitHub] [hudi] danny0405 commented on a diff in pull request #9580: automatically create a database when using the flink catalog dfs mode

2023-08-30 Thread via GitHub
danny0405 commented on code in PR #9580: URL: https://github.com/apache/hudi/pull/9580#discussion_r1311004246 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieCatalog.java: ## @@ -125,6 +125,15 @@ public void open() throws CatalogException {

[GitHub] [hudi] danny0405 commented on a diff in pull request #9580: automatically create a database when using the flink catalog dfs mode

2023-08-30 Thread via GitHub
danny0405 commented on code in PR #9580: URL: https://github.com/apache/hudi/pull/9580#discussion_r1311003903 ## pom.xml: ## @@ -1714,6 +1714,11 @@ + + nexus-aliyun + Nexus aliyun Review Comment: do we need this?

[jira] [Closed] (HUDI-6763) WriteStats are extracted twice in BaseSparkCommitActionExecutor

2023-08-30 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen closed HUDI-6763. Resolution: Fixed Fixed via master branch: 2f7e9caebb0e7f68a7cc1a9c541cc67440eafa44 > WriteStats are

[jira] [Updated] (HUDI-6763) WriteStats are extracted twice in BaseSparkCommitActionExecutor

2023-08-30 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-6763: - Fix Version/s: 0.14.0 > WriteStats are extracted twice in BaseSparkCommitActionExecutor >

[hudi] branch master updated: [HUDI-6763] Optimize collect calls (#9561)

2023-08-30 Thread danny0405
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 2f7e9caebb0 [HUDI-6763] Optimize collect calls

[GitHub] [hudi] CTTY commented on pull request #8929: [HUDI-6350] Allow athena to use the metadata table

2023-08-30 Thread via GitHub
CTTY commented on PR #8929: URL: https://github.com/apache/hudi/pull/8929#issuecomment-1700184768 I've manually tested this on EMR 6.12 by setting configs below when writing data: ``` .option(DataSourceWriteOptions.META_SYNC_CLIENT_TOOL_CLASS_NAME.key,

[GitHub] [hudi] hudi-bot commented on pull request #9572: [WIP][HUDI-6702]Utilize merger to replace insertValue api

2023-08-30 Thread via GitHub
hudi-bot commented on PR #9572: URL: https://github.com/apache/hudi/pull/9572#issuecomment-1700031447 ## CI report: * ad05887b523496f59ac8b6e976183d6c325ed94d UNKNOWN * 7e769e60b101466c27604ce531b95f42eab87885 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9546: [HUDI-6397] [HUDI-6759] Fixing misc bugs w/ metadata table

2023-08-30 Thread via GitHub
hudi-bot commented on PR #9546: URL: https://github.com/apache/hudi/pull/9546#issuecomment-1700031017 ## CI report: * f391322cd2d754ce85fbd33ca516c19d688ab784 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9546: [HUDI-6397] [HUDI-6759] Fixing misc bugs w/ metadata table

2023-08-30 Thread via GitHub
hudi-bot commented on PR #9546: URL: https://github.com/apache/hudi/pull/9546#issuecomment-1700013861 ## CI report: * f391322cd2d754ce85fbd33ca516c19d688ab784 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9572: [WIP][HUDI-6702]Utilize merger to replace insertValue api

2023-08-30 Thread via GitHub
hudi-bot commented on PR #9572: URL: https://github.com/apache/hudi/pull/9572#issuecomment-1700014321 ## CI report: * ad05887b523496f59ac8b6e976183d6c325ed94d UNKNOWN * 7e769e60b101466c27604ce531b95f42eab87885 Azure:

[GitHub] [hudi] yihua commented on a diff in pull request #9572: [WIP][HUDI-6702]Utilize merger to replace insertValue api

2023-08-30 Thread via GitHub
yihua commented on code in PR #9572: URL: https://github.com/apache/hudi/pull/9572#discussion_r1310958195 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/common/table/log/HoodieFileSliceReader.java: ## @@ -38,12 +36,11 @@ public class HoodieFileSliceReader

[GitHub] [hudi] yihua commented on a diff in pull request #9572: [WIP][HUDI-6702]Utilize merger to replace insertValue api

2023-08-30 Thread via GitHub
yihua commented on code in PR #9572: URL: https://github.com/apache/hudi/pull/9572#discussion_r1310956732 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodieAvroRecord.java: ## @@ -184,16 +185,20 @@ public HoodieRecord copy() { @Override public HoodieRecord

[GitHub] [hudi] CTTY commented on pull request #9221: [HUDI-6550] Add Hadoop conf to HiveConf for HiveSyncConfig

2023-08-30 Thread via GitHub
CTTY commented on PR #9221: URL: https://github.com/apache/hudi/pull/9221#issuecomment-1699982042 Hey @xushiyan, I've tested this fix manually with EMR Serverless and it works fine

[jira] [Updated] (HUDI-6702) Extend merge API to support all merging operations

2023-08-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6702: - Labels: pull-request-available (was: ) > Extend merge API to support all merging operations >

[GitHub] [hudi] linliu-code commented on a diff in pull request #9572: [WIP][HUDI-6702]Utilize merger to replace insertValue api

2023-08-30 Thread via GitHub
linliu-code commented on code in PR #9572: URL: https://github.com/apache/hudi/pull/9572#discussion_r1310932741 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodieAvroRecord.java: ## @@ -184,16 +185,20 @@ public HoodieRecord copy() { @Override public

[GitHub] [hudi] hudi-bot commented on pull request #9564: [HUDI-6712] Add Parquet file metadata loader

2023-08-30 Thread via GitHub
hudi-bot commented on PR #9564: URL: https://github.com/apache/hudi/pull/9564#issuecomment-1699927341 ## CI report: * c54656897cae544738e30fe42a0fb684787ad704 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9571: Enabling comprehensive schema evolution in delta streamer code

2023-08-30 Thread via GitHub
hudi-bot commented on PR #9571: URL: https://github.com/apache/hudi/pull/9571#issuecomment-1699927529 ## CI report: * 070278982fdd12e8f708ea22cbfc641b41d2cfc7 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9538: [HUDI-6738] - Apply object filter before checkpoint batching in GcsEventsHoodieIncrSource

2023-08-30 Thread via GitHub
hudi-bot commented on PR #9538: URL: https://github.com/apache/hudi/pull/9538#issuecomment-1699926967 ## CI report: * 1c87979b57e306970bcc95530f45586badcf0a6a Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9571: Enabling comprehensive schema evolution in delta streamer code

2023-08-30 Thread via GitHub
hudi-bot commented on PR #9571: URL: https://github.com/apache/hudi/pull/9571#issuecomment-1699916156 ## CI report: * 070278982fdd12e8f708ea22cbfc641b41d2cfc7 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9564: [HUDI-6712] Add Parquet file metadata loader

2023-08-30 Thread via GitHub
hudi-bot commented on PR #9564: URL: https://github.com/apache/hudi/pull/9564#issuecomment-1699915931 ## CI report: * c54656897cae544738e30fe42a0fb684787ad704 Azure:

[jira] [Commented] (HUDI-6771) Support Bloom Filter in Keyed Lookup Reader

2023-08-30 Thread Lin Liu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17760639#comment-17760639 ] Lin Liu commented on HUDI-6771: --- Have added the bloom filter support in the lookup reader, and added a unit

[GitHub] [hudi] hudi-bot commented on pull request #9538: [HUDI-6738] - Apply object filter before checkpoint batching in GcsEventsHoodieIncrSource

2023-08-30 Thread via GitHub
hudi-bot commented on PR #9538: URL: https://github.com/apache/hudi/pull/9538#issuecomment-1699915450 ## CI report: * 1c87979b57e306970bcc95530f45586badcf0a6a Azure:

[jira] [Commented] (HUDI-6766) Fixing mysql debezium data loss

2023-08-30 Thread Sandeep Parwal (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17760635#comment-17760635 ] Sandeep Parwal commented on HUDI-6766: -- PR with the fix: [https://github.com/apache/hudi/pull/9475]  

[jira] [Assigned] (HUDI-6795) Implement generation of record_positions for updates and deletes on write path

2023-08-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-6795: --- Assignee: Ethan Guo > Implement generation of record_positions for updates and deletes on write path

[hudi] branch master updated: [HUDI-6445] Fixing metrics to use IN-MEMORY type in tests (#9543)

2023-08-30 Thread yihua
This is an automated email from the ASF dual-hosted git repository. yihua pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 45d8290c80a [HUDI-6445] Fixing metrics to use

[GitHub] [hudi] lokesh-lingarajan-0310 commented on a diff in pull request #9473: [HUDI-6724] - Defaulting previous Instant time to init time to enable full read of initial commit

2023-08-30 Thread via GitHub
lokesh-lingarajan-0310 commented on code in PR #9473: URL: https://github.com/apache/hudi/pull/9473#discussion_r1310856956 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/IncrSourceHelper.java: ## @@ -130,7 +130,7 @@ public static QueryInfo

[hudi] branch master updated: [HUDI-3727] Add metrics for async indexer (#9559)

2023-08-30 Thread yihua
This is an automated email from the ASF dual-hosted git repository. yihua pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new a898dfd4152 [HUDI-3727] Add metrics for async

[GitHub] [hudi] yihua merged pull request #9559: [HUDI-3727] Add metrics for async indexer

2023-08-30 Thread via GitHub
yihua merged PR #9559: URL: https://github.com/apache/hudi/pull/9559

[jira] [Comment Edited] (HUDI-6752) Scope out the work for file group reading and writing with record merging in Spark

2023-08-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17760631#comment-17760631 ] Ethan Guo edited comment on HUDI-6752 at 8/30/23 9:32 PM: -- I've created JIRA

[GitHub] [hudi] yihua commented on pull request #9559: [HUDI-3727] Add metrics for async indexer

2023-08-30 Thread via GitHub
yihua commented on PR #9559: URL: https://github.com/apache/hudi/pull/9559#issuecomment-1699873087 CI times out on the fourth job, which looks irrelevant. Merging this PR.

[jira] [Commented] (HUDI-6752) Scope out the work for file group reading and writing with record merging in Spark

2023-08-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17760631#comment-17760631 ] Ethan Guo commented on HUDI-6752: - I've created JIRA tickets in the corresponding EPICs and here's the

[jira] [Updated] (HUDI-6793) Support time-travel read in engine-agnostic FileGroupReader

2023-08-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6793: Priority: Blocker (was: Major) > Support time-travel read in engine-agnostic FileGroupReader >

[jira] [Updated] (HUDI-6788) Integrate FileGroupReader with MergeOnReadInputFormat for Flink

2023-08-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6788: Priority: Blocker (was: Major) > Integrate FileGroupReader with MergeOnReadInputFormat for Flink >

[jira] [Updated] (HUDI-6800) Implement log writing with partial updates on the write path

2023-08-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6800: Priority: Blocker (was: Major) > Implement log writing with partial updates on the write path >

[jira] [Updated] (HUDI-6787) Integrate FileGroupReader with HoodieMergeOnReadSnapshotReader and RealtimeCompactedRecordReader for Hive

2023-08-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6787: Priority: Blocker (was: Major) > Integrate FileGroupReader with HoodieMergeOnReadSnapshotReader and >

[jira] [Updated] (HUDI-6799) Integrate FileGroupReader with merge handle on the write path

2023-08-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6799: Priority: Blocker (was: Major) > Integrate FileGroupReader with merge handle on the write path >

[jira] [Updated] (HUDI-6790) Support incremental read in engine-agnostic FileGroupReader

2023-08-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6790: Priority: Blocker (was: Major) > Support incremental read in engine-agnostic FileGroupReader >

[jira] [Updated] (HUDI-6794) Support completion-time-based file slice in FileGroupReader

2023-08-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6794: Priority: Blocker (was: Major) > Support completion-time-based file slice in FileGroupReader >

[jira] [Updated] (HUDI-6791) Integrate FileGroupReader with NewHoodieParquetFileFormat for Spark CDC Query

2023-08-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6791: Priority: Blocker (was: Major) > Integrate FileGroupReader with NewHoodieParquetFileFormat for Spark CDC

[jira] [Updated] (HUDI-6792) Integrate FileGroupReader with NewHoodieParquetFileFormat for Spark Incremental Query

2023-08-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6792: Priority: Blocker (was: Major) > Integrate FileGroupReader with NewHoodieParquetFileFormat for Spark >

[jira] [Updated] (HUDI-6802) Use completion time in Spark FileIndex for listing

2023-08-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6802: Priority: Blocker (was: Major) > Use completion time in Spark FileIndex for listing >

[jira] [Updated] (HUDI-6801) Implement merging of partial updates in FileGroupReader

2023-08-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6801: Priority: Blocker (was: Major) > Implement merging of partial updates in FileGroupReader >

[jira] [Updated] (HUDI-6789) Support CDC read in engine-agnostic FileGroupReader

2023-08-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6789: Priority: Blocker (was: Major) > Support CDC read in engine-agnostic FileGroupReader >

[jira] [Updated] (HUDI-6797) Implement position-based updates in FileGroupReader

2023-08-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6797: Priority: Blocker (was: Major) > Implement position-based updates in FileGroupReader >

[jira] [Updated] (HUDI-6795) Implement generation of record_positions for updates and deletes on write path

2023-08-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6795: Priority: Blocker (was: Major) > Implement generation of record_positions for updates and deletes on write

[jira] [Updated] (HUDI-6796) Implement position-based deletes in FileGroupReader

2023-08-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6796: Priority: Blocker (was: Major) > Implement position-based deletes in FileGroupReader >

[jira] [Updated] (HUDI-6795) Implement generation of record_positions for updates and deletes on write path

2023-08-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6795: Status: In Progress (was: Open) > Implement generation of record_positions for updates and deletes on

[jira] [Updated] (HUDI-6798) Implement event-time-based merging mode in FileGroupReader

2023-08-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6798: Priority: Blocker (was: Major) > Implement event-time-based merging mode in FileGroupReader >

[jira] [Updated] (HUDI-6791) Integrate FileGroupReader with NewHoodieParquetFileFormat for Spark CDC Query

2023-08-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6791: Story Points: 4 (was: 3) > Integrate FileGroupReader with NewHoodieParquetFileFormat for Spark CDC Query >

[jira] [Updated] (HUDI-6799) Integrate FileGroupReader with merge handle on the write path

2023-08-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6799: Fix Version/s: 1.0.0 > Integrate FileGroupReader with merge handle on the write path >

[jira] [Updated] (HUDI-6785) Introduce an engine-agnostic FileGroupReader for snapshot read

2023-08-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6785: Fix Version/s: 1.0.0 > Introduce an engine-agnostic FileGroupReader for snapshot read >

[jira] [Updated] (HUDI-6791) Integrate FileGroupReader with NewHoodieParquetFileFormat for Spark CDC Query

2023-08-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6791: Story Points: 3 > Integrate FileGroupReader with NewHoodieParquetFileFormat for Spark CDC Query >

[jira] [Closed] (HUDI-6752) Scope out the work for file group reading and writing with record merging in Spark

2023-08-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo closed HUDI-6752. --- Resolution: Fixed > Scope out the work for file group reading and writing with record merging in > Spark >

[GitHub] [hudi] nsivabalan closed pull request #9533: [HUDI-6445] Fixing metrics in tests

2023-08-30 Thread via GitHub
nsivabalan closed pull request #9533: [HUDI-6445] Fixing metrics in tests URL: https://github.com/apache/hudi/pull/9533

[jira] [Assigned] (HUDI-6807) MoR Incremental count queries trigger full scan of files in table

2023-08-30 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-6807: - Assignee: sivabalan narayanan > MoR Incremental count queries trigger full scan

[jira] [Created] (HUDI-6807) MoR Incremental count queries trigger full scan of files in table

2023-08-30 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-6807: --- Summary: MoR Incremental count queries trigger full scan of files in table Key: HUDI-6807 URL: https://issues.apache.org/jira/browse/HUDI-6807 Project: Apache Hudi

[GitHub] [hudi] hudi-bot commented on pull request #9580: automatically create a database when using the flink catalog dfs mode

2023-08-30 Thread via GitHub
hudi-bot commented on PR #9580: URL: https://github.com/apache/hudi/pull/9580#issuecomment-1699688410 ## CI report: * 5c8bfdbecf648b3633882ef37de0a18d31209a2e Azure:

[GitHub] [hudi] imrewang commented on issue #9513: [SUPPORT]Index Bootstrap deleted some snapshot data that has been batch-inserted into Hudi ?

2023-08-30 Thread via GitHub
imrewang commented on issue #9513: URL: https://github.com/apache/hudi/issues/9513#issuecomment-1699605979 The `snapshot data` now contains `1 1 1` and `2 2 2`. I **delete** `1 1 1` first, and then **update** `2 update 2` (**strictly in that order: delete first, then update**). Result:

[GitHub] [hudi] hudi-bot commented on pull request #9553: [HUDI-1517][HUDI-6758][HUDI-6761] Adding support for per-logfile marker to track all log files added by a commit and to assist with rollbacks

2023-08-30 Thread via GitHub
hudi-bot commented on PR #9553: URL: https://github.com/apache/hudi/pull/9553#issuecomment-1699688154 ## CI report: * aeac327c3cad812fea5e2bc01c07c1314bbf1838 UNKNOWN * 2554ca28ddffba3e8ffb64db090daf85ffae187b Azure:

[jira] [Created] (HUDI-6806) Support Spark 3.5.0

2023-08-30 Thread Shawn Chang (Jira)
Shawn Chang created HUDI-6806: - Summary: Support Spark 3.5.0 Key: HUDI-6806 URL: https://issues.apache.org/jira/browse/HUDI-6806 Project: Apache Hudi Issue Type: Improvement

[jira] [Updated] (HUDI-6806) Support Spark 3.5.0

2023-08-30 Thread Shawn Chang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Chang updated HUDI-6806: -- Fix Version/s: 1.0.0 > Support Spark 3.5.0 > --- > > Key: HUDI-6806 >

[GitHub] [hudi] hudi-bot commented on pull request #9564: [HUDI-6712] Add Parquet file metadata loader

2023-08-30 Thread via GitHub
hudi-bot commented on PR #9564: URL: https://github.com/apache/hudi/pull/9564#issuecomment-1699614311 ## CI report: * 53178b06b8a26034ad8be84e5e1ae3b23b57f7ea Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9553: [HUDI-1517][HUDI-6758][HUDI-6761] Adding support for per-logfile marker to track all log files added by a commit and to assist with rollbacks

2023-08-30 Thread via GitHub
hudi-bot commented on PR #9553: URL: https://github.com/apache/hudi/pull/9553#issuecomment-1699534535 ## CI report: * aeac327c3cad812fea5e2bc01c07c1314bbf1838 UNKNOWN * 390358e6f53821e8c19365974d4d1da9b2ee0e89 Azure:
