[GitHub] [hudi] hudi-bot commented on pull request #9564: [HUDI-6712] Add Parquet file metadata loader

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9564: URL: https://github.com/apache/hudi/pull/9564#issuecomment-1696815226 ## CI report: * d3756f7364bc48e58c07c4b6818ad62e30e13fb0 Azure:

[GitHub] [hudi] Zouxxyy commented on a diff in pull request #9554: [HUDI-6760] Add SelfDescribingInputFormatInterface for hive FileInput…

2023-08-28 Thread via GitHub
Zouxxyy commented on code in PR #9554: URL: https://github.com/apache/hudi/pull/9554#discussion_r1308223983 ## hudi-spark-datasource/hudi-spark-common/pom.xml: ## @@ -228,6 +228,19 @@ test + + + ${hive.groupid} + hive-exec +

[GitHub] [hudi] the-other-tim-brown commented on issue #9355: [SUPPORT] Problem while reading from BQ tables which are synced on Hudi table

2023-08-28 Thread via GitHub
the-other-tim-brown commented on issue #9355: URL: https://github.com/apache/hudi/issues/9355#issuecomment-1696796546 @ranjanankur you have to build the snapshot from source since the final 0.14.0 release is not published yet. I would definitely start testing in a staging environment

[GitHub] [hudi] ranjanankur commented on issue #9355: [SUPPORT] Problem while reading from BQ tables which are synced on Hudi table

2023-08-28 Thread via GitHub
ranjanankur commented on issue #9355: URL: https://github.com/apache/hudi/issues/9355#issuecomment-1696793585 We are not able to find the `BIGQUERY_SYNC_USE_BQ_MANIFEST_FILE` in the `0.14.0-SNAPSHOT` version. We can see that this part of the code is present on GitHub. Was the code bundled

[GitHub] [hudi] danny0405 commented on issue #9119: [SUPPORT] ERROR BaseSparkCommitActionExecutor: Error upserting bucketType UPDATE for partition :13

2023-08-28 Thread via GitHub
danny0405 commented on issue #9119: URL: https://github.com/apache/hudi/issues/9119#issuecomment-1696783576 RC1 is out, guess we still have a RC2 there. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [hudi] ranjanankur commented on issue #9355: [SUPPORT] Problem while reading from BQ tables which are synced on Hudi table

2023-08-28 Thread via GitHub
ranjanankur commented on issue #9355: URL: https://github.com/apache/hudi/issues/9355#issuecomment-1696782369 We can see this code in the source, but @the-other-tim-brown, could you please confirm the version of `hudi-gcp-bundle` that we should use, and if we have to use `0.14.0-SNAPSHOT` then

[hudi] branch master updated: [MINOR] Modify return type description (#9479)

2023-08-28 Thread danny0405
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new af17ee72910 [MINOR] Modify return type

[GitHub] [hudi] danny0405 merged pull request #9479: [MINOR] Modify return type description

2023-08-28 Thread via GitHub
danny0405 merged PR #9479: URL: https://github.com/apache/hudi/pull/9479 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[GitHub] [hudi] danny0405 commented on a diff in pull request #9554: [HUDI-6760] Add SelfDescribingInputFormatInterface for hive FileInput…

2023-08-28 Thread via GitHub
danny0405 commented on code in PR #9554: URL: https://github.com/apache/hudi/pull/9554#discussion_r1308198370 ## hudi-spark-datasource/hudi-spark-common/pom.xml: ## @@ -228,6 +228,19 @@ test + + + ${hive.groupid} + hive-exec +

[GitHub] [hudi] the-other-tim-brown commented on issue #9355: [SUPPORT] Problem while reading from BQ tables which are synced on Hudi table

2023-08-28 Thread via GitHub
the-other-tim-brown commented on issue #9355: URL: https://github.com/apache/hudi/issues/9355#issuecomment-1696768579 @ranjanankur You can pass that in as `props.setProperty(BIGQUERY_SYNC_USE_BQ_MANIFEST_FILE.key, "true")` in your code above. ([source for
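A minimal sketch of the suggestion in the comment above, assuming the sync properties are carried in a plain `java.util.Properties` object. The class name `BigQuerySyncPropsSketch` and the helper `buildSyncProps` are hypothetical, and the key string is an assumption about what `BIGQUERY_SYNC_USE_BQ_MANIFEST_FILE` resolves to; check `BigQuerySyncConfig` in your Hudi build before relying on it.

```java
import java.util.Properties;

// Hypothetical helper class, not part of Hudi.
final class BigQuerySyncPropsSketch {
    // Assumption: the key string behind BIGQUERY_SYNC_USE_BQ_MANIFEST_FILE.
    // Verify it against BigQuerySyncConfig in the Hudi version you build.
    static final String USE_BQ_MANIFEST_FILE_KEY =
            "hoodie.gcp.bigquery.sync.use_bq_manifest_file";

    // Builds a Properties object with the manifest-file flag enabled.
    static Properties buildSyncProps() {
        Properties props = new Properties();
        props.setProperty(USE_BQ_MANIFEST_FILE_KEY, "true");
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildSyncProps();
        // Print the flag as key=value so it can be checked by eye.
        System.out.println(USE_BQ_MANIFEST_FILE_KEY + "="
                + props.getProperty(USE_BQ_MANIFEST_FILE_KEY));
    }
}
```

When the Hudi classes are on the classpath, referencing the config constant directly, as in the comment, avoids hard-coding the string.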

[GitHub] [hudi] danny0405 commented on issue #9513: [SUPPORT]Index Bootstrap deleted some snapshot data that has been batch-inserted into Hudi ?

2023-08-28 Thread via GitHub
danny0405 commented on issue #9513: URL: https://github.com/apache/hudi/issues/9513#issuecomment-1696767874 @imrewang Did you execute the delete operation using batch SQL or streaming ingestion? It's the streaming ingestion, right? Did you try snapshot queries using the Flink engine? --

[GitHub] [hudi] ranjanankur commented on issue #9355: [SUPPORT] Problem while reading from BQ tables which are synced on Hudi table

2023-08-28 Thread via GitHub
ranjanankur commented on issue #9355: URL: https://github.com/apache/hudi/issues/9355#issuecomment-1696764242 Hi @the-other-tim-brown , @emkornfield In the above article it is mentioned to use the `use-bq-manifest-file` flag while running `BigQuerySyncTool` function to sync Hudi table

[GitHub] [hudi] guanziyue commented on a diff in pull request #9553: [HUDI-1517][HUDI-6758][HUDI-6761] Adding support for per-logfile marker to track all log files added by a commit and to assist with

2023-08-28 Thread via GitHub
guanziyue commented on code in PR #9553: URL: https://github.com/apache/hudi/pull/9553#discussion_r1308187945 ## hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileWriteCallback.java: ## @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [hudi] hudi-bot commented on pull request #9565: [HUDI-6725] Support efficient completion time queries on the timeline

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9565: URL: https://github.com/apache/hudi/pull/9565#issuecomment-1696734657 ## CI report: * a6763dd79b81e4abb7ce4dd925e7dd8f318cca0e Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9565: [HUDI-6725] Support efficient completion time queries on the timeline

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9565: URL: https://github.com/apache/hudi/pull/9565#issuecomment-1696729681 ## CI report: * a6763dd79b81e4abb7ce4dd925e7dd8f318cca0e UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] hudi-bot commented on pull request #9564: [HUDI-6712] Add Parquet file metadata loader

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9564: URL: https://github.com/apache/hudi/pull/9564#issuecomment-1696725244 ## CI report: * 102f50cd61cff3fdc4347dca02e712854cc20fd8 Azure:

[GitHub] [hudi] danny0405 commented on a diff in pull request #9565: [HUDI-6725] Support efficient completion time queries on the timeline

2023-08-28 Thread via GitHub
danny0405 commented on code in PR #9565: URL: https://github.com/apache/hudi/pull/9565#discussion_r1308148250 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/timeline/CompletionTimeQueryView.java: ## @@ -0,0 +1,149 @@ +/* + * Licensed to the Apache

[jira] [Updated] (HUDI-6725) Support efficient completion time queries on the timeline

2023-08-28 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6725: - Labels: pull-request-available (was: ) > Support efficient completion time queries on the

[GitHub] [hudi] danny0405 opened a new pull request, #9565: [HUDI-6725] Support efficient completion time queries on the timeline

2023-08-28 Thread via GitHub
danny0405 opened a new pull request, #9565: URL: https://github.com/apache/hudi/pull/9565 ### Change Logs Add a tool to query completion time efficiently on both active & archived timeline. ### Impact none ### Risk level (write none, low medium or high below)

[jira] [Updated] (HUDI-6772) Handle missing index metadata for keyed lookup reader

2023-08-28 Thread Lin Liu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Liu updated HUDI-6772: -- Summary: Handle missing index metadata for keyed lookup reader (was: Handle missing index metadata ) > Handle

[GitHub] [hudi] Zouxxyy commented on a diff in pull request #9554: [HUDI-6760] Add SelfDescribingInputFormatInterface for hive FileInput…

2023-08-28 Thread via GitHub
Zouxxyy commented on code in PR #9554: URL: https://github.com/apache/hudi/pull/9554#discussion_r1308145131 ## hudi-spark-datasource/hudi-spark-common/pom.xml: ## @@ -228,6 +228,19 @@ test + + + ${hive.groupid} + hive-exec +

[GitHub] [hudi] hudi-bot commented on pull request #9564: [HUDI-6712] Add Parquet file metadata loader

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9564: URL: https://github.com/apache/hudi/pull/9564#issuecomment-1696698785 ## CI report: * 102f50cd61cff3fdc4347dca02e712854cc20fd8 Azure:

[jira] [Created] (HUDI-6772) Handle missing index metadata

2023-08-28 Thread Lin Liu (Jira)
Lin Liu created HUDI-6772: - Summary: Handle missing index metadata Key: HUDI-6772 URL: https://issues.apache.org/jira/browse/HUDI-6772 Project: Apache Hudi Issue Type: New Feature

[GitHub] [hudi] hudi-bot commented on pull request #9517: [HUDI-6708] Support record level indexing with async indexer

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9517: URL: https://github.com/apache/hudi/pull/9517#issuecomment-1696698609 ## CI report: * 150ae016d4947394638ceb38f675a1c687c94fcd Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9517: [HUDI-6708] Support record level indexing with async indexer

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9517: URL: https://github.com/apache/hudi/pull/9517#issuecomment-1696693407 ## CI report: * 150ae016d4947394638ceb38f675a1c687c94fcd UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[jira] [Updated] (HUDI-3217) RFC-46: Optimize Record Payload handling

2023-08-28 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-3217: Description: h2. These are the gaps that we need to fill for the new record merging API * [P0][HUDI-6702]

[jira] [Commented] (HUDI-835) refactor HoodieMergeHandle into factory pattern

2023-08-28 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759782#comment-17759782 ] Ethan Guo commented on HUDI-835: We have HoodieMergeHandleFactory to create HoodieMergeHandle. Closing

[jira] [Closed] (HUDI-835) refactor HoodieMergeHandle into factory pattern

2023-08-28 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo closed HUDI-835. -- Resolution: Fixed > refactor HoodieMergeHandle into factory pattern >

[jira] [Comment Edited] (HUDI-1623) Support start_commit_time & end_commit_times for serializable incremental pull

2023-08-28 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17758833#comment-17758833 ] Vinoth Chandar edited comment on HUDI-1623 at 8/29/23 2:32 AM: --- On the

[jira] [Comment Edited] (HUDI-1623) Support start_commit_time & end_commit_times for serializable incremental pull

2023-08-28 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17758833#comment-17758833 ] Vinoth Chandar edited comment on HUDI-1623 at 8/29/23 2:31 AM: --- On the

[jira] [Updated] (HUDI-6768) Revisit HoodieRecord design and how it affects e2e row writing

2023-08-28 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6768: Priority: Blocker (was: Major) > Revisit HoodieRecord design and how it affects e2e row writing >

[jira] [Updated] (HUDI-6767) Simplify compatibility of HoodieRecord conversion

2023-08-28 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6767: Priority: Blocker (was: Major) > Simplify compatibility of HoodieRecord conversion >

[jira] [Commented] (HUDI-6751) Scope out remaining work for the record merging API

2023-08-28 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17759779#comment-17759779 ] Ethan Guo commented on HUDI-6751: - I've updated the plan of remaining work in the description of the EPIC:

[jira] [Updated] (HUDI-6765) Add merge mode to allow differentiation of dedup logic

2023-08-28 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6765: Priority: Blocker (was: Major) > Add merge mode to allow differentiation of dedup logic >

[jira] [Created] (HUDI-6771) Support Bloom Filter in Keyed Lookup Reader

2023-08-28 Thread Lin Liu (Jira)
Lin Liu created HUDI-6771: - Summary: Support Bloom Filter in Keyed Lookup Reader Key: HUDI-6771 URL: https://issues.apache.org/jira/browse/HUDI-6771 Project: Apache Hudi Issue Type: New Feature

[jira] [Created] (HUDI-6770) Improve on Keyed Lookup Reader

2023-08-28 Thread Lin Liu (Jira)
Lin Liu created HUDI-6770: - Summary: Improve on Keyed Lookup Reader Key: HUDI-6770 URL: https://issues.apache.org/jira/browse/HUDI-6770 Project: Apache Hudi Issue Type: New Feature

[jira] [Updated] (HUDI-4321) Fix Hudi to not write in Parquet legacy format

2023-08-28 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-4321: Priority: Major (was: Critical) > Fix Hudi to not write in Parquet legacy format >

[jira] [Updated] (HUDI-6702) Extend merge API to support all merging operations

2023-08-28 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6702: Priority: Blocker (was: Major) > Extend merge API to support all merging operations >

[jira] [Created] (HUDI-6769) Integration test on Keyed Lookup Reader

2023-08-28 Thread Lin Liu (Jira)
Lin Liu created HUDI-6769: - Summary: Integration test on Keyed Lookup Reader Key: HUDI-6769 URL: https://issues.apache.org/jira/browse/HUDI-6769 Project: Apache Hudi Issue Type: New Feature

[jira] [Updated] (HUDI-3217) RFC-46: Optimize Record Payload handling

2023-08-28 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-3217: Description: These are the gaps that we need to fill for the new record merging API * [P0][HUDI-6702]

[jira] [Closed] (HUDI-3908) Profile MOR snapshot query flow

2023-08-28 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo closed HUDI-3908. --- Resolution: Done > Profile MOR snapshot query flow > --- > > Key:

[jira] [Updated] (HUDI-3217) RFC-46: Optimize Record Payload handling

2023-08-28 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-3217: Description: These are the gaps that we need to fill for the new record merging API * [P0]HUDI-6702 Extend

[jira] [Updated] (HUDI-3217) RFC-46: Optimize Record Payload handling

2023-08-28 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-3217: Description: These are the gaps that we need to fill for the new record merging API * [P0]HUDI-6702 Extend

[jira] [Updated] (HUDI-3217) RFC-46: Optimize Record Payload handling

2023-08-28 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-3217: Description: * [P0]HUDI-6702 Extend merge API to support all merging operations (inserts, updates and

[jira] [Updated] (HUDI-3217) RFC-46: Optimize Record Payload handling

2023-08-28 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-3217: Description: * [P0]HUDI-6702 Extend merge API to support all merging operations (inserts, updates and

[jira] [Updated] (HUDI-6725) Support efficient completion time queries on the timeline

2023-08-28 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-6725: - Summary: Support efficient completion time queries on the timeline (was: Support fast completion time

[GitHub] [hudi] zealjoanna commented on issue #9557: [SUPPORT] CDC file clean not work

2023-08-28 Thread via GitHub
zealjoanna commented on issue #9557: URL: https://github.com/apache/hudi/issues/9557#issuecomment-1696667766 ![image](https://github.com/apache/hudi/assets/21325163/eff4cfbd-e140-4f93-bc60-5a072569fe1e) The CDC file contains the partition_path while the parquet file uses the filename

[jira] [Updated] (HUDI-6712) Implement optimized keyed lookup on parquet files

2023-08-28 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-6712: - Reviewers: Vinoth Chandar > Implement optimized keyed lookup on parquet files >

[jira] [Comment Edited] (HUDI-6712) Implement optimized keyed lookup on parquet files

2023-08-28 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757157#comment-17757157 ] Vinoth Chandar edited comment on HUDI-6712 at 8/29/23 2:14 AM: --- Based on

[jira] [Updated] (HUDI-3217) RFC-46: Optimize Record Payload handling

2023-08-28 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-3217: - Description: Currently Hudi is biased toward the assumption of a particular payload representation

[jira] [Assigned] (HUDI-3217) RFC-46: Optimize Record Payload handling

2023-08-28 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar reassigned HUDI-3217: Assignee: Ethan Guo (was: Alexey Kudinkin) > RFC-46: Optimize Record Payload handling >

[hudi] branch master updated (e76dd102bca -> d924f181633)

2023-08-28 Thread danny0405
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from e76dd102bca [HUDI-4631] Adding retries to spark datasource writes on conflict failures (#6854) add d924f181633

[jira] [Updated] (HUDI-6751) Scope out remaining work for the record merging API

2023-08-28 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6751: Status: Patch Available (was: In Progress) > Scope out remaining work for the record merging API >

[GitHub] [hudi] hudi-bot commented on pull request #9553: [HUDI-1517][HUDI-6758][HUDI-6761] Adding support for per-logfile marker to track all log files added by a commit and to assist with rollbacks

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9553: URL: https://github.com/apache/hudi/pull/9553#issuecomment-1696654360 ## CI report: * aeac327c3cad812fea5e2bc01c07c1314bbf1838 UNKNOWN * d8e0ec93a2a03173e55030f5638c89dab676727e Azure:

[jira] [Closed] (HUDI-6539) New LSM tree style archived timeline

2023-08-28 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen closed HUDI-6539. Resolution: Fixed Fixed via master branch: d924f181633a08aa9124aa211fa16fd19d1f03df > New LSM tree style

[jira] [Created] (HUDI-6768) Revisit HoodieRecord design and how it affects e2e row writing

2023-08-28 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-6768: --- Summary: Revisit HoodieRecord design and how it affects e2e row writing Key: HUDI-6768 URL: https://issues.apache.org/jira/browse/HUDI-6768 Project: Apache Hudi

[hudi] branch asf-site updated: [HUDI-6671][DOC] Add partition sql command (#9556)

2023-08-28 Thread danny0405
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 29bd207cd39 [HUDI-6671][DOC] Add partition

[GitHub] [hudi] stream2000 commented on pull request #9558: [HUDI-6481] Support run multi tables services in a single spark job

2023-08-28 Thread via GitHub
stream2000 commented on PR #9558: URL: https://github.com/apache/hudi/pull/9558#issuecomment-1696650479 @leesf @jonvex Hi, could you please help review this pr? Thanks~ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [hudi] danny0405 merged pull request #9556: [HUDI-6671][DOC] Add partition sql command doc

2023-08-28 Thread via GitHub
danny0405 merged PR #9556: URL: https://github.com/apache/hudi/pull/9556 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[jira] [Created] (HUDI-6767) Simplify compatibility of HoodieRecord conversion

2023-08-28 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-6767: --- Summary: Simplify compatibility of HoodieRecord conversion Key: HUDI-6767 URL: https://issues.apache.org/jira/browse/HUDI-6767 Project: Apache Hudi Issue Type: Task

[jira] [Created] (HUDI-6766) Fixing mysql debezium data loss

2023-08-28 Thread Danny Chen (Jira)
Danny Chen created HUDI-6766: Summary: Fixing mysql debezium data loss Key: HUDI-6766 URL: https://issues.apache.org/jira/browse/HUDI-6766 Project: Apache Hudi Issue Type: Bug

[GitHub] [hudi] vamshigv commented on a diff in pull request #9539: [HUDI-6726] Fix connection leaks related to file reader and iterator close

2023-08-28 Thread via GitHub
vamshigv commented on code in PR #9539: URL: https://github.com/apache/hudi/pull/9539#discussion_r1308102765 ## hudi-common/src/main/java/org/apache/hudi/common/bootstrap/index/HFileBootstrapIndex.java: ## @@ -182,12 +182,8 @@ private static String getUserKeyFromCellKey(String

[GitHub] [hudi] danny0405 commented on a diff in pull request #9554: [HUDI-6760] Add SelfDescribingInputFormatInterface for hive FileInput…

2023-08-28 Thread via GitHub
danny0405 commented on code in PR #9554: URL: https://github.com/apache/hudi/pull/9554#discussion_r1308101795 ## hudi-spark-datasource/hudi-spark-common/pom.xml: ## @@ -228,6 +228,19 @@ test + + + ${hive.groupid} + hive-exec +

[jira] [Closed] (HUDI-5281) Rewrite HoodieSparkRecord with UnsafeRowWriter

2023-08-28 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo closed HUDI-5281. --- Resolution: Fixed > Rewrite HoodieSparkRecord with UnsafeRowWriter >

[jira] [Updated] (HUDI-5019) Remove these unnecessary newInstance invocations

2023-08-28 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5019: Fix Version/s: 0.14.0 > Remove these unnecessary newInstance invocations >

[jira] [Assigned] (HUDI-5019) Remove these unnecessary newInstance invocations

2023-08-28 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-5019: --- Assignee: Danny Chen > Remove these unnecessary newInstance invocations >

[jira] [Closed] (HUDI-5019) Remove these unnecessary newInstance invocations

2023-08-28 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo closed HUDI-5019. --- Resolution: Fixed > Remove these unnecessary newInstance invocations >

[jira] [Updated] (HUDI-6765) Add merge mode to allow differentiation of dedup logic

2023-08-28 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6765: Fix Version/s: 1.0.0 > Add merge mode to allow differentiation of dedup logic >

[jira] [Updated] (HUDI-6765) Add merge mode to allow differentiation of dedup logic

2023-08-28 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6765: Description: The new merge API in HoodieRecordMerger can only differentiate merge logic based on

[jira] [Created] (HUDI-6765) Add merge mode to allow differentiation of dedup logic

2023-08-28 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-6765: --- Summary: Add merge mode to allow differentiation of dedup logic Key: HUDI-6765 URL: https://issues.apache.org/jira/browse/HUDI-6765 Project: Apache Hudi Issue Type:

[jira] [Updated] (HUDI-6702) Extend merge API to support all merging operations

2023-08-28 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6702: Summary: Extend merge API to support all merging operations (was: Some capabilities of the original

[GitHub] [hudi] hudi-bot commented on pull request #9564: [HUDI-6712] Add Parquet file metadata loader

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9564: URL: https://github.com/apache/hudi/pull/9564#issuecomment-1696614469 ## CI report: * 102f50cd61cff3fdc4347dca02e712854cc20fd8 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9563: [minor] Fix AWS refactor bug by adding skipTableArchive arg

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9563: URL: https://github.com/apache/hudi/pull/9563#issuecomment-1696614452 ## CI report: * 7ada9ca3603914aba3e2bef6ecc25a739c3d57eb Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9553: [HUDI-1517][HUDI-6758][HUDI-6761] Adding support for per-logfile marker to track all log files added by a commit and to assist with rollbacks

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9553: URL: https://github.com/apache/hudi/pull/9553#issuecomment-1696614407 ## CI report: * 99b8895f68c9c80c64c2e6ee9e7cbfdff45cc6e0 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9564: [HUDI-6712] Add Parquet file metadata loader

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9564: URL: https://github.com/apache/hudi/pull/9564#issuecomment-1696584018 ## CI report: * 102f50cd61cff3fdc4347dca02e712854cc20fd8 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] hudi-bot commented on pull request #9553: [HUDI-1517][HUDI-6758][HUDI-6761] Adding support for per-logfile marker to track all log files added by a commit and to assist with rollbacks

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9553: URL: https://github.com/apache/hudi/pull/9553#issuecomment-1696583935 ## CI report: * 99b8895f68c9c80c64c2e6ee9e7cbfdff45cc6e0 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9553: [HUDI-1517][HUDI-6758][HUDI-6761] Adding support for per-logfile marker to track all log files added by a commit and to assist with rollbacks

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9553: URL: https://github.com/apache/hudi/pull/9553#issuecomment-1696577352 ## CI report: * 99b8895f68c9c80c64c2e6ee9e7cbfdff45cc6e0 Azure:

[jira] [Updated] (HUDI-6712) Implement optimized keyed lookup on parquet files

2023-08-28 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6712: - Labels: pull-request-available (was: ) > Implement optimized keyed lookup on parquet files >

[GitHub] [hudi] linliu-code opened a new pull request, #9564: [HUDI-6712] Add Parquet file metadata loader

2023-08-28 Thread via GitHub
linliu-code opened a new pull request, #9564: URL: https://github.com/apache/hudi/pull/9564 ### Change Logs To optimize the keyed lookup queries for parquet files, the first step is to load the metadata of parquet file into memory efficiently. This metadata loader enables Hudi to

[GitHub] [hudi] jonvex commented on a diff in pull request #9553: [HUDI-1517][HUDI-6758][HUDI-6761] Adding support for per-logfile marker to track all log files added by a commit and to assist with ro

2023-08-28 Thread via GitHub
jonvex commented on code in PR #9553: URL: https://github.com/apache/hudi/pull/9553#discussion_r1308050904 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/BaseRollbackHelper.java: ## @@ -59,18 +71,20 @@ public class BaseRollbackHelper

[GitHub] [hudi] hudi-bot commented on pull request #9563: [minor] Fix AWS refactor bug by adding skipTableArchive arg

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9563: URL: https://github.com/apache/hudi/pull/9563#issuecomment-1696495306 ## CI report: * 7ada9ca3603914aba3e2bef6ecc25a739c3d57eb Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9563: [minor] Fix AWS refactor bug by adding skipTableArchive arg

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9563: URL: https://github.com/apache/hudi/pull/9563#issuecomment-1696487275 ## CI report: * 7ada9ca3603914aba3e2bef6ecc25a739c3d57eb UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] the-other-tim-brown commented on pull request #9563: [minor] Fix AWS refactor bug by adding skipTableArchive arg

2023-08-28 Thread via GitHub
the-other-tim-brown commented on PR #9563: URL: https://github.com/apache/hudi/pull/9563#issuecomment-1696479555 Bug was introduced here: https://github.com/apache/hudi/pull/9347/files#diff-5a8c6aa8b66c5ed46ab2e0e0f353bc9c8fdf2e8a40eabd09a876da6a5ae19ebdL599 -- This is an

[GitHub] [hudi] the-other-tim-brown opened a new pull request, #9563: [minor] Fix AWS refactor bug by adding skipTableArchive arg

2023-08-28 Thread via GitHub
the-other-tim-brown opened a new pull request, #9563: URL: https://github.com/apache/hudi/pull/9563 ### Change Logs A line was missed while upgrading AWS dependencies. We are missing an arg to the update table call. ### Impact Fixes a bug ### Risk level (write

[GitHub] [hudi] hudi-bot commented on pull request #9539: [HUDI-6726] Fix connection leaks related to file reader and iterator close

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9539: URL: https://github.com/apache/hudi/pull/9539#issuecomment-1696478522 ## CI report: * 3727eef11f362ccd7036cab138ac230d5b8b6f02 Azure:

[GitHub] [hudi] guanziyue commented on a diff in pull request #9553: [HUDI-1517][HUDI-6758][HUDI-6761] Adding support for per-logfile marker to track all log files added by a commit and to assist with

2023-08-28 Thread via GitHub
guanziyue commented on code in PR #9553: URL: https://github.com/apache/hudi/pull/9553#discussion_r1307822743 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/MarkerBasedRollbackStrategy.java: ## @@ -75,64 +82,115 @@ public List

[GitHub] [hudi] hudi-bot commented on pull request #9553: [HUDI-1517][HUDI-6758][HUDI-6761] Adding support for per-logfile marker to track all log files added by a commit and to assist with rollbacks

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9553: URL: https://github.com/apache/hudi/pull/9553#issuecomment-1696329247 ## CI report: * 99b8895f68c9c80c64c2e6ee9e7cbfdff45cc6e0 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9554: [HUDI-6760] Add SelfDescribingInputFormatInterface for hive FileInput…

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9554: URL: https://github.com/apache/hudi/pull/9554#issuecomment-1696329368 ## CI report: * 4cc84c2ecfbe2b8553f0675ff60aa4abebd8f315 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9561: [HUDI-6763] Optimize collect calls

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9561: URL: https://github.com/apache/hudi/pull/9561#issuecomment-1696232409 ## CI report: * e25b5976befb314bb8165c0ac460111ef2e76438 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9539: [HUDI-6726] Fix connection leaks related to file reader and iterator close

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9539: URL: https://github.com/apache/hudi/pull/9539#issuecomment-1696209449 ## CI report: * b27496a4fd4db02f1e23f12180497bb37bb18242 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9559: [HUDI-3727] Add metrics for async indexer

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9559: URL: https://github.com/apache/hudi/pull/9559#issuecomment-1696125659 ## CI report: * 95fa0e2b0806d20787cf74599404b03f729a05d7 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9539: [HUDI-6726] Fix connection leaks related to file reader and iterator close

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9539: URL: https://github.com/apache/hudi/pull/9539#issuecomment-1696125421 ## CI report: * b27496a4fd4db02f1e23f12180497bb37bb18242 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9209: [HUDI-6539] New LSM tree style archived timeline

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9209: URL: https://github.com/apache/hudi/pull/9209#issuecomment-1696112913 ## CI report: * d688280b7fde224d75843893290db559bcc169f5 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9209: [HUDI-6539] New LSM tree style archived timeline

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9209: URL: https://github.com/apache/hudi/pull/9209#issuecomment-1696101255 ## CI report: * d688280b7fde224d75843893290db559bcc169f5 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] bhasudha opened a new pull request, #9562: [DOCS] Update Record payload page

2023-08-28 Thread via GitHub
bhasudha opened a new pull request, #9562: URL: https://github.com/apache/hudi/pull/9562 ### Change Logs update record payload page ### Impact docs update ### Risk level (write none, low medium or high below) Low ### Documentation Update

[GitHub] [hudi] hudi-bot commented on pull request #9553: [HUDI-1517][HUDI-6758][HUDI-6761] Adding support for per-logfile marker to track all log files added by a commit and to assist with rollbacks

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9553: URL: https://github.com/apache/hudi/pull/9553#issuecomment-1696045373 ## CI report: * a9744742e08f4d81b6b373857d002d339c7cf882 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9553: [HUDI-1517][HUDI-6758][HUDI-6761] Adding support for per-logfile marker to track all log files added by a commit and to assist with rollbacks

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9553: URL: https://github.com/apache/hudi/pull/9553#issuecomment-1696033060 ## CI report: * a9744742e08f4d81b6b373857d002d339c7cf882 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9558: [HUDI-6481] Support run multi tables services in a single spark job

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9558: URL: https://github.com/apache/hudi/pull/9558#issuecomment-1696020828 ## CI report: * ab46e341122b1fe70ebc85aa0b1a97204b2e7f89 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #9554: [HUDI-6760] Add SelfDescribingInputFormatInterface for hive FileInput…

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9554: URL: https://github.com/apache/hudi/pull/9554#issuecomment-1696020770 ## CI report: * b4d6290f84232bd5216be195bebb562eba6ce1ba Azure:
