[GitHub] [hudi] bhasudha opened a new pull request, #9562: [DOCS] Update Record payload page

2023-08-28 Thread via GitHub
bhasudha opened a new pull request, #9562: URL: https://github.com/apache/hudi/pull/9562 ### Change Logs update record payload page ### Impact docs update ### Risk level (write none, low medium or high below) Low ### Documentation Update _Descr

[GitHub] [hudi] hudi-bot commented on pull request #9553: [HUDI-1517][HUDI-6758][HUDI-6761] Adding support for per-logfile marker to track all log files added by a commit and to assist with rollbacks

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9553: URL: https://github.com/apache/hudi/pull/9553#issuecomment-1696045373 ## CI report: * a9744742e08f4d81b6b373857d002d339c7cf882 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1951

[GitHub] [hudi] hudi-bot commented on pull request #9553: [HUDI-1517][HUDI-6758][HUDI-6761] Adding support for per-logfile marker to track all log files added by a commit and to assist with rollbacks

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9553: URL: https://github.com/apache/hudi/pull/9553#issuecomment-1696033060 ## CI report: * a9744742e08f4d81b6b373857d002d339c7cf882 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1951

[GitHub] [hudi] hudi-bot commented on pull request #9558: [HUDI-6481] Support run multi tables services in a single spark job

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9558: URL: https://github.com/apache/hudi/pull/9558#issuecomment-1696020828 ## CI report: * ab46e341122b1fe70ebc85aa0b1a97204b2e7f89 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1952

[GitHub] [hudi] hudi-bot commented on pull request #9554: [HUDI-6760] Add SelfDescribingInputFormatInterface for hive FileInput…

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9554: URL: https://github.com/apache/hudi/pull/9554#issuecomment-1696020770 ## CI report: * b4d6290f84232bd5216be195bebb562eba6ce1ba Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1951

[jira] [Updated] (HUDI-6764) Failed to read S3 multi-path for hudi datasource after 0.12.0

2023-08-28 Thread Aditya Goenka (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aditya Goenka updated HUDI-6764: Priority: Critical (was: Major) > Failed to read S3 multi-path for hudi datasource after 0.12.0 > -

[jira] [Created] (HUDI-6764) Failed to read S3 multi-path for hudi datasource after 0.12.0

2023-08-28 Thread Aditya Goenka (Jira)
Aditya Goenka created HUDI-6764: --- Summary: Failed to read S3 multi-path for hudi datasource after 0.12.0 Key: HUDI-6764 URL: https://issues.apache.org/jira/browse/HUDI-6764 Project: Apache Hudi

[GitHub] [hudi] hudi-bot commented on pull request #9554: [HUDI-6760] Add SelfDescribingInputFormatInterface for hive FileInput…

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9554: URL: https://github.com/apache/hudi/pull/9554#issuecomment-1695955459 ## CI report: * b4d6290f84232bd5216be195bebb562eba6ce1ba Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1951

[GitHub] [hudi] jonvex commented on a diff in pull request #9553: [HUDI-1517][HUDI-6758][HUDI-6761] Adding support for per-logfile marker to track all log files added by a commit and to assist with ro

2023-08-28 Thread via GitHub
jonvex commented on code in PR #9553: URL: https://github.com/apache/hudi/pull/9553#discussion_r1307614341 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java: ## @@ -1023,14 +1023,19 @@ public void update(HoodieRollbackM

[GitHub] [hudi] codope commented on a diff in pull request #9553: [HUDI-1517][HUDI-6758][HUDI-6761] Adding support for per-logfile marker to track all log files added by a commit and to assist with ro

2023-08-28 Thread via GitHub
codope commented on code in PR #9553: URL: https://github.com/apache/hudi/pull/9553#discussion_r1307430567 ## hudi-common/src/main/java/org/apache/hudi/common/fs/StorageSchemes.java: ## @@ -25,69 +25,74 @@ */ public enum StorageSchemes { // Local filesystem - FILE("file",

[GitHub] [hudi] hudi-bot commented on pull request #9545: [HUDI-6758] Detecting and skipping Spurious log blocks with MOR reads

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9545: URL: https://github.com/apache/hudi/pull/9545#issuecomment-1695942382 ## CI report: * f12633fa6d50cf56b3e2036c2fa418fbf137c7fb Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1951

[GitHub] [hudi] hudi-bot commented on pull request #9517: [HUDI-6708] Support record level indexing with async indexer

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9517: URL: https://github.com/apache/hudi/pull/9517#issuecomment-1695942120 ## CI report: * 150ae016d4947394638ceb38f675a1c687c94fcd Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1951

[GitHub] [hudi] codope closed issue #9319: [SUPPORT] how to use HiveSyncConfig instead of hive configs in DataSourceWriteOptions object

2023-08-28 Thread via GitHub
codope closed issue #9319: [SUPPORT] how to use HiveSyncConfig instead of hive configs in DataSourceWriteOptions object URL: https://github.com/apache/hudi/issues/9319 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the U

[GitHub] [hudi] ad1happy2go commented on issue #9319: [SUPPORT] how to use HiveSyncConfig instead of hive configs in DataSourceWriteOptions object

2023-08-28 Thread via GitHub
ad1happy2go commented on issue #9319: URL: https://github.com/apache/hudi/issues/9319#issuecomment-1695879255 @zlinsc Closing out this. Please reopen in case of any more issues/queries. Thanks.

[GitHub] [hudi] ad1happy2go commented on issue #9307: [SUPPORT] Hudi writing using `Upsert` and `InsertAppend` fails in version 0.13.0 and 0.13.1 because of the method validateTableConfig - Hudi versi

2023-08-28 Thread via GitHub
ad1happy2go commented on issue #9307: URL: https://github.com/apache/hudi/issues/9307#issuecomment-1695877368 @idrismike Sorry for the delay here. This was a known issue. Can you try with this patch - https://github.com/apache/hudi/pull/8869

[GitHub] [hudi] hudi-bot commented on pull request #9561: [HUDI-6763] Optimize collect calls

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9561: URL: https://github.com/apache/hudi/pull/9561#issuecomment-1695874220 ## CI report: * e25b5976befb314bb8165c0ac460111ef2e76438 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1952

[GitHub] [hudi] ad1happy2go commented on issue #9298: [SUPPORT] MOR Hudi failed to upsert after upgrading EC2 instance.

2023-08-28 Thread via GitHub
ad1happy2go commented on issue #9298: URL: https://github.com/apache/hudi/issues/9298#issuecomment-1695873692 @PhantomHunt Closing out this issue as anyway it is unrelated to hudi. Please reopen in case you have any issues on the same.

[GitHub] [hudi] hudi-bot commented on pull request #9561: [HUDI-6763] Optimize collect calls

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9561: URL: https://github.com/apache/hudi/pull/9561#issuecomment-1695858497 ## CI report: * e25b5976befb314bb8165c0ac460111ef2e76438 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[jira] [Updated] (HUDI-6763) WriteStats are extracted twice in BaseSparkCommitActionExecutor

2023-08-28 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6763: - Labels: pull-request-available (was: ) > WriteStats are extracted twice in BaseSparkCommitActionE

[GitHub] [hudi] the-other-tim-brown opened a new pull request, #9561: [HUDI-6763] Optimize collect calls

2023-08-28 Thread via GitHub
the-other-tim-brown opened a new pull request, #9561: URL: https://github.com/apache/hudi/pull/9561 ### Change Logs Updates the code to only call the collect method once if possible. ### Impact Reduces overhead of generating the commit stats ### Risk level (write n

[jira] [Assigned] (HUDI-6763) WriteStats are extracted twice in BaseSparkCommitActionExecutor

2023-08-28 Thread Timothy Brown (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Brown reassigned HUDI-6763: --- Assignee: Timothy Brown > WriteStats are extracted twice in BaseSparkCommitActionExecutor > -

[jira] [Created] (HUDI-6763) WriteStats are extracted twice in BaseSparkCommitActionExecutor

2023-08-28 Thread Timothy Brown (Jira)
Timothy Brown created HUDI-6763: --- Summary: WriteStats are extracted twice in BaseSparkCommitActionExecutor Key: HUDI-6763 URL: https://issues.apache.org/jira/browse/HUDI-6763 Project: Apache Hudi

[GitHub] [hudi] hudi-bot commented on pull request #9559: [HUDI-3727] Add metrics for async indexer

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9559: URL: https://github.com/apache/hudi/pull/9559#issuecomment-1695669297 ## CI report: * 95fa0e2b0806d20787cf74599404b03f729a05d7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1952

[GitHub] [hudi] hudi-bot commented on pull request #9559: [HUDI-3727] Add metrics for async indexer

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9559: URL: https://github.com/apache/hudi/pull/9559#issuecomment-1695654333 ## CI report: * 95fa0e2b0806d20787cf74599404b03f729a05d7 UNKNOWN

[GitHub] [hudi] hudi-bot commented on pull request #9558: [HUDI-6481] Support run multi tables services in a single spark job

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9558: URL: https://github.com/apache/hudi/pull/9558#issuecomment-1695640880 ## CI report: * ab46e341122b1fe70ebc85aa0b1a97204b2e7f89 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1952

[GitHub] [hudi] hudi-bot commented on pull request #9553: [HUDI-1517][HUDI-6758][HUDI-6761] Adding support for per-logfile marker to track all log files added by a commit and to assist with rollbacks

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9553: URL: https://github.com/apache/hudi/pull/9553#issuecomment-1695640811 ## CI report: * a9744742e08f4d81b6b373857d002d339c7cf882 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1951

[GitHub] [hudi] bhasudha commented on a diff in pull request #9560: [DOCS] Update cleaning page

2023-08-28 Thread via GitHub
bhasudha commented on code in PR #9560: URL: https://github.com/apache/hudi/pull/9560#discussion_r1307368862 ## website/docs/hoodie_cleaner.md: ## @@ -1,44 +1,82 @@ --- title: Cleaning toc: true +toc_min_heading_level: 2 +toc_max_heading_level: 4 --- +## Background +Cleaning

[GitHub] [hudi] bhasudha commented on pull request #9560: [DOCS] Update cleaning page

2023-08-28 Thread via GitHub
bhasudha commented on PR #9560: URL: https://github.com/apache/hudi/pull/9560#issuecomment-1695622815 Tested locally! ![Screenshot 2023-08-28 at 5 35 26 AM](https://github.com/apache/hudi/assets/2179254/efd59c73-df10-41cf-bdbf-173c4bdb1948) ![Screenshot 2023-08-28 at 5 35 43 AM](h

[GitHub] [hudi] bhasudha opened a new pull request, #9560: [DOCS] Update cleaning page

2023-08-28 Thread via GitHub
bhasudha opened a new pull request, #9560: URL: https://github.com/apache/hudi/pull/9560 ### Change Logs update docs page for cleaning with inline configs ### Impact docs change ### Risk level (write none, low medium or high below) low ### Documentati

[jira] [Updated] (HUDI-3727) Add metrics for async indexer

2023-08-28 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-3727: -- Reviewers: Ethan Guo > Add metrics for async indexer > - > >

[jira] [Updated] (HUDI-3727) Add metrics for async indexer

2023-08-28 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-3727: - Labels: pull-request-available (was: ) > Add metrics for async indexer >

[jira] [Updated] (HUDI-3727) Add metrics for async indexer

2023-08-28 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-3727: -- Status: Patch Available (was: In Progress) > Add metrics for async indexer > --

[GitHub] [hudi] codope opened a new pull request, #9559: [HUDI-3727] Add metrics for async indexer

2023-08-28 Thread via GitHub
codope opened a new pull request, #9559: URL: https://github.com/apache/hudi/pull/9559 ### Change Logs Added index initialization and catchup task latency metrics. ### Impact If `hoodie.metadata.metrics.enable` is true with async indexer, then the above metrics will be ca

[jira] [Updated] (HUDI-3727) Add metrics for async indexer

2023-08-28 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-3727: -- Status: In Progress (was: Open) > Add metrics for async indexer > - > >

[GitHub] [hudi] hudi-bot commented on pull request #9558: [HUDI-6481] Support run multi tables services in a single spark job

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9558: URL: https://github.com/apache/hudi/pull/9558#issuecomment-1695575691 ## CI report: * ab46e341122b1fe70ebc85aa0b1a97204b2e7f89 UNKNOWN

[GitHub] [hudi] hudi-bot commented on pull request #9517: [HUDI-6708] Support record level indexing with async indexer

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9517: URL: https://github.com/apache/hudi/pull/9517#issuecomment-1695575375 ## CI report: * e345a37f960b967aa3b293f013832b11d01ef528 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=195

[GitHub] [hudi] hudi-bot commented on pull request #9554: [HUDI-6760] Add SelfDescribingInputFormatInterface for hive FileInput…

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9554: URL: https://github.com/apache/hudi/pull/9554#issuecomment-1695563105 ## CI report: * b4d6290f84232bd5216be195bebb562eba6ce1ba Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1951

[GitHub] [hudi] hudi-bot commented on pull request #9517: [HUDI-6708] Support record level indexing with async indexer

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9517: URL: https://github.com/apache/hudi/pull/9517#issuecomment-1695562745 ## CI report: * cead68b8487fd9cbc42913114deba7b835b73d16 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1951

[GitHub] [hudi] bhasudha commented on a diff in pull request #9372: [DOCS]Update Concurrency page

2023-08-28 Thread via GitHub
bhasudha commented on code in PR #9372: URL: https://github.com/apache/hudi/pull/9372#discussion_r1307310436 ## website/docs/concurrency_control.md: ## @@ -2,105 +2,126 @@ title: "Concurrency Control" summary: In this page, we will discuss how to perform concurrent writes to

[GitHub] [hudi] harsh1231 commented on a diff in pull request #9545: [HUDI-6758] Detecting and skipping Spurious log blocks with MOR reads

2023-08-28 Thread via GitHub
harsh1231 commented on code in PR #9545: URL: https://github.com/apache/hudi/pull/9545#discussion_r1307306746 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java: ## @@ -129,6 +129,9 @@ public class HoodieAppendHandle extends HoodieWriteHa

[GitHub] [hudi] bhasudha commented on a diff in pull request #9372: [DOCS]Update Concurrency page

2023-08-28 Thread via GitHub
bhasudha commented on code in PR #9372: URL: https://github.com/apache/hudi/pull/9372#discussion_r1307298317 ## website/docs/concurrency_control.md: ## @@ -186,18 +221,32 @@ A Hudi Streamer job can then be triggered as follows: --source-class org.apache.hudi.utilities.sources

[GitHub] [hudi] bhasudha commented on a diff in pull request #9372: [DOCS]Update Concurrency page

2023-08-28 Thread via GitHub
bhasudha commented on code in PR #9372: URL: https://github.com/apache/hudi/pull/9372#discussion_r1307296531 ## website/docs/concurrency_control.md: ## @@ -186,18 +221,32 @@ A Hudi Streamer job can then be triggered as follows: --source-class org.apache.hudi.utilities.sources

[jira] [Closed] (HUDI-4631) Enhance retries for failed writes w/ write conflicts in a multi writer scenarios

2023-08-28 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit closed HUDI-4631. - Resolution: Fixed > Enhance retries for failed writes w/ write conflicts in a multi writer > scenarios >

[hudi] branch master updated: [HUDI-4631] Adding retries to spark datasource writes on conflict failures (#6854)

2023-08-28 Thread codope
This is an automated email from the ASF dual-hosted git repository. codope pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new e76dd102bca [HUDI-4631] Adding retries to spark da

[GitHub] [hudi] codope merged pull request #6854: [HUDI-4631] Adding retries to spark datasource writes on conflict failures:

2023-08-28 Thread via GitHub
codope merged PR #6854: URL: https://github.com/apache/hudi/pull/6854

[jira] [Closed] (HUDI-3756) Clean up indexing APIs in write client

2023-08-28 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit closed HUDI-3756. - Resolution: Done The APIs are in good shape now. {code:java} /** * Schedules INDEX action. * * @param p

[jira] [Updated] (HUDI-6481) Implement MultipleServiceRunner to run services on multiple tables through single job

2023-08-28 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6481: - Labels: pull-request-available (was: ) > Implement MultipleServiceRunner to run services on multi

[GitHub] [hudi] stream2000 opened a new pull request, #9558: [HUDI-6481] Support run multi tables services in a single spark job

2023-08-28 Thread via GitHub
stream2000 opened a new pull request, #9558: URL: https://github.com/apache/hudi/pull/9558 ### Change Logs Now we have HoodieMultiTableDeltaStreamer using spark to ingest multi tables into hudi, and we can also ingest multi tables using a single flink in some platforms like alicloud

[GitHub] [hudi] hudi-bot commented on pull request #9545: [HUDI-6758] Detecting and skipping Spurious log blocks with MOR reads

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9545: URL: https://github.com/apache/hudi/pull/9545#issuecomment-1695503247 ## CI report: * 1085328010c46aa18b939804a5d96d1a47514fa9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1949

[GitHub] [hudi] hudi-bot commented on pull request #9517: [HUDI-6708] Support record level indexing with async indexer

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9517: URL: https://github.com/apache/hudi/pull/9517#issuecomment-1695503041 ## CI report: * cead68b8487fd9cbc42913114deba7b835b73d16 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1951

[GitHub] [hudi] hudi-bot commented on pull request #9545: [HUDI-6758] Detecting and skipping Spurious log blocks with MOR reads

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9545: URL: https://github.com/apache/hudi/pull/9545#issuecomment-1695492139 ## CI report: * 1085328010c46aa18b939804a5d96d1a47514fa9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1949

[GitHub] [hudi] bhasudha commented on a diff in pull request #9372: [DOCS]Update Concurrency page

2023-08-28 Thread via GitHub
bhasudha commented on code in PR #9372: URL: https://github.com/apache/hudi/pull/9372#discussion_r1307279703 ## website/docs/concurrency_control.md: ## @@ -186,18 +221,32 @@ A Hudi Streamer job can then be triggered as follows: --source-class org.apache.hudi.utilities.sources

[GitHub] [hudi] hudi-bot commented on pull request #9517: [HUDI-6708] Support record level indexing with async indexer

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9517: URL: https://github.com/apache/hudi/pull/9517#issuecomment-1695491904 ## CI report: * cead68b8487fd9cbc42913114deba7b835b73d16 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1951

[GitHub] [hudi] bhasudha commented on a diff in pull request #9372: [DOCS]Update Concurrency page

2023-08-28 Thread via GitHub
bhasudha commented on code in PR #9372: URL: https://github.com/apache/hudi/pull/9372#discussion_r1307279080 ## website/docs/concurrency_control.md: ## @@ -162,7 +198,6 @@ inputDF.write.format("hudi") .option("hoodie.write.concurrency.mode", "optimistic_concurrency_cont

[GitHub] [hudi] bhasudha commented on a diff in pull request #9372: [DOCS]Update Concurrency page

2023-08-28 Thread via GitHub
bhasudha commented on code in PR #9372: URL: https://github.com/apache/hudi/pull/9372#discussion_r1307277531 ## website/docs/concurrency_control.md: ## @@ -2,105 +2,126 @@ title: "Concurrency Control" summary: In this page, we will discuss how to perform concurrent writes to

[GitHub] [hudi] hudi-bot commented on pull request #9517: [HUDI-6708] Support record level indexing with async indexer

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9517: URL: https://github.com/apache/hudi/pull/9517#issuecomment-1695480696 ## CI report: * cead68b8487fd9cbc42913114deba7b835b73d16 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1951

[GitHub] [hudi] bhasudha commented on a diff in pull request #9372: [DOCS]Update Concurrency page

2023-08-28 Thread via GitHub
bhasudha commented on code in PR #9372: URL: https://github.com/apache/hudi/pull/9372#discussion_r1307271216 ## website/docs/concurrency_control.md: ## @@ -2,105 +2,126 @@ title: "Concurrency Control" summary: In this page, we will discuss how to perform concurrent writes to

[GitHub] [hudi] bhasudha commented on a diff in pull request #9372: [DOCS]Update Concurrency page

2023-08-28 Thread via GitHub
bhasudha commented on code in PR #9372: URL: https://github.com/apache/hudi/pull/9372#discussion_r1307271043 ## website/docs/concurrency_control.md: ## @@ -2,105 +2,126 @@ title: "Concurrency Control" summary: In this page, we will discuss how to perform concurrent writes to

[GitHub] [hudi] bhasudha commented on a diff in pull request #9372: [DOCS]Update Concurrency page

2023-08-28 Thread via GitHub
bhasudha commented on code in PR #9372: URL: https://github.com/apache/hudi/pull/9372#discussion_r1307267448 ## website/docs/concurrency_control.md: ## @@ -2,105 +2,126 @@ title: "Concurrency Control" summary: In this page, we will discuss how to perform concurrent writes to

[GitHub] [hudi] bhasudha commented on a diff in pull request #9372: [DOCS]Update Concurrency page

2023-08-28 Thread via GitHub
bhasudha commented on code in PR #9372: URL: https://github.com/apache/hudi/pull/9372#discussion_r1307260385 ## website/docs/concurrency_control.md: ## @@ -2,105 +2,126 @@ title: "Concurrency Control" summary: In this page, we will discuss how to perform concurrent writes to

[jira] [Comment Edited] (HUDI-3786) how to deduce what MDT partitions to update on the write path w/ async indeing

2023-08-28 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759533#comment-17759533 ] Sagar Sumit edited comment on HUDI-3786 at 8/28/23 10:40 AM: -

[jira] [Created] (HUDI-6762) Remove usages of MetadataRecordsGenerationParams

2023-08-28 Thread Sagar Sumit (Jira)
Sagar Sumit created HUDI-6762: - Summary: Remove usages of MetadataRecordsGenerationParams Key: HUDI-6762 URL: https://issues.apache.org/jira/browse/HUDI-6762 Project: Apache Hudi Issue Type: Task

[jira] [Closed] (HUDI-3786) how to deduce what MDT partitions to update on the write path w/ async indeing

2023-08-28 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit closed HUDI-3786. - Resolution: Implemented > how to deduce what MDT partitions to update on the write path w/ async indeing >

[jira] [Updated] (HUDI-3786) how to deduce what MDT partitions to update on the write path w/ async indeing

2023-08-28 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-3786: -- Priority: Minor (was: Critical) > how to deduce what MDT partitions to update on the write path w/ asyn

[jira] [Commented] (HUDI-3786) how to deduce what MDT partitions to update on the write path w/ async indeing

2023-08-28 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759533#comment-17759533 ] Sagar Sumit commented on HUDI-3786: --- Writer is already checking table config in `Hoodie

[GitHub] [hudi] harsh1231 commented on a diff in pull request #9545: [HUDI-6758] Detecting and skipping Spurious log blocks with MOR reads

2023-08-28 Thread via GitHub
harsh1231 commented on code in PR #9545: URL: https://github.com/apache/hudi/pull/9545#discussion_r1307247611 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieAppendHandle.java: ## @@ -129,6 +129,9 @@ public class HoodieAppendHandle extends HoodieWriteHa

[GitHub] [hudi] codope commented on a diff in pull request #9406: [DOCS] Update Metadata table and metadata indexing related pages

2023-08-28 Thread via GitHub
codope commented on code in PR #9406: URL: https://github.com/apache/hudi/pull/9406#discussion_r1307238377 ## website/docs/metadata.md: ## @@ -3,80 +3,173 @@ title: Metadata Table keywords: [ hudi, metadata, S3 file listings] --- -## Motivation for a Metadata Table +## Metad

[hudi] branch asf-site updated: [DOCS] Update Metadata table and metadata indexing related pages (#9406)

2023-08-28 Thread codope
This is an automated email from the ASF dual-hosted git repository. codope pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new e597bd0dd33 [DOCS] Update Metadata table and m

[GitHub] [hudi] codope merged pull request #9406: [DOCS] Update Metadata table and metadata indexing related pages

2023-08-28 Thread via GitHub
codope merged PR #9406: URL: https://github.com/apache/hudi/pull/9406

[GitHub] [hudi] hudi-bot commented on pull request #9553: [HUDI-1517][HUDI-6758][HUDI-6761] Adding support for per-logfile marker to track all log files added by a commit and to assist with rollbacks

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9553: URL: https://github.com/apache/hudi/pull/9553#issuecomment-1695412199 ## CI report: * 4be2f9e0d54c1173fc40ffb0a219c39e0ef7b80a Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=195

[GitHub] [hudi] hudi-bot commented on pull request #6854: [HUDI-4631] Adding retries to spark datasource writes on conflict failures:

2023-08-28 Thread via GitHub
hudi-bot commented on PR #6854: URL: https://github.com/apache/hudi/pull/6854#issuecomment-1695407187 ## CI report: * 4f234c69035938bbc50beaeb36382fcd6dea5c43 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1951

[GitHub] [hudi] yihao-tcf commented on issue #9471: [SUPPORT] When using Deltasteamer JdbcSource to extract data, there are issues with data loss and slow query of source side data

2023-08-28 Thread via GitHub
yihao-tcf commented on issue #9471: URL: https://github.com/apache/hudi/issues/9471#issuecomment-1695404040 > @yihao-tcf Thanks for raising this. I do understand your concern. Ideally run-once(without --continous) also should take care of incremental fetch according to source limit. We need

[GitHub] [hudi] hudi-bot commented on pull request #9553: [HUDI-1517][HUDI-6758][HUDI-6761] Adding support for per-logfile marker to track all log files added by a commit and to assist with rollbacks

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9553: URL: https://github.com/apache/hudi/pull/9553#issuecomment-1695399419 ## CI report: * 924b4a0b5d6ac727defd75f23e9d3e91afd78b06 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1950

[GitHub] [hudi] zealjoanna opened a new issue, #9557: [SUPPORT] CDC file clean not work

2023-08-28 Thread via GitHub
zealjoanna opened a new issue, #9557: URL: https://github.com/apache/hudi/issues/9557 **Describe the problem you faced** I'm using CDC read for a COW table, and I want to keep the last two commits by setting hoodie.cleaner.commits.retained = 1. I then found the parquet file cleaned as I

[GitHub] [hudi] hudi-bot commented on pull request #9554: [HUDI-6760] Add SelfDescribingInputFormatInterface for hive FileInput…

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9554: URL: https://github.com/apache/hudi/pull/9554#issuecomment-1695306273 ## CI report: * 91529a3dc8cbe6ab5683aed634863f199d969d77 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1950

[GitHub] [hudi] wecharyu opened a new pull request, #9556: [HUDI-6671][DOC] Add partition sql command doc

2023-08-28 Thread via GitHub
wecharyu opened a new pull request, #9556: URL: https://github.com/apache/hudi/pull/9556 ### Change Logs Add doc for add partition sql command. HUDI-6671 ### Impact Low ### Risk level (write none, low medium or high below) Low ### Documentation Update

[GitHub] [hudi] hudi-bot commented on pull request #9475: [MINOR] fixing mysql debezium data loss

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9475: URL: https://github.com/apache/hudi/pull/9475#issuecomment-1695305816 ## CI report: * 8e97c31e973e2a9443c48f34bba9793b599a3584 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1951

[GitHub] [hudi] silly-carbon opened a new issue, #9555: [SUPPORT] MERGE INTO throws exception [[id] cannot be entirely null or empty.] even if [id] is not null

2023-08-28 Thread via GitHub
silly-carbon opened a new issue, #9555: URL: https://github.com/apache/hudi/issues/9555 **Describe the problem you faced** We have two tables, simplified as below: ``` create table temp_db.merge_target ( id int, name string, price double, ts bigint ) us

[GitHub] [hudi] hudi-bot commented on pull request #9554: [HUDI-6760] Add SelfDescribingInputFormatInterface for hive FileInput…

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9554: URL: https://github.com/apache/hudi/pull/9554#issuecomment-1695292811 ## CI report: * 91529a3dc8cbe6ab5683aed634863f199d969d77 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1950

[GitHub] [hudi] hudi-bot commented on pull request #9553: [HUDI-1517][HUDI-6758][HUDI-6761] Adding support for per-logfile marker to track all log files added by a commit and to assist with rollbacks

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9553: URL: https://github.com/apache/hudi/pull/9553#issuecomment-1695292655 ## CI report: * 924b4a0b5d6ac727defd75f23e9d3e91afd78b06 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1950

[GitHub] [hudi] nsivabalan commented on a diff in pull request #9553: [HUDI-1517][HUDI-6758][HUDI-6761] Adding support for per-logfile marker to track all log files added by a commit and to assist wit

2023-08-28 Thread via GitHub
nsivabalan commented on code in PR #9553: URL: https://github.com/apache/hudi/pull/9553#discussion_r1307114400 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/SparkRDDWriteClient.java: ## @@ -104,6 +129,162 @@ public boolean commit(String instantTime, Java

[GitHub] [hudi] Zouxxyy commented on pull request #9554: [HUDI-6760] Add SelfDescribingInputFormatInterface for hive FileInput…

2023-08-28 Thread via GitHub
Zouxxyy commented on PR #9554: URL: https://github.com/apache/hudi/pull/9554#issuecomment-1695249433 @danny0405 > Can you elaborate a little more what the purpose of this change? See updated Change Logs. > Does it has risk of breaking the compatibility for low version H

[GitHub] [hudi] Zouxxyy commented on a diff in pull request #9554: [HUDI-6760] Add SelfDescribingInputFormatInterface for hive FileInput…

2023-08-28 Thread via GitHub
Zouxxyy commented on code in PR #9554: URL: https://github.com/apache/hudi/pull/9554#discussion_r1307090034 ## hudi-spark-datasource/hudi-spark-common/pom.xml: ## @@ -228,6 +228,19 @@ test + Review Comment: If this is not added, the compilation will fail i

[GitHub] [hudi] hudi-bot commented on pull request #9517: [HUDI-6708] Support record level indexing with async indexer

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9517: URL: https://github.com/apache/hudi/pull/9517#issuecomment-1695239149 ## CI report: * c6a6123c6f76c99cfbfbf06b7a1bdddbff143aa9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1943

[GitHub] [hudi] stream2000 commented on a diff in pull request #9515: [HUDI-2141] Support flink compaction metrics

2023-08-28 Thread via GitHub
stream2000 commented on code in PR #9515: URL: https://github.com/apache/hudi/pull/9515#discussion_r1307089689 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/metrics/FlinkWriteMetrics.java: ## @@ -0,0 +1,130 @@ +/* + * Licensed to the Apache Software Foundatio

[GitHub] [hudi] hudi-bot commented on pull request #9554: [HUDI-6760] Add SelfDescribingInputFormatInterface for hive FileInput…

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9554: URL: https://github.com/apache/hudi/pull/9554#issuecomment-1695226959 ## CI report: * 91529a3dc8cbe6ab5683aed634863f199d969d77 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1950

[GitHub] [hudi] hudi-bot commented on pull request #9553: [HUDI-1517][HUDI-6758][HUDI-6761] Adding support for per-logfile marker to track all log files added by a commit and to assist with rollbacks

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9553: URL: https://github.com/apache/hudi/pull/9553#issuecomment-1695226896 ## CI report: * 924b4a0b5d6ac727defd75f23e9d3e91afd78b06 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1950

[GitHub] [hudi] hudi-bot commented on pull request #9517: [HUDI-6708] Support record level indexing with async indexer

2023-08-28 Thread via GitHub
hudi-bot commented on PR #9517: URL: https://github.com/apache/hudi/pull/9517#issuecomment-1695226431 ## CI report: * c6a6123c6f76c99cfbfbf06b7a1bdddbff143aa9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1943

[GitHub] [hudi] JoshuaZhuCN closed issue #9418: [SUPPORT] Hudi table does not support Spark SQL's cache table syntax

2023-08-28 Thread via GitHub
JoshuaZhuCN closed issue #9418: [SUPPORT] Hudi table does not support Spark SQL's cache table syntax URL: https://github.com/apache/hudi/issues/9418 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to t

[jira] [Created] (HUDI-6761) Fix rollbacks with MDT for MOR data table with log files

2023-08-28 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-6761: - Summary: Fix rollbacks with MDT for MOR data table with log files Key: HUDI-6761 URL: https://issues.apache.org/jira/browse/HUDI-6761 Project: Apache Hudi

[jira] [Assigned] (HUDI-6761) Fix rollbacks with MDT for MOR data table with log files

2023-08-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-6761: - Assignee: sivabalan narayanan > Fix rollbacks with MDT for MOR data table with lo

[GitHub] [hudi] hudi-bot commented on pull request #6854: [HUDI-4631] Adding retries to spark datasource writes on conflict failures:

2023-08-28 Thread via GitHub
hudi-bot commented on PR #6854: URL: https://github.com/apache/hudi/pull/6854#issuecomment-1695146173 ## CI report: * d823908199b0aed70defea93544f66a95602a329 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1245
