[GitHub] [hudi] hudi-bot commented on pull request #7664: [HUDI-5551] support seconds unit on event_time

2023-01-17 Thread GitBox
hudi-bot commented on PR #7664: URL: https://github.com/apache/hudi/pull/7664#issuecomment-1385019069 ## CI report: * 2f4ee14477c6868151f3d14eb1f3535d3eafb11d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1430

[GitHub] [hudi] hudi-bot commented on pull request #7680: [HUDI-5548] spark sql update hudi's table properties

2023-01-17 Thread GitBox
hudi-bot commented on PR #7680: URL: https://github.com/apache/hudi/pull/7680#issuecomment-1385019241 ## CI report: * 7f5f3ef01829ff5ffb79543d2281bfc08e575c3e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1434

[GitHub] [hudi] weimingdiit opened a new pull request, #7684: [HUDI-5567] Modified to make bootstrapping exception message clearer

2023-01-17 Thread GitBox
weimingdiit opened a new pull request, #7684: URL: https://github.com/apache/hudi/pull/7684 ### Change Logs The exception message could be clearer when determining the schema from the data files in bootstrap. ### Impact nothing ### Risk level (write none, low medium or hi

[jira] [Updated] (HUDI-5567) Modified to make bootstrapping exception message clearer

2023-01-17 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5567: - Labels: pull-request-available (was: ) > Modified to make bootstrapping exception message clearer

[GitHub] [hudi] hudi-bot commented on pull request #7664: [HUDI-5551] support seconds unit on event_time

2023-01-17 Thread GitBox
hudi-bot commented on PR #7664: URL: https://github.com/apache/hudi/pull/7664#issuecomment-1385026602 ## CI report: * 7fa0b38ff13bce16a12b35a9f009b414854c9fe6 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=143

[GitHub] [hudi] hudi-bot commented on pull request #7680: [HUDI-5548] spark sql update hudi's table properties

2023-01-17 Thread GitBox
hudi-bot commented on PR #7680: URL: https://github.com/apache/hudi/pull/7680#issuecomment-1385026730 ## CI report: * 7f5f3ef01829ff5ffb79543d2281bfc08e575c3e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1434

[GitHub] [hudi] hudi-bot commented on pull request #7684: [HUDI-5567] Modified to make bootstrapping exception message clearer

2023-01-17 Thread GitBox
hudi-bot commented on PR #7684: URL: https://github.com/apache/hudi/pull/7684#issuecomment-1385026811 ## CI report: * 31fe16b17e99594573abc1ad273ee2d007c56bc9 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] zhuanshenbsj1 commented on pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup

2023-01-17 Thread GitBox
zhuanshenbsj1 commented on PR #7159: URL: https://github.com/apache/hudi/pull/7159#issuecomment-1385053453 > # Issue > Issue at hand: Clustering will be performed for inputGroups with only 1 fileSlice, which may cause unnecessary file re-writes and write amplifications should there be no

[GitHub] [hudi] loukey-lj opened a new pull request, #7685: [HUDI 5568]

2023-01-17 Thread GitBox
loukey-lj opened a new pull request, #7685: URL: https://github.com/apache/hudi/pull/7685 ### Change Logs writeClient.getHoodieTable().getFileSystemView() always returns the local fileSystemView; should use writeClient.getHoodieTable().getHoodieView() to determine the fileSy

[GitHub] [hudi] zhuanshenbsj1 commented on pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup

2023-01-17 Thread GitBox
zhuanshenbsj1 commented on PR #7159: URL: https://github.com/apache/hudi/pull/7159#issuecomment-1385054159 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment

[GitHub] [hudi] hangc0276 opened a new issue, #7686: [SUPPORT] Is there any way to delete records by specify one field value without selecting all the records out

2023-01-17 Thread GitBox
hangc0276 opened a new issue, #7686: URL: https://github.com/apache/hudi/issues/7686 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? - Join the mailing list to engage in conversations and get faster support at dev-subscr

[jira] [Updated] (HUDI-5246) Improve validation for partition path

2023-01-17 Thread Hemanth Gowda (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hemanth Gowda updated HUDI-5246: Status: Open (was: In Progress) > Improve validation for partition path > -

[GitHub] [hudi] hudi-bot commented on pull request #7619: [MINOR] Optimizing schema validation in Metadata table

2023-01-17 Thread GitBox
hudi-bot commented on PR #7619: URL: https://github.com/apache/hudi/pull/7619#issuecomment-1385115006 ## CI report: * dd59c7370a986b881a4f8e980915484f0c9021c3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1435

[GitHub] [hudi] hudi-bot commented on pull request #7684: [HUDI-5567] Modified to make bootstrapping exception message clearer

2023-01-17 Thread GitBox
hudi-bot commented on PR #7684: URL: https://github.com/apache/hudi/pull/7684#issuecomment-1385115523 ## CI report: * 31fe16b17e99594573abc1ad273ee2d007c56bc9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1436

[GitHub] [hudi] hudi-bot commented on pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup

2023-01-17 Thread GitBox
hudi-bot commented on PR #7159: URL: https://github.com/apache/hudi/pull/7159#issuecomment-1385125561 ## CI report: * 15ecd91180d32c7fa1905c11408f4bc23347e682 UNKNOWN * 2fe0d6a4dd0fe655a6c0b7f9c7bd3889e91a84f2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] hudi-bot commented on pull request #7679: [HUDI-5563] Check table exist before drop table

2023-01-17 Thread GitBox
hudi-bot commented on PR #7679: URL: https://github.com/apache/hudi/pull/7679#issuecomment-1385127333 ## CI report: * e4aabbcc465e71d9184ad1ecb3a53690e98fc291 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1434

[GitHub] [hudi] hudi-bot commented on pull request #7685: [HUDI-5568]

2023-01-17 Thread GitBox
hudi-bot commented on PR #7685: URL: https://github.com/apache/hudi/pull/7685#issuecomment-1385127424 ## CI report: * 5b6f0d1e629ec97859bf54f673597ee9c19399f1 UNKNOWN

[jira] [Updated] (HUDI-5568) incorrect use of fileSystemView

2023-01-17 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5568: - Labels: pull-request-available (was: ) > incorrect use of fileSystemView > -

[GitHub] [hudi] TengHuo commented on a diff in pull request #7626: [HUDI-5516] Reduce memory footprint on workload with thousand active partitions

2023-01-17 Thread GitBox
TengHuo commented on code in PR #7626: URL: https://github.com/apache/hudi/pull/7626#discussion_r1072001434 ## hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/BucketHandles.java: ## @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

[GitHub] [hudi] hudi-bot commented on pull request #7679: [HUDI-5563] Check table exist before drop table

2023-01-17 Thread GitBox
hudi-bot commented on PR #7679: URL: https://github.com/apache/hudi/pull/7679#issuecomment-1385139725 ## CI report: * e4aabbcc465e71d9184ad1ecb3a53690e98fc291 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1434

[GitHub] [hudi] hudi-bot commented on pull request #7685: [HUDI-5568] incorrect use of fileSystemView

2023-01-17 Thread GitBox
hudi-bot commented on PR #7685: URL: https://github.com/apache/hudi/pull/7685#issuecomment-1385139873 ## CI report: * 5b6f0d1e629ec97859bf54f673597ee9c19399f1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1436

[GitHub] [hudi] trushev commented on a diff in pull request #7626: [HUDI-5516] Reduce memory footprint on workload with thousand active partitions

2023-01-17 Thread GitBox
trushev commented on code in PR #7626: URL: https://github.com/apache/hudi/pull/7626#discussion_r1072027683 ## hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/BucketHandles.java: ## @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

[GitHub] [hudi] TengHuo commented on a diff in pull request #7626: [HUDI-5516] Reduce memory footprint on workload with thousand active partitions

2023-01-17 Thread GitBox
TengHuo commented on code in PR #7626: URL: https://github.com/apache/hudi/pull/7626#discussion_r1072031352 ## hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/BucketHandles.java: ## @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

[GitHub] [hudi] boneanxs commented on pull request #7582: [HUDI-5488] Make sure Disrupt queue start first, then insert records

2023-01-17 Thread GitBox
boneanxs commented on PR #7582: URL: https://github.com/apache/hudi/pull/7582#issuecomment-1385170676 @hudi-bot run azure

[jira] [Updated] (HUDI-5565) Application restart may cause data lose when task parallelism is changed

2023-01-17 Thread lei w (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lei w updated HUDI-5565: Description: [HUDI-2084|https://github.com/apache/hudi/pull/3168] Resend the uncommitted write metadata when start u

[GitHub] [hudi] loukey-lj commented on pull request #7685: [HUDI-5568] incorrect use of fileSystemView

2023-01-17 Thread GitBox
loukey-lj commented on PR #7685: URL: https://github.com/apache/hudi/pull/7685#issuecomment-1385197027 hi @danny0405 could you please take a look

[GitHub] [hudi] BalaMahesh opened a new pull request, #7687: Update to handle deletes in postgres debezium

2023-01-17 Thread GitBox
BalaMahesh opened a new pull request, #7687: URL: https://github.com/apache/hudi/pull/7687 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any perfo

[GitHub] [hudi] hudi-bot commented on pull request #5926: [HUDI-3475] Initialize hudi table management module

2023-01-17 Thread GitBox
hudi-bot commented on PR #5926: URL: https://github.com/apache/hudi/pull/5926#issuecomment-1385270838 ## CI report: * 9c6308712dc95b2062fd0dfe64163e723aa46561 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1412

[GitHub] [hudi] BalaMahesh commented on issue #7595: [SUPPORT] Hudi Clean and Delta commits taking ~50 mins to finish frequently

2023-01-17 Thread GitBox
BalaMahesh commented on issue #7595: URL: https://github.com/apache/hudi/issues/7595#issuecomment-1385272668 > I guess we run into some performance issue when using BloomFilter index for mor table with metadata table disabled, thanks for the feedback, let me record this issue first for this

[GitHub] [hudi] hudi-bot commented on pull request #7582: [HUDI-5488] Make sure Disrupt queue start first, then insert records

2023-01-17 Thread GitBox
hudi-bot commented on PR #7582: URL: https://github.com/apache/hudi/pull/7582#issuecomment-1385273943 ## CI report: * a94ec9cf09ce55b684fa059ce1ede73bead0e991 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1434

[GitHub] [hudi] hudi-bot commented on pull request #7669: [HUDI-5553] Prevent partition(s) from being dropped if there are pending…

2023-01-17 Thread GitBox
hudi-bot commented on PR #7669: URL: https://github.com/apache/hudi/pull/7669#issuecomment-1385274376 ## CI report: * dae2ca6c5ab37f7865789823dae7ec3033c7b452 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1435

[GitHub] [hudi] hudi-bot commented on pull request #7677: [HUDI-5559] Support CDC for flink bounded source

2023-01-17 Thread GitBox
hudi-bot commented on PR #7677: URL: https://github.com/apache/hudi/pull/7677#issuecomment-1385274511 ## CI report: * c81f60f80a945dd2377e2fff4bc6207cc63ef576 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1435

[GitHub] [hudi] hudi-bot commented on pull request #5926: [HUDI-3475] Initialize hudi table management module

2023-01-17 Thread GitBox
hudi-bot commented on PR #5926: URL: https://github.com/apache/hudi/pull/5926#issuecomment-1385280793 ## CI report: * 9c6308712dc95b2062fd0dfe64163e723aa46561 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1412

[GitHub] [hudi] hudi-bot commented on pull request #7687: Update to handle deletes in postgres debezium

2023-01-17 Thread GitBox
hudi-bot commented on PR #7687: URL: https://github.com/apache/hudi/pull/7687#issuecomment-1385284533 ## CI report: * 78d341045ff40465c1d44f377b42e5d91f7c5fc7 UNKNOWN

[GitHub] [hudi] hudi-bot commented on pull request #5926: [HUDI-3475] Initialize hudi table management module

2023-01-17 Thread GitBox
hudi-bot commented on PR #5926: URL: https://github.com/apache/hudi/pull/5926#issuecomment-1385289920 ## CI report: * 9c6308712dc95b2062fd0dfe64163e723aa46561 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1412

[GitHub] [hudi] hudi-bot commented on pull request #7687: Update to handle deletes in postgres debezium

2023-01-17 Thread GitBox
hudi-bot commented on PR #7687: URL: https://github.com/apache/hudi/pull/7687#issuecomment-1385292589 ## CI report: * 78d341045ff40465c1d44f377b42e5d91f7c5fc7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1436

[GitHub] [hudi] hudi-bot commented on pull request #7664: [HUDI-5551] support seconds unit on event_time

2023-01-17 Thread GitBox
hudi-bot commented on PR #7664: URL: https://github.com/apache/hudi/pull/7664#issuecomment-1385494598 ## CI report: * 674eef810f4f188aaf0f505189674e454186e208 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1435

[GitHub] [hudi] kazdy commented on pull request #7640: [HUDI-5514] Add in support for a keyless workflow

2023-01-17 Thread GitBox
kazdy commented on PR #7640: URL: https://github.com/apache/hudi/pull/7640#issuecomment-1385496680 Thanks for the explanation, so it seems like key generator must be deterministic and there's no way around it. What I do with hudi datasets where I need a surrogate key is that I just g

[GitHub] [hudi] hudi-bot commented on pull request #7680: [HUDI-5548] spark sql update hudi's table properties

2023-01-17 Thread GitBox
hudi-bot commented on PR #7680: URL: https://github.com/apache/hudi/pull/7680#issuecomment-1385506596 ## CI report: * 3fbb769fd595dea1f808be67627f97539d1eb945 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1435

[GitHub] [hudi] nsivabalan commented on a diff in pull request #6782: [HUDI-4911][HUDI-3301] Fixing `HoodieMetadataLogRecordReader` to avoid flushing cache for every lookup

2023-01-17 Thread GitBox
nsivabalan commented on code in PR #6782: URL: https://github.com/apache/hudi/pull/6782#discussion_r1072338818 ## hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java: ## @@ -255,38 +255,35 @@ public List>>> getRecord return result; } - pr

[GitHub] [hudi] jonvex commented on a diff in pull request #7576: [HUDI-4991] Allow kafka-like configs to set truststore and keystore for the SchemaProvider

2023-01-17 Thread GitBox
jonvex commented on code in PR #7576: URL: https://github.com/apache/hudi/pull/7576#discussion_r1072341098 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/schema/SchemaRegistryProvider.java: ## @@ -64,24 +81,32 @@ public static class Config { * @throws IOException

[GitHub] [hudi] xushiyan commented on pull request #6732: [HUDI-4148] Add client for hudi table service manager

2023-01-17 Thread GitBox
xushiyan commented on PR #6732: URL: https://github.com/apache/hudi/pull/6732#issuecomment-1385587792 the CI timeout issue happens on master and is unrelated to this PR itself. will land this first. CI issue will be addressed separately

[GitHub] [hudi] xushiyan merged pull request #6732: [HUDI-4148] Add client for hudi table service manager

2023-01-17 Thread GitBox
xushiyan merged PR #6732: URL: https://github.com/apache/hudi/pull/6732

[GitHub] [hudi] jonvex commented on pull request #7660: [MINOR] unify naming for record merger

2023-01-17 Thread GitBox
jonvex commented on PR #7660: URL: https://github.com/apache/hudi/pull/7660#issuecomment-1385607727 Rebased so it can be merged

[GitHub] [hudi] hudi-bot commented on pull request #5926: [HUDI-3475] Initialize hudi table management module

2023-01-17 Thread GitBox
hudi-bot commented on PR #5926: URL: https://github.com/apache/hudi/pull/5926#issuecomment-1385625391 ## CI report: * ed783b49dbeec18cca93a9fe43f1c4f8ee9ae6dd UNKNOWN * a94346128d6b22fec262f74d7c2c9d7d342a0a3c Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-

[GitHub] [hudi] hudi-bot commented on pull request #7684: [HUDI-5567] Modified to make bootstrapping exception message clearer

2023-01-17 Thread GitBox
hudi-bot commented on PR #7684: URL: https://github.com/apache/hudi/pull/7684#issuecomment-1385629825 ## CI report: * 31fe16b17e99594573abc1ad273ee2d007c56bc9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1436

[GitHub] [hudi] nsivabalan commented on a diff in pull request #6782: [HUDI-4911][HUDI-3301] Fixing `HoodieMetadataLogRecordReader` to avoid flushing cache for every lookup

2023-01-17 Thread GitBox
nsivabalan commented on code in PR #6782: URL: https://github.com/apache/hudi/pull/6782#discussion_r1072378625 ## hudi-common/src/test/java/org/apache/hudi/common/functional/TestHoodieLogFormat.java: ## @@ -671,11 +658,188 @@ public void testBasicAppendAndScanMultipleFiles(Exte

[GitHub] [hudi] hudi-bot commented on pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup

2023-01-17 Thread GitBox
hudi-bot commented on PR #7159: URL: https://github.com/apache/hudi/pull/7159#issuecomment-1385638702 ## CI report: * 15ecd91180d32c7fa1905c11408f4bc23347e682 UNKNOWN * 2fe0d6a4dd0fe655a6c0b7f9c7bd3889e91a84f2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] hudi-bot commented on pull request #7660: [MINOR] unify naming for record merger

2023-01-17 Thread GitBox
hudi-bot commented on PR #7660: URL: https://github.com/apache/hudi/pull/7660#issuecomment-1385640366 ## CI report: * cbbcde078cfd2653710905439861fd4188e06943 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1429

[GitHub] [hudi] hudi-bot commented on pull request #7576: [HUDI-4991] Allow kafka-like configs to set truststore and keystore for the SchemaProvider

2023-01-17 Thread GitBox
hudi-bot commented on PR #7576: URL: https://github.com/apache/hudi/pull/7576#issuecomment-1385639846 ## CI report: * b5f77ec23bb8e1532542dcb219a2ef567a1601e5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1407

[GitHub] [hudi] hudi-bot commented on pull request #7632: [HUDI-3775] Allow for offline compaction of MOR tables via spark streaming

2023-01-17 Thread GitBox
hudi-bot commented on PR #7632: URL: https://github.com/apache/hudi/pull/7632#issuecomment-1385640144 ## CI report: * 2d99df06bfc13b1cc293ec6dd553d5c547405864 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1423

[GitHub] [hudi] hudi-bot commented on pull request #7576: [HUDI-4991] Allow kafka-like configs to set truststore and keystore for the SchemaProvider

2023-01-17 Thread GitBox
hudi-bot commented on PR #7576: URL: https://github.com/apache/hudi/pull/7576#issuecomment-1385651791 ## CI report: * b5f77ec23bb8e1532542dcb219a2ef567a1601e5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1407

[GitHub] [hudi] hudi-bot commented on pull request #7632: [HUDI-3775] Allow for offline compaction of MOR tables via spark streaming

2023-01-17 Thread GitBox
hudi-bot commented on PR #7632: URL: https://github.com/apache/hudi/pull/7632#issuecomment-1385652107 ## CI report: * 2d99df06bfc13b1cc293ec6dd553d5c547405864 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1423

[GitHub] [hudi] hudi-bot commented on pull request #7660: [MINOR] unify naming for record merger

2023-01-17 Thread GitBox
hudi-bot commented on PR #7660: URL: https://github.com/apache/hudi/pull/7660#issuecomment-1385652305 ## CI report: * cbbcde078cfd2653710905439861fd4188e06943 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1429

[GitHub] [hudi] jonvex commented on a diff in pull request #7632: [HUDI-3775] Allow for offline compaction of MOR tables via spark streaming

2023-01-17 Thread GitBox
jonvex commented on code in PR #7632: URL: https://github.com/apache/hudi/pull/7632#discussion_r1072449116 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieStreamingSink.scala: ## @@ -117,7 +119,8 @@ class HoodieStreamingSink(sqlContext: SQLContext

[jira] [Updated] (HUDI-5555) Set class loader for parquet data block

2023-01-17 Thread Jonathan Vexler (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Vexler updated HUDI-5555: -- Status: Patch Available (was: In Progress) > Set class loader for parquet data block >

[GitHub] [hudi] alexeykudinkin commented on issue #7643: [SUPPORT] Too slow while using trino-hudi connector while querying partitioned tables.

2023-01-17 Thread GitBox
alexeykudinkin commented on issue #7643: URL: https://github.com/apache/hudi/issues/7643#issuecomment-1385743502 @BruceKellan thanks for the detailed context! This is very helpful cc @yihua

[GitHub] [hudi] hudi-bot commented on pull request #7679: [HUDI-5563] Check table exist before drop table

2023-01-17 Thread GitBox
hudi-bot commented on PR #7679: URL: https://github.com/apache/hudi/pull/7679#issuecomment-1385755454 ## CI report: * 18e390314ee0744e0f6a23d1293f3b4338750af3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1436

[GitHub] [hudi] hudi-bot commented on pull request #7632: [HUDI-3775] Allow for offline compaction of MOR tables via spark streaming

2023-01-17 Thread GitBox
hudi-bot commented on PR #7632: URL: https://github.com/apache/hudi/pull/7632#issuecomment-1385766203 ## CI report: * 2d99df06bfc13b1cc293ec6dd553d5c547405864 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1423

[GitHub] [hudi] mahesh2247 commented on issue #3431: [SUPPORT] Failed to upsert for commit time

2023-01-17 Thread GitBox
mahesh2247 commented on issue #3431: URL: https://github.com/apache/hudi/issues/3431#issuecomment-1385766563 Hello , trying to write a glue job script for reflecting CDC delete . Insert and update are working fine. Kindly help ``` import sys from awsglue.transforms import * from

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7681: [HUDI-5535] Support any record key generation along w/ any partition path generation

2023-01-17 Thread GitBox
alexeykudinkin commented on code in PR #7681: URL: https://github.com/apache/hudi/pull/7681#discussion_r1072524628 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/AutoRecordKeyGenerator.java: ## @@ -0,0 +1,235 @@ +/* + * Licensed to the Apache Software Fou

[GitHub] [hudi] mahesh2247 opened a new issue, #7688: [SUPPORT] Trying to write a glue job script for reflecting CDC delete . while Insert and update are working fine. Kindly help

2023-01-17 Thread GitBox
mahesh2247 opened a new issue, #7688: URL: https://github.com/apache/hudi/issues/7688 ``` import sys from awsglue.transforms import * from awsglue.utils import getResolvedOptions from pyspark.sql.session import SparkSession from pyspark.context import SparkContext from awsgl

[GitHub] [hudi] hudi-bot commented on pull request #7632: [HUDI-3775] Allow for offline compaction of MOR tables via spark streaming

2023-01-17 Thread GitBox
hudi-bot commented on PR #7632: URL: https://github.com/apache/hudi/pull/7632#issuecomment-1385776547 ## CI report: * d40771c52302ad4a78d4e05f57ca3a7dd900ac98 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=143

[GitHub] [hudi] hudi-bot commented on pull request #7685: [HUDI-5568] incorrect use of fileSystemView

2023-01-17 Thread GitBox
hudi-bot commented on PR #7685: URL: https://github.com/apache/hudi/pull/7685#issuecomment-1385786237 ## CI report: * 5b6f0d1e629ec97859bf54f673597ee9c19399f1 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1436

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7640: [HUDI-5514] Add in support for a keyless workflow

2023-01-17 Thread GitBox
alexeykudinkin commented on code in PR #7640: URL: https://github.com/apache/hudi/pull/7640#discussion_r1072550590 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeylessKeyGenerator.java: ## @@ -0,0 +1,239 @@ +/* + * Licensed to the Apache Software Founda

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7490: [HUDI-5407][HUDI-5408] Fixing rollback in MDT to be eager

2023-01-17 Thread GitBox
alexeykudinkin commented on code in PR #7490: URL: https://github.com/apache/hudi/pull/7490#discussion_r1072558109 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/SparkRDDMetadataWriteClient.java: ## @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software

[GitHub] [hudi] the-other-tim-brown commented on a diff in pull request #7640: [HUDI-5514] Add in support for a keyless workflow

2023-01-17 Thread GitBox
the-other-tim-brown commented on code in PR #7640: URL: https://github.com/apache/hudi/pull/7640#discussion_r1072575334 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeylessKeyGenerator.java: ## @@ -0,0 +1,239 @@ +/* + * Licensed to the Apache Software F

[jira] [Closed] (HUDI-4148) Preparations and client for hudi table manager service

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu closed HUDI-4148. Reviewers: Raymond Xu Resolution: Fixed > Preparations and client for hudi table manager service >

[GitHub] [hudi] hudi-bot commented on pull request #7582: [HUDI-5488] Make sure Disrupt queue start first, then insert records

2023-01-17 Thread GitBox
hudi-bot commented on PR #7582: URL: https://github.com/apache/hudi/pull/7582#issuecomment-1385988025 ## CI report: * a94ec9cf09ce55b684fa059ce1ede73bead0e991 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1434

[GitHub] [hudi] hudi-bot commented on pull request #7687: Update to handle deletes in postgres debezium

2023-01-17 Thread GitBox
hudi-bot commented on PR #7687: URL: https://github.com/apache/hudi/pull/7687#issuecomment-1385988494 ## CI report: * 78d341045ff40465c1d44f377b42e5d91f7c5fc7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1436

[jira] [Created] (HUDI-5569) Files written by first commit/delta commit if it failed is detected as valid data files

2023-01-17 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-5569: - Summary: Files written by first commit/delta commit if it failed is detected as valid data files Key: HUDI-5569 URL: https://issues.apache.org/jira/browse/HUDI-5569

[jira] [Updated] (HUDI-5569) Files written by first commit/delta commit if it failed is detected as valid data files

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5569: -- Description: We have a method in HoodieFileGroup which detects whether a file group is

[jira] [Assigned] (HUDI-5569) Files written by first commit/delta commit if it failed is detected as valid data files

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-5569: - Assignee: Jonathan Vexler > Files written by first commit/delta commit if it fail

[jira] [Updated] (HUDI-5569) Files written by first commit/delta commit if it failed is detected as valid data files

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5569: -- Fix Version/s: 0.13.0 > Files written by first commit/delta commit if it failed is detec

[jira] [Updated] (HUDI-5569) Files written by first commit/delta commit if it failed is detected as valid data files

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5569: -- Sprint: 0.13.0 Final Sprint 2 > Files written by first commit/delta commit if it failed

[GitHub] [hudi] nsivabalan commented on a diff in pull request #7632: [HUDI-3775] Allow for offline compaction of MOR tables via spark streaming

2023-01-17 Thread GitBox
nsivabalan commented on code in PR #7632: URL: https://github.com/apache/hudi/pull/7632#discussion_r1072765472 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala: ## @@ -455,6 +455,15 @@ object DataSourceWriteOptions { + "Thi

[GitHub] [hudi] hudi-bot commented on pull request #7576: [HUDI-4991] Allow kafka-like configs to set truststore and keystore for the SchemaProvider

2023-01-17 Thread GitBox
hudi-bot commented on PR #7576: URL: https://github.com/apache/hudi/pull/7576#issuecomment-1386090331 ## CI report: * f7b2c025ed416ea8607b2e6dcc116415f114f87b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1436

[GitHub] [hudi] yihua commented on issue #7430: [BUG] MOR Table Hard Deletes Create issue with Athena Querying RT Tables

2023-01-17 Thread GitBox
yihua commented on issue #7430: URL: https://github.com/apache/hudi/issues/7430#issuecomment-1386112654 Hi @soumilshah1995 would you mind creating an AWS support issue for this? That will accelerate the resolution from AWS Athena. -- This is an automated message from the Apache Git Servi

[GitHub] [hudi] soumilshah1995 commented on issue #7430: [BUG] MOR Table Hard Deletes Create issue with Athena Querying RT Tables

2023-01-17 Thread GitBox
soumilshah1995 commented on issue #7430: URL: https://github.com/apache/hudi/issues/7430#issuecomment-1386115733 Sure i will tell my company sysops to create support ticket :D -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub a

[GitHub] [hudi] yihua commented on issue #7430: [BUG] MOR Table Hard Deletes Create issue with Athena Querying RT Tables

2023-01-17 Thread GitBox
yihua commented on issue #7430: URL: https://github.com/apache/hudi/issues/7430#issuecomment-1386123258 > Sure i will tell my company sysops to create support ticket :D Appreciate that! Let us know the AWS support ticket number once it's filed. cc @umehrot2 -- This is an automat

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7423: [HUDI-5384][Stacked on 7528] Adding optimization rule to appropriately push down filters into the `HoodieFileIndex`

2023-01-17 Thread GitBox
alexeykudinkin commented on code in PR #7423: URL: https://github.com/apache/hudi/pull/7423#discussion_r1072877999 ## hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodiePruneFileSourcePartitions.scala: ## @@ -0,0 +1,126 @@ +/* + * Licensed t

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #6782: [HUDI-4911][HUDI-3301] Fixing `HoodieMetadataLogRecordReader` to avoid flushing cache for every lookup

2023-01-17 Thread GitBox
alexeykudinkin commented on code in PR #6782: URL: https://github.com/apache/hudi/pull/6782#discussion_r1072896769 ## hudi-common/src/test/java/org/apache/hudi/common/functional/TestHoodieLogFormat.java: ## @@ -671,11 +658,188 @@ public void testBasicAppendAndScanMultipleFiles(

[GitHub] [hudi] hudi-bot commented on pull request #7660: [MINOR] unify naming for record merger

2023-01-17 Thread GitBox
hudi-bot commented on PR #7660: URL: https://github.com/apache/hudi/pull/7660#issuecomment-1386201200 ## CI report: * a409755934848d189e0d731e4ee68a22190e5b0d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1437

[GitHub] [hudi] hudi-bot commented on pull request #7660: [MINOR] unify naming for record merger

2023-01-17 Thread GitBox
hudi-bot commented on PR #7660: URL: https://github.com/apache/hudi/pull/7660#issuecomment-1386209381 ## CI report: * a409755934848d189e0d731e4ee68a22190e5b0d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1437

[GitHub] [hudi] hudi-bot commented on pull request #7660: [MINOR] unify naming for record merger

2023-01-17 Thread GitBox
hudi-bot commented on PR #7660: URL: https://github.com/apache/hudi/pull/7660#issuecomment-1386215962 ## CI report: * a409755934848d189e0d731e4ee68a22190e5b0d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1437

[GitHub] [hudi] hudi-bot commented on pull request #7632: [HUDI-3775] Allow for offline compaction of MOR tables via spark streaming

2023-01-17 Thread GitBox
hudi-bot commented on PR #7632: URL: https://github.com/apache/hudi/pull/7632#issuecomment-1386230596 ## CI report: * 8dc8184d6fbafc72835bf52f85075e2a8288061e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1437

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7640: [HUDI-5514] Add in support for a keyless workflow

2023-01-17 Thread GitBox
alexeykudinkin commented on code in PR #7640: URL: https://github.com/apache/hudi/pull/7640#discussion_r1072995895 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeylessKeyGenerator.java: ## @@ -0,0 +1,239 @@ +/* + * Licensed to the Apache Software Founda

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7582: [HUDI-5488] Make sure Disrupt queue start first, then insert records

2023-01-17 Thread GitBox
alexeykudinkin commented on code in PR #7582: URL: https://github.com/apache/hudi/pull/7582#discussion_r1072998596 ## hudi-common/src/main/java/org/apache/hudi/common/util/queue/DisruptorMessageQueue.java: ## @@ -60,6 +61,10 @@ public long size() { @Override public void

[GitHub] [hudi] With-winds opened a new issue, #7689: [SUPPORT] PriorityBasedFileSystemView: Got error running preferred function. Trying secondary

2023-01-17 Thread GitBox
With-winds opened a new issue, #7689: URL: https://github.com/apache/hudi/issues/7689 **Describe the problem you faced** When trying to write to existing COW table using HoodieDeltaStreamer, an error occurred in the Java Spark application. **To Reproduce** **Expected beh

[GitHub] [hudi] yihua opened a new pull request, #7690: [HUDI-5485] Add File System View API for batch listing and improve savepoint performance with metadata table

2023-01-17 Thread GitBox
yihua opened a new pull request, #7690: URL: https://github.com/apache/hudi/pull/7690 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any performance i

[jira] [Updated] (HUDI-5485) Improve performance of savepoint with MDT

2023-01-17 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5485: - Labels: pull-request-available (was: ) > Improve performance of savepoint with MDT >

[jira] [Updated] (HUDI-5323) Decouple virtual key with writing bloom filters to parquet files

2023-01-17 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5323: Status: Patch Available (was: In Progress) > Decouple virtual key with writing bloom filters to parquet fil

[jira] [Updated] (HUDI-5319) NPE in Bloom Filter Index

2023-01-17 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5319: Status: Patch Available (was: In Progress) > NPE in Bloom Filter Index > - > >

[jira] [Updated] (HUDI-5485) Improve performance of savepoint with MDT

2023-01-17 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5485: Status: Patch Available (was: In Progress) > Improve performance of savepoint with MDT > --

[GitHub] [hudi] hudi-bot commented on pull request #7660: [MINOR] unify naming for record merger

2023-01-17 Thread GitBox
hudi-bot commented on PR #7660: URL: https://github.com/apache/hudi/pull/7660#issuecomment-1386356546 ## CI report: * 08642ac9be198fdf55f02260253f81a0b457bcad Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1437

[jira] [Updated] (HUDI-5463) Apply rollback commits from data table as rollbacks in MDT instead of Delta commit

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5463: -- Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 3 (was: 0.13.0 Final Sprint) > Apply

[jira] [Updated] (HUDI-5463) Apply rollback commits from data table as rollbacks in MDT instead of Delta commit

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5463: -- Sprint: 0.13.0 Final Sprint (was: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2) > Apply

[jira] [Updated] (HUDI-4937) Fix HoodieTable injecting HoodieBackedTableMetadata not reusing underlying MT readers

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-4937: - Sprint: 2022/10/04, 2022/10/18, 2022/11/01, 2022/11/15, 2022/11/29, 2022/12/12, 0.13.0 Final Sprint, 0.13.

[jira] [Updated] (HUDI-5408) Partially failed commits in MDT have to be rolled back in all cases

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5408: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 0.13.0 Final Sprint, 0.13

[jira] [Updated] (HUDI-5464) Fix instantiation of a new partition in MDT re-using the same instant time as a regular commit

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5464: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 0.13.0 Final Sprint, 0.13
