[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7640: [HUDI-5514] Add in support for a keyless workflow

2023-01-17 Thread GitBox
alexeykudinkin commented on code in PR #7640: URL: https://github.com/apache/hudi/pull/7640#discussion_r1072995895 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeylessKeyGenerator.java: ## @@ -0,0 +1,239 @@ +/* + * Licensed to the Apache Software Founda

[GitHub] [hudi] hudi-bot commented on pull request #7632: [HUDI-3775] Allow for offline compaction of MOR tables via spark streaming

2023-01-17 Thread GitBox
hudi-bot commented on PR #7632: URL: https://github.com/apache/hudi/pull/7632#issuecomment-1386230596 ## CI report: * 8dc8184d6fbafc72835bf52f85075e2a8288061e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1437

[GitHub] [hudi] hudi-bot commented on pull request #7660: [MINOR] unify naming for record merger

2023-01-17 Thread GitBox
hudi-bot commented on PR #7660: URL: https://github.com/apache/hudi/pull/7660#issuecomment-1386215962 ## CI report: * a409755934848d189e0d731e4ee68a22190e5b0d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1437

[GitHub] [hudi] hudi-bot commented on pull request #7660: [MINOR] unify naming for record merger

2023-01-17 Thread GitBox
hudi-bot commented on PR #7660: URL: https://github.com/apache/hudi/pull/7660#issuecomment-1386209381 ## CI report: * a409755934848d189e0d731e4ee68a22190e5b0d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1437

[GitHub] [hudi] hudi-bot commented on pull request #7660: [MINOR] unify naming for record merger

2023-01-17 Thread GitBox
hudi-bot commented on PR #7660: URL: https://github.com/apache/hudi/pull/7660#issuecomment-1386201200 ## CI report: * a409755934848d189e0d731e4ee68a22190e5b0d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1437

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #6782: [HUDI-4911][HUDI-3301] Fixing `HoodieMetadataLogRecordReader` to avoid flushing cache for every lookup

2023-01-17 Thread GitBox
alexeykudinkin commented on code in PR #6782: URL: https://github.com/apache/hudi/pull/6782#discussion_r1072896769 ## hudi-common/src/test/java/org/apache/hudi/common/functional/TestHoodieLogFormat.java: ## @@ -671,11 +658,188 @@ public void testBasicAppendAndScanMultipleFiles(

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7423: [HUDI-5384][Stacked on 7528] Adding optimization rule to appropriately push down filters into the `HoodieFileIndex`

2023-01-17 Thread GitBox
alexeykudinkin commented on code in PR #7423: URL: https://github.com/apache/hudi/pull/7423#discussion_r1072877999 ## hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodiePruneFileSourcePartitions.scala: ## @@ -0,0 +1,126 @@ +/* + * Licensed t

[GitHub] [hudi] yihua commented on issue #7430: [BUG] MOR Table Hard Deletes Create issue with Athena Querying RT Tables

2023-01-17 Thread GitBox
yihua commented on issue #7430: URL: https://github.com/apache/hudi/issues/7430#issuecomment-1386123258 > Sure i will tell my company sysops to create support ticket :D Appreciate that! Let us know the AWS support ticket number once it's filed. cc @umehrot2 -- This is an automat

[GitHub] [hudi] soumilshah1995 commented on issue #7430: [BUG] MOR Table Hard Deletes Create issue with Athena Querying RT Tables

2023-01-17 Thread GitBox
soumilshah1995 commented on issue #7430: URL: https://github.com/apache/hudi/issues/7430#issuecomment-1386115733 Sure i will tell my company sysops to create support ticket :D

[GitHub] [hudi] yihua commented on issue #7430: [BUG] MOR Table Hard Deletes Create issue with Athena Querying RT Tables

2023-01-17 Thread GitBox
yihua commented on issue #7430: URL: https://github.com/apache/hudi/issues/7430#issuecomment-1386112654 Hi @soumilshah1995 would you mind creating an AWS support issue for this? That will accelerate the resolution from AWS Athena.

[GitHub] [hudi] hudi-bot commented on pull request #7576: [HUDI-4991] Allow kafka-like configs to set truststore and keystore for the SchemaProvider

2023-01-17 Thread GitBox
hudi-bot commented on PR #7576: URL: https://github.com/apache/hudi/pull/7576#issuecomment-1386090331 ## CI report: * f7b2c025ed416ea8607b2e6dcc116415f114f87b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1436

[GitHub] [hudi] nsivabalan commented on a diff in pull request #7632: [HUDI-3775] Allow for offline compaction of MOR tables via spark streaming

2023-01-17 Thread GitBox
nsivabalan commented on code in PR #7632: URL: https://github.com/apache/hudi/pull/7632#discussion_r1072765472 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala: ## @@ -455,6 +455,15 @@ object DataSourceWriteOptions { + "Thi

[jira] [Updated] (HUDI-5569) Files written by first commit/delta commit if it failed is detected as valid data files

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5569: -- Sprint: 0.13.0 Final Sprint 2 > Files written by first commit/delta commit if it failed

[jira] [Updated] (HUDI-5569) Files written by first commit/delta commit if it failed is detected as valid data files

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5569: -- Fix Version/s: 0.13.0 > Files written by first commit/delta commit if it failed is detec

[jira] [Assigned] (HUDI-5569) Files written by first commit/delta commit if it failed is detected as valid data files

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-5569: - Assignee: Jonathan Vexler > Files written by first commit/delta commit if it fail

[jira] [Updated] (HUDI-5569) Files written by first commit/delta commit if it failed is detected as valid data files

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5569: -- Description: We have a method in HoodieFileGroup which detects whether a file group is

[jira] [Created] (HUDI-5569) Files written by first commit/delta commit if it failed is detected as valid data files

2023-01-17 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-5569: - Summary: Files written by first commit/delta commit if it failed is detected as valid data files Key: HUDI-5569 URL: https://issues.apache.org/jira/browse/HUDI-5569

[GitHub] [hudi] hudi-bot commented on pull request #7687: Update to handle deletes in postgres debezium

2023-01-17 Thread GitBox
hudi-bot commented on PR #7687: URL: https://github.com/apache/hudi/pull/7687#issuecomment-1385988494 ## CI report: * 78d341045ff40465c1d44f377b42e5d91f7c5fc7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1436

[GitHub] [hudi] hudi-bot commented on pull request #7582: [HUDI-5488] Make sure Disrupt queue start first, then insert records

2023-01-17 Thread GitBox
hudi-bot commented on PR #7582: URL: https://github.com/apache/hudi/pull/7582#issuecomment-1385988025 ## CI report: * a94ec9cf09ce55b684fa059ce1ede73bead0e991 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1434

[jira] [Closed] (HUDI-4148) Preparations and client for hudi table manager service

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu closed HUDI-4148. Reviewers: Raymond Xu Resolution: Fixed > Preparations and client for hudi table manager service >

[GitHub] [hudi] the-other-tim-brown commented on a diff in pull request #7640: [HUDI-5514] Add in support for a keyless workflow

2023-01-17 Thread GitBox
the-other-tim-brown commented on code in PR #7640: URL: https://github.com/apache/hudi/pull/7640#discussion_r1072575334 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeylessKeyGenerator.java: ## @@ -0,0 +1,239 @@ +/* + * Licensed to the Apache Software F

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7490: [HUDI-5407][HUDI-5408] Fixing rollback in MDT to be eager

2023-01-17 Thread GitBox
alexeykudinkin commented on code in PR #7490: URL: https://github.com/apache/hudi/pull/7490#discussion_r1072558109 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/SparkRDDMetadataWriteClient.java: ## @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7640: [HUDI-5514] Add in support for a keyless workflow

2023-01-17 Thread GitBox
alexeykudinkin commented on code in PR #7640: URL: https://github.com/apache/hudi/pull/7640#discussion_r1072550590 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeylessKeyGenerator.java: ## @@ -0,0 +1,239 @@ +/* + * Licensed to the Apache Software Founda

[GitHub] [hudi] hudi-bot commented on pull request #7685: [HUDI-5568] incorrect use of fileSystemView

2023-01-17 Thread GitBox
hudi-bot commented on PR #7685: URL: https://github.com/apache/hudi/pull/7685#issuecomment-1385786237 ## CI report: * 5b6f0d1e629ec97859bf54f673597ee9c19399f1 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1436

[GitHub] [hudi] hudi-bot commented on pull request #7632: [HUDI-3775] Allow for offline compaction of MOR tables via spark streaming

2023-01-17 Thread GitBox
hudi-bot commented on PR #7632: URL: https://github.com/apache/hudi/pull/7632#issuecomment-1385776547 ## CI report: * d40771c52302ad4a78d4e05f57ca3a7dd900ac98 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=143

[GitHub] [hudi] mahesh2247 opened a new issue, #7688: [SUPPORT] Trying to write a glue job script for reflecting CDC delete . while Insert and update are working fine. Kindly help

2023-01-17 Thread GitBox
mahesh2247 opened a new issue, #7688: URL: https://github.com/apache/hudi/issues/7688 ``` import sys from awsglue.transforms import * from awsglue.utils import getResolvedOptions from pyspark.sql.session import SparkSession from pyspark.context import SparkContext from awsgl

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7681: [HUDI-5535] Support any record key generation along w/ any partition path generation

2023-01-17 Thread GitBox
alexeykudinkin commented on code in PR #7681: URL: https://github.com/apache/hudi/pull/7681#discussion_r1072524628 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/AutoRecordKeyGenerator.java: ## @@ -0,0 +1,235 @@ +/* + * Licensed to the Apache Software Fou

[GitHub] [hudi] mahesh2247 commented on issue #3431: [SUPPORT] Failed to upsert for commit time

2023-01-17 Thread GitBox
mahesh2247 commented on issue #3431: URL: https://github.com/apache/hudi/issues/3431#issuecomment-1385766563 Hello, trying to write a Glue job script for reflecting CDC delete. Insert and update are working fine. Kindly help ``` import sys from awsglue.transforms import * from

[GitHub] [hudi] hudi-bot commented on pull request #7632: [HUDI-3775] Allow for offline compaction of MOR tables via spark streaming

2023-01-17 Thread GitBox
hudi-bot commented on PR #7632: URL: https://github.com/apache/hudi/pull/7632#issuecomment-1385766203 ## CI report: * 2d99df06bfc13b1cc293ec6dd553d5c547405864 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1423

[GitHub] [hudi] hudi-bot commented on pull request #7679: [HUDI-5563] Check table exist before drop table

2023-01-17 Thread GitBox
hudi-bot commented on PR #7679: URL: https://github.com/apache/hudi/pull/7679#issuecomment-1385755454 ## CI report: * 18e390314ee0744e0f6a23d1293f3b4338750af3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1436

[GitHub] [hudi] alexeykudinkin commented on issue #7643: [SUPPORT] Too slow while using trino-hudi connector while querying partitioned tables.

2023-01-17 Thread GitBox
alexeykudinkin commented on issue #7643: URL: https://github.com/apache/hudi/issues/7643#issuecomment-1385743502 @BruceKellan thanks for the detailed context! This is very helpful cc @yihua

[jira] [Updated] (HUDI-5555) Set class loader for parquet data block

2023-01-17 Thread Jonathan Vexler (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Vexler updated HUDI-5555: -- Status: Patch Available (was: In Progress) > Set class loader for parquet data block >

[GitHub] [hudi] jonvex commented on a diff in pull request #7632: [HUDI-3775] Allow for offline compaction of MOR tables via spark streaming

2023-01-17 Thread GitBox
jonvex commented on code in PR #7632: URL: https://github.com/apache/hudi/pull/7632#discussion_r1072449116 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieStreamingSink.scala: ## @@ -117,7 +119,8 @@ class HoodieStreamingSink(sqlContext: SQLContext

[GitHub] [hudi] hudi-bot commented on pull request #7660: [MINOR] unify naming for record merger

2023-01-17 Thread GitBox
hudi-bot commented on PR #7660: URL: https://github.com/apache/hudi/pull/7660#issuecomment-1385652305 ## CI report: * cbbcde078cfd2653710905439861fd4188e06943 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1429

[GitHub] [hudi] hudi-bot commented on pull request #7632: [HUDI-3775] Allow for offline compaction of MOR tables via spark streaming

2023-01-17 Thread GitBox
hudi-bot commented on PR #7632: URL: https://github.com/apache/hudi/pull/7632#issuecomment-1385652107 ## CI report: * 2d99df06bfc13b1cc293ec6dd553d5c547405864 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1423

[GitHub] [hudi] hudi-bot commented on pull request #7576: [HUDI-4991] Allow kafka-like configs to set truststore and keystore for the SchemaProvider

2023-01-17 Thread GitBox
hudi-bot commented on PR #7576: URL: https://github.com/apache/hudi/pull/7576#issuecomment-1385651791 ## CI report: * b5f77ec23bb8e1532542dcb219a2ef567a1601e5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1407

[GitHub] [hudi] hudi-bot commented on pull request #7632: [HUDI-3775] Allow for offline compaction of MOR tables via spark streaming

2023-01-17 Thread GitBox
hudi-bot commented on PR #7632: URL: https://github.com/apache/hudi/pull/7632#issuecomment-1385640144 ## CI report: * 2d99df06bfc13b1cc293ec6dd553d5c547405864 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1423

[GitHub] [hudi] hudi-bot commented on pull request #7576: [HUDI-4991] Allow kafka-like configs to set truststore and keystore for the SchemaProvider

2023-01-17 Thread GitBox
hudi-bot commented on PR #7576: URL: https://github.com/apache/hudi/pull/7576#issuecomment-1385639846 ## CI report: * b5f77ec23bb8e1532542dcb219a2ef567a1601e5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1407

[GitHub] [hudi] hudi-bot commented on pull request #7660: [MINOR] unify naming for record merger

2023-01-17 Thread GitBox
hudi-bot commented on PR #7660: URL: https://github.com/apache/hudi/pull/7660#issuecomment-1385640366 ## CI report: * cbbcde078cfd2653710905439861fd4188e06943 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1429

[GitHub] [hudi] hudi-bot commented on pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup

2023-01-17 Thread GitBox
hudi-bot commented on PR #7159: URL: https://github.com/apache/hudi/pull/7159#issuecomment-1385638702 ## CI report: * 15ecd91180d32c7fa1905c11408f4bc23347e682 UNKNOWN * 2fe0d6a4dd0fe655a6c0b7f9c7bd3889e91a84f2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] nsivabalan commented on a diff in pull request #6782: [HUDI-4911][HUDI-3301] Fixing `HoodieMetadataLogRecordReader` to avoid flushing cache for every lookup

2023-01-17 Thread GitBox
nsivabalan commented on code in PR #6782: URL: https://github.com/apache/hudi/pull/6782#discussion_r1072378625 ## hudi-common/src/test/java/org/apache/hudi/common/functional/TestHoodieLogFormat.java: ## @@ -671,11 +658,188 @@ public void testBasicAppendAndScanMultipleFiles(Exte

[GitHub] [hudi] hudi-bot commented on pull request #7684: [HUDI-5567] Modified to make bootstrapping exception message clearer

2023-01-17 Thread GitBox
hudi-bot commented on PR #7684: URL: https://github.com/apache/hudi/pull/7684#issuecomment-1385629825 ## CI report: * 31fe16b17e99594573abc1ad273ee2d007c56bc9 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1436

[GitHub] [hudi] hudi-bot commented on pull request #5926: [HUDI-3475] Initialize hudi table management module

2023-01-17 Thread GitBox
hudi-bot commented on PR #5926: URL: https://github.com/apache/hudi/pull/5926#issuecomment-1385625391 ## CI report: * ed783b49dbeec18cca93a9fe43f1c4f8ee9ae6dd UNKNOWN * a94346128d6b22fec262f74d7c2c9d7d342a0a3c Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-

[GitHub] [hudi] jonvex commented on pull request #7660: [MINOR] unify naming for record merger

2023-01-17 Thread GitBox
jonvex commented on PR #7660: URL: https://github.com/apache/hudi/pull/7660#issuecomment-1385607727 Rebased so it can be merged

[GitHub] [hudi] xushiyan merged pull request #6732: [HUDI-4148] Add client for hudi table service manager

2023-01-17 Thread GitBox
xushiyan merged PR #6732: URL: https://github.com/apache/hudi/pull/6732

[GitHub] [hudi] xushiyan commented on pull request #6732: [HUDI-4148] Add client for hudi table service manager

2023-01-17 Thread GitBox
xushiyan commented on PR #6732: URL: https://github.com/apache/hudi/pull/6732#issuecomment-1385587792 the CI timeout issue happens on master and is unrelated to this PR itself. will land this first. CI issue will be addressed separately

[GitHub] [hudi] jonvex commented on a diff in pull request #7576: [HUDI-4991] Allow kafka-like configs to set truststore and keystore for the SchemaProvider

2023-01-17 Thread GitBox
jonvex commented on code in PR #7576: URL: https://github.com/apache/hudi/pull/7576#discussion_r1072341098 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/schema/SchemaRegistryProvider.java: ## @@ -64,24 +81,32 @@ public static class Config { * @throws IOException

[GitHub] [hudi] nsivabalan commented on a diff in pull request #6782: [HUDI-4911][HUDI-3301] Fixing `HoodieMetadataLogRecordReader` to avoid flushing cache for every lookup

2023-01-17 Thread GitBox
nsivabalan commented on code in PR #6782: URL: https://github.com/apache/hudi/pull/6782#discussion_r1072338818 ## hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java: ## @@ -255,38 +255,35 @@ public List>>> getRecord return result; } - pr

[GitHub] [hudi] hudi-bot commented on pull request #7680: [HUDI-5548] spark sql update hudi's table properties

2023-01-17 Thread GitBox
hudi-bot commented on PR #7680: URL: https://github.com/apache/hudi/pull/7680#issuecomment-1385506596 ## CI report: * 3fbb769fd595dea1f808be67627f97539d1eb945 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1435

[GitHub] [hudi] kazdy commented on pull request #7640: [HUDI-5514] Add in support for a keyless workflow

2023-01-17 Thread GitBox
kazdy commented on PR #7640: URL: https://github.com/apache/hudi/pull/7640#issuecomment-1385496680 Thanks for the explanation, so it seems like key generator must be deterministic and there's no way around it. What I do with hudi datasets where I need a surrogate key is that I just g

[GitHub] [hudi] hudi-bot commented on pull request #7664: [HUDI-5551] support seconds unit on event_time

2023-01-17 Thread GitBox
hudi-bot commented on PR #7664: URL: https://github.com/apache/hudi/pull/7664#issuecomment-1385494598 ## CI report: * 674eef810f4f188aaf0f505189674e454186e208 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1435

[GitHub] [hudi] hudi-bot commented on pull request #7687: Update to handle deletes in postgres debezium

2023-01-17 Thread GitBox
hudi-bot commented on PR #7687: URL: https://github.com/apache/hudi/pull/7687#issuecomment-1385292589 ## CI report: * 78d341045ff40465c1d44f377b42e5d91f7c5fc7 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1436

[GitHub] [hudi] hudi-bot commented on pull request #5926: [HUDI-3475] Initialize hudi table management module

2023-01-17 Thread GitBox
hudi-bot commented on PR #5926: URL: https://github.com/apache/hudi/pull/5926#issuecomment-1385289920 ## CI report: * 9c6308712dc95b2062fd0dfe64163e723aa46561 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1412

[GitHub] [hudi] hudi-bot commented on pull request #7687: Update to handle deletes in postgres debezium

2023-01-17 Thread GitBox
hudi-bot commented on PR #7687: URL: https://github.com/apache/hudi/pull/7687#issuecomment-1385284533 ## CI report: * 78d341045ff40465c1d44f377b42e5d91f7c5fc7 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] hudi-bot commented on pull request #5926: [HUDI-3475] Initialize hudi table management module

2023-01-17 Thread GitBox
hudi-bot commented on PR #5926: URL: https://github.com/apache/hudi/pull/5926#issuecomment-1385280793 ## CI report: * 9c6308712dc95b2062fd0dfe64163e723aa46561 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1412

[GitHub] [hudi] hudi-bot commented on pull request #7677: [HUDI-5559] Support CDC for flink bounded source

2023-01-17 Thread GitBox
hudi-bot commented on PR #7677: URL: https://github.com/apache/hudi/pull/7677#issuecomment-1385274511 ## CI report: * c81f60f80a945dd2377e2fff4bc6207cc63ef576 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1435

[GitHub] [hudi] hudi-bot commented on pull request #7669: [HUDI-5553] Prevent partition(s) from being dropped if there are pending…

2023-01-17 Thread GitBox
hudi-bot commented on PR #7669: URL: https://github.com/apache/hudi/pull/7669#issuecomment-1385274376 ## CI report: * dae2ca6c5ab37f7865789823dae7ec3033c7b452 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1435

[GitHub] [hudi] hudi-bot commented on pull request #7582: [HUDI-5488] Make sure Disrupt queue start first, then insert records

2023-01-17 Thread GitBox
hudi-bot commented on PR #7582: URL: https://github.com/apache/hudi/pull/7582#issuecomment-1385273943 ## CI report: * a94ec9cf09ce55b684fa059ce1ede73bead0e991 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1434

[GitHub] [hudi] BalaMahesh commented on issue #7595: [SUPPORT] Hudi Clean and Delta commits taking ~50 mins to finish frequently

2023-01-17 Thread GitBox
BalaMahesh commented on issue #7595: URL: https://github.com/apache/hudi/issues/7595#issuecomment-1385272668 > I guess we run into some performance issue when using BloomFilter index for mor table with metadata table disabled, thanks for the feedback, let me record this issue first for this

[GitHub] [hudi] hudi-bot commented on pull request #5926: [HUDI-3475] Initialize hudi table management module

2023-01-17 Thread GitBox
hudi-bot commented on PR #5926: URL: https://github.com/apache/hudi/pull/5926#issuecomment-1385270838 ## CI report: * 9c6308712dc95b2062fd0dfe64163e723aa46561 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1412

[GitHub] [hudi] BalaMahesh opened a new pull request, #7687: Update to handle deletes in postgres debezium

2023-01-17 Thread GitBox
BalaMahesh opened a new pull request, #7687: URL: https://github.com/apache/hudi/pull/7687 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any perfo

[GitHub] [hudi] loukey-lj commented on pull request #7685: [HUDI-5568] incorrect use of fileSystemView

2023-01-17 Thread GitBox
loukey-lj commented on PR #7685: URL: https://github.com/apache/hudi/pull/7685#issuecomment-1385197027 hi @danny0405 could you please take a look

[jira] [Updated] (HUDI-5565) Application restart may cause data lose when task parallelism is changed

2023-01-17 Thread lei w (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lei w updated HUDI-5565: Description: [HUDI-2084|https://github.com/apache/hudi/pull/3168] Resend the uncommitted write metadata when start u

[GitHub] [hudi] boneanxs commented on pull request #7582: [HUDI-5488] Make sure Disrupt queue start first, then insert records

2023-01-17 Thread GitBox
boneanxs commented on PR #7582: URL: https://github.com/apache/hudi/pull/7582#issuecomment-1385170676 @hudi-bot run azure

[GitHub] [hudi] TengHuo commented on a diff in pull request #7626: [HUDI-5516] Reduce memory footprint on workload with thousand active partitions

2023-01-17 Thread GitBox
TengHuo commented on code in PR #7626: URL: https://github.com/apache/hudi/pull/7626#discussion_r1072031352 ## hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/BucketHandles.java: ## @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

[GitHub] [hudi] trushev commented on a diff in pull request #7626: [HUDI-5516] Reduce memory footprint on workload with thousand active partitions

2023-01-17 Thread GitBox
trushev commented on code in PR #7626: URL: https://github.com/apache/hudi/pull/7626#discussion_r1072027683 ## hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/BucketHandles.java: ## @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

[GitHub] [hudi] hudi-bot commented on pull request #7685: [HUDI-5568] incorrect use of fileSystemView

2023-01-17 Thread GitBox
hudi-bot commented on PR #7685: URL: https://github.com/apache/hudi/pull/7685#issuecomment-1385139873 ## CI report: * 5b6f0d1e629ec97859bf54f673597ee9c19399f1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1436

[GitHub] [hudi] hudi-bot commented on pull request #7679: [HUDI-5563] Check table exist before drop table

2023-01-17 Thread GitBox
hudi-bot commented on PR #7679: URL: https://github.com/apache/hudi/pull/7679#issuecomment-1385139725 ## CI report: * e4aabbcc465e71d9184ad1ecb3a53690e98fc291 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1434

[GitHub] [hudi] TengHuo commented on a diff in pull request #7626: [HUDI-5516] Reduce memory footprint on workload with thousand active partitions

2023-01-17 Thread GitBox
TengHuo commented on code in PR #7626: URL: https://github.com/apache/hudi/pull/7626#discussion_r1072001434 ## hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/BucketHandles.java: ## @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

[jira] [Updated] (HUDI-5568) incorrect use of fileSystemView

2023-01-17 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5568: - Labels: pull-request-available (was: ) > incorrect use of fileSystemView > -

[GitHub] [hudi] hudi-bot commented on pull request #7685: [HUDI-5568]

2023-01-17 Thread GitBox
hudi-bot commented on PR #7685: URL: https://github.com/apache/hudi/pull/7685#issuecomment-1385127424 ## CI report: * 5b6f0d1e629ec97859bf54f673597ee9c19399f1 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] hudi-bot commented on pull request #7679: [HUDI-5563] Check table exist before drop table

2023-01-17 Thread GitBox
hudi-bot commented on PR #7679: URL: https://github.com/apache/hudi/pull/7679#issuecomment-1385127333 ## CI report: * e4aabbcc465e71d9184ad1ecb3a53690e98fc291 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1434

[GitHub] [hudi] hudi-bot commented on pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup

2023-01-17 Thread GitBox
hudi-bot commented on PR #7159: URL: https://github.com/apache/hudi/pull/7159#issuecomment-1385125561 ## CI report: * 15ecd91180d32c7fa1905c11408f4bc23347e682 UNKNOWN * 2fe0d6a4dd0fe655a6c0b7f9c7bd3889e91a84f2 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] hudi-bot commented on pull request #7684: [HUDI-5567] Modified to make bootstrapping exception message clearer

2023-01-17 Thread GitBox
hudi-bot commented on PR #7684: URL: https://github.com/apache/hudi/pull/7684#issuecomment-1385115523 ## CI report: * 31fe16b17e99594573abc1ad273ee2d007c56bc9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1436

[GitHub] [hudi] hudi-bot commented on pull request #7619: [MINOR] Optimizing schema validation in Metadata table

2023-01-17 Thread GitBox
hudi-bot commented on PR #7619: URL: https://github.com/apache/hudi/pull/7619#issuecomment-1385115006 ## CI report: * dd59c7370a986b881a4f8e980915484f0c9021c3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1435

[jira] [Updated] (HUDI-5246) Improve validation for partition path

2023-01-17 Thread Hemanth Gowda (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hemanth Gowda updated HUDI-5246: Status: Open (was: In Progress) > Improve validation for partition path > -

[GitHub] [hudi] hangc0276 opened a new issue, #7686: [SUPPORT] Is there any way to delete records by specify one field value without selecting all the records out

2023-01-17 Thread GitBox
hangc0276 opened a new issue, #7686: URL: https://github.com/apache/hudi/issues/7686 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? - Join the mailing list to engage in conversations and get faster support at dev-subscr

[GitHub] [hudi] zhuanshenbsj1 commented on pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup

2023-01-17 Thread GitBox
zhuanshenbsj1 commented on PR #7159: URL: https://github.com/apache/hudi/pull/7159#issuecomment-1385054159 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment

[GitHub] [hudi] loukey-lj opened a new pull request, #7685: [HUDI 5568]

2023-01-17 Thread GitBox
loukey-lj opened a new pull request, #7685: URL: https://github.com/apache/hudi/pull/7685 ### Change Logs writeClient.getHoodieTable().getFileSystemView() always returns the local fileSystemView; should use writeClient.getHoodieTable().getHoodieView() to determine the fileSy

[GitHub] [hudi] zhuanshenbsj1 commented on pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup

2023-01-17 Thread GitBox
zhuanshenbsj1 commented on PR #7159: URL: https://github.com/apache/hudi/pull/7159#issuecomment-1385053453 > # Issue > Issue at hand: Clustering will be performed for inputGroups with only 1 fileSlice, which may cause unnecessary file re-writes and write amplifications should there be no

[GitHub] [hudi] hudi-bot commented on pull request #7684: [HUDI-5567] Modified to make bootstrapping exception message clearer

2023-01-17 Thread GitBox
hudi-bot commented on PR #7684: URL: https://github.com/apache/hudi/pull/7684#issuecomment-1385026811 ## CI report: * 31fe16b17e99594573abc1ad273ee2d007c56bc9 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] hudi-bot commented on pull request #7680: [HUDI-5548] spark sql update hudi's table properties

2023-01-17 Thread GitBox
hudi-bot commented on PR #7680: URL: https://github.com/apache/hudi/pull/7680#issuecomment-1385026730 ## CI report: * 7f5f3ef01829ff5ffb79543d2281bfc08e575c3e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1434

[GitHub] [hudi] hudi-bot commented on pull request #7664: [HUDI-5551] support seconds unit on event_time

2023-01-17 Thread GitBox
hudi-bot commented on PR #7664: URL: https://github.com/apache/hudi/pull/7664#issuecomment-1385026602 ## CI report: * 7fa0b38ff13bce16a12b35a9f009b414854c9fe6 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=143

[jira] [Updated] (HUDI-5567) Modified to make bootstrapping exception message clearer

2023-01-17 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5567: - Labels: pull-request-available (was: ) > Modified to make bootstrapping exception message clearer

[GitHub] [hudi] weimingdiit opened a new pull request, #7684: [HUDI-5567] Modified to make bootstrapping exception message clearer

2023-01-17 Thread GitBox
weimingdiit opened a new pull request, #7684: URL: https://github.com/apache/hudi/pull/7684 ### Change Logs The exception message could be clearer when determining the schema from the data files in bootstrap. ### Impact nothing ### Risk level (write none, low medium or hi

[GitHub] [hudi] hudi-bot commented on pull request #7680: [HUDI-5548] spark sql update hudi's table properties

2023-01-17 Thread GitBox
hudi-bot commented on PR #7680: URL: https://github.com/apache/hudi/pull/7680#issuecomment-1385019241 ## CI report: * 7f5f3ef01829ff5ffb79543d2281bfc08e575c3e Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1434

[GitHub] [hudi] hudi-bot commented on pull request #7664: [HUDI-5551] support seconds unit on event_time

2023-01-17 Thread GitBox
hudi-bot commented on PR #7664: URL: https://github.com/apache/hudi/pull/7664#issuecomment-1385019069 ## CI report: * 2f4ee14477c6868151f3d14eb1f3535d3eafb11d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1430
