Re: [I] [SUPPORT] - Issues after upgrading EMR & Hudi [hudi]

2023-12-12 Thread via GitHub
MikeMccree commented on issue #10273: URL: https://github.com/apache/hudi/issues/10273#issuecomment-1853396088 @ad1happy2go Which logs would you like to see? Also - after more playing around with the configs, I discovered the below: ``` #

Re: [PR] [HUDI-7208] Do writing stage should shutdown with error when insert failed to reduce user execute time and show error details [hudi]

2023-12-12 Thread via GitHub
xuzifu666 commented on code in PR #10297: URL: https://github.com/apache/hudi/pull/10297#discussion_r1424914507 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieCreateHandle.java: ## @@ -165,10 +165,7 @@ protected void doWrite(HoodieRecord record, Schema

Re: [PR] [HUDI-7208] Do writing stage should shutdown with error when insert failed to reduce user execute time and show error details [hudi]

2023-12-12 Thread via GitHub
boneanxs commented on code in PR #10297: URL: https://github.com/apache/hudi/pull/10297#discussion_r1424913288 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieCreateHandle.java: ## @@ -165,10 +165,7 @@ protected void doWrite(HoodieRecord record, Schema

Re: [PR] [HUDI-7223] Cleaner KEEP_LATEST_BY_HOURS should retain latest commit before earliest commit to retain [hudi]

2023-12-12 Thread via GitHub
hudi-bot commented on PR #10307: URL: https://github.com/apache/hudi/pull/10307#issuecomment-1853348151 ## CI report: * 521312007abecf8876dd917a5df05d0acf978b64 UNKNOWN * fba9c6ebe24cf83c31b4c19d40a4f7409e22d207 Azure:

Re: [PR] [HUDI-7184] Add IncrementalQueryAnalyzer for completion time based in… [hudi]

2023-12-12 Thread via GitHub
danny0405 commented on code in PR #10255: URL: https://github.com/apache/hudi/pull/10255#discussion_r1424875356 ## hudi-common/src/main/java/org/apache/hudi/common/table/read/IncrementalQueryAnalyzer.java: ## @@ -0,0 +1,425 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [HUDI-7184] Add IncrementalQueryAnalyzer for completion time based in… [hudi]

2023-12-12 Thread via GitHub
danny0405 commented on code in PR #10255: URL: https://github.com/apache/hudi/pull/10255#discussion_r1424874743 ## hudi-common/src/main/java/org/apache/hudi/common/table/read/IncrementalQueryAnalyzer.java: ## @@ -0,0 +1,425 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [HUDI-7184] Add IncrementalQueryAnalyzer for completion time based in… [hudi]

2023-12-12 Thread via GitHub
danny0405 commented on code in PR #10255: URL: https://github.com/apache/hudi/pull/10255#discussion_r1424872867 ## hudi-common/src/main/java/org/apache/hudi/common/table/timeline/CompletionTimeQueryView.java: ## @@ -175,42 +190,109 @@ public Option getCompletionTime(String

Re: [PR] [HUDI-7184] Add IncrementalQueryAnalyzer for completion time based in… [hudi]

2023-12-12 Thread via GitHub
danny0405 commented on code in PR #10255: URL: https://github.com/apache/hudi/pull/10255#discussion_r1424873530 ## hudi-common/src/main/java/org/apache/hudi/common/table/read/IncrementalQueryAnalyzer.java: ## @@ -0,0 +1,425 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [HUDI-7223] Cleaner KEEP_LATEST_BY_HOURS should retain latest commit before earliest commit to retain [hudi]

2023-12-12 Thread via GitHub
hudi-bot commented on PR #10307: URL: https://github.com/apache/hudi/pull/10307#issuecomment-1853221442 ## CI report: * 521312007abecf8876dd917a5df05d0acf978b64 UNKNOWN * 0c6882d2f26c5c11fcacc1864d73c3f214f03770 Azure:

Re: [PR] [HUDI-7223] Cleaner KEEP_LATEST_BY_HOURS should retain latest commit before earliest commit to retain [hudi]

2023-12-12 Thread via GitHub
hudi-bot commented on PR #10307: URL: https://github.com/apache/hudi/pull/10307#issuecomment-1853217080 ## CI report: * 521312007abecf8876dd917a5df05d0acf978b64 UNKNOWN * 0c6882d2f26c5c11fcacc1864d73c3f214f03770 Azure:

Re: [PR] [HUDI-7215] Delete NewHoodieParquetFileFormat [hudi]

2023-12-12 Thread via GitHub
hudi-bot commented on PR #10304: URL: https://github.com/apache/hudi/pull/10304#issuecomment-1853217050 ## CI report: * 79da2586916d604900592995b283ce281b0ef2ae Azure:

Re: [PR] [HUDI-7215] Delete NewHoodieParquetFileFormat [hudi]

2023-12-12 Thread via GitHub
yihua commented on code in PR #10304: URL: https://github.com/apache/hudi/pull/10304#discussion_r1424808466 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieHadoopFsRelationFactory.scala: ## @@ -234,19 +232,15 @@ class

Re: [PR] [HUDI-7215] Delete NewHoodieParquetFileFormat [hudi]

2023-12-12 Thread via GitHub
hudi-bot commented on PR #10304: URL: https://github.com/apache/hudi/pull/10304#issuecomment-1853212464 ## CI report: * 79da2586916d604900592995b283ce281b0ef2ae UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run

Re: [PR] [HUDI-7223] Cleaner KEEP_LATEST_BY_HOURS should retain latest commit before earliest commit to retain [hudi]

2023-12-12 Thread via GitHub
hudi-bot commented on PR #10307: URL: https://github.com/apache/hudi/pull/10307#issuecomment-1853212511 ## CI report: * 521312007abecf8876dd917a5df05d0acf978b64 UNKNOWN * 0c6882d2f26c5c11fcacc1864d73c3f214f03770 Azure:

Re: [PR] [HUDI-7184] Add IncrementalQueryAnalyzer for completion time based in… [hudi]

2023-12-12 Thread via GitHub
danny0405 commented on code in PR #10255: URL: https://github.com/apache/hudi/pull/10255#discussion_r1424806427 ## hudi-common/src/main/java/org/apache/hudi/common/table/timeline/CompletionTimeQueryView.java: ## @@ -175,42 +190,109 @@ public Option getCompletionTime(String

Re: [PR] [HUDI-7184] Add IncrementalQueryAnalyzer for completion time based in… [hudi]

2023-12-12 Thread via GitHub
danny0405 commented on code in PR #10255: URL: https://github.com/apache/hudi/pull/10255#discussion_r1424806038 ## hudi-common/src/main/java/org/apache/hudi/common/table/timeline/CompletionTimeQueryView.java: ## @@ -175,42 +190,109 @@ public Option getCompletionTime(String

Re: [PR] [HUDI-7184] Add IncrementalQueryAnalyzer for completion time based in… [hudi]

2023-12-12 Thread via GitHub
danny0405 commented on code in PR #10255: URL: https://github.com/apache/hudi/pull/10255#discussion_r1424804615 ## hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieInstantTimeGenerator.java: ## @@ -108,6 +108,9 @@ public static String

Re: [PR] [HUDI-7215] Delete NewHoodieParquetFileFormat [hudi]

2023-12-12 Thread via GitHub
yihua commented on code in PR #10304: URL: https://github.com/apache/hudi/pull/10304#discussion_r1424802397 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala: ## @@ -235,15 +236,11 @@ object DefaultSource { Option(schema)

Re: [PR] [HUDI-7223] Cleaner KEEP_LATEST_BY_HOURS should retain latest commit before earliest commit to retain [hudi]

2023-12-12 Thread via GitHub
the-other-tim-brown commented on code in PR #10307: URL: https://github.com/apache/hudi/pull/10307#discussion_r1424792303 ## hudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/action/TestCleanPlanner.java: ## @@ -0,0 +1,221 @@ +/* + * Licensed to the Apache

Re: [PR] Incoming batch schema is not compatible with the table's one #9980 [hudi]

2023-12-12 Thread via GitHub
danny0405 commented on PR #10308: URL: https://github.com/apache/hudi/pull/10308#issuecomment-1853188317 Can we write a UT for it? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [HUDI-7132] Data may be lost for flink task failure [hudi]

2023-12-12 Thread via GitHub
danny0405 commented on PR #10312: URL: https://github.com/apache/hudi/pull/10312#issuecomment-1853186628 > @danny0405 @cuibo01 Read through the JIRA ticket. While I understand how the state of the TM and JM can cause the potential data loss, I am still not very sure how the TM and JM

Re: [PR] [HUDI-7223] Cleaner KEEP_LATEST_BY_HOURS should retain latest commit before earliest commit to retain [hudi]

2023-12-12 Thread via GitHub
nsivabalan commented on code in PR #10307: URL: https://github.com/apache/hudi/pull/10307#discussion_r1424789136 ## hudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/action/TestCleanPlanner.java: ## @@ -0,0 +1,221 @@ +/* + * Licensed to the Apache Software

Re: [PR] [HUDI-7223] Cleaner KEEP_LATEST_BY_HOURS should retain latest commit before earliest commit to retain [hudi]

2023-12-12 Thread via GitHub
hudi-bot commented on PR #10307: URL: https://github.com/apache/hudi/pull/10307#issuecomment-1853183093 ## CI report: * 521312007abecf8876dd917a5df05d0acf978b64 UNKNOWN * 8b1344e8ea1a4bc12259219002c346b2f06a61fc Azure:

Re: [PR] [HUDI-7223] Cleaner KEEP_LATEST_BY_HOURS should retain latest commit before earliest commit to retain [hudi]

2023-12-12 Thread via GitHub
hudi-bot commented on PR #10307: URL: https://github.com/apache/hudi/pull/10307#issuecomment-1853173079 ## CI report: * 521312007abecf8876dd917a5df05d0acf978b64 UNKNOWN * 7cf9cdee554175792a3933577caffefe23ddd3e8 Azure:

Re: [PR] [HUDI-7223] Cleaner KEEP_LATEST_BY_HOURS should retain latest commit before earliest commit to retain [hudi]

2023-12-12 Thread via GitHub
the-other-tim-brown commented on code in PR #10307: URL: https://github.com/apache/hudi/pull/10307#discussion_r1424779997 ## hudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/action/TestCleanPlanner.java: ## @@ -0,0 +1,221 @@ +/* + * Licensed to the Apache

Re: [I] [SUPPORT] How to skip some partitions in a table when readStreaming in Spark at the init stage [hudi]

2023-12-12 Thread via GitHub
lei-su-awx commented on issue #10315: URL: https://github.com/apache/hudi/issues/10315#issuecomment-1853172037 @ad1happy2go I tried to read only the files under that partition using Spark (spark.readStream), but an error was thrown: no .hoodie file exists in the partition path, and I found

Re: [PR] [HUDI-7223] Cleaner KEEP_LATEST_BY_HOURS should retain latest commit before earliest commit to retain [hudi]

2023-12-12 Thread via GitHub
the-other-tim-brown commented on code in PR #10307: URL: https://github.com/apache/hudi/pull/10307#discussion_r1424776759 ## hudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/action/TestCleanPlanner.java: ## @@ -0,0 +1,221 @@ +/* + * Licensed to the Apache

Re: [PR] [HUDI-7223] Cleaner KEEP_LATEST_BY_HOURS should retain latest commit before earliest commit to retain [hudi]

2023-12-12 Thread via GitHub
the-other-tim-brown commented on code in PR #10307: URL: https://github.com/apache/hudi/pull/10307#discussion_r1424774498 ## hudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/action/TestCleanPlanner.java: ## @@ -0,0 +1,221 @@ +/* + * Licensed to the Apache

Re: [PR] [HUDI-7223] Cleaner KEEP_LATEST_BY_HOURS should retain latest commit before earliest commit to retain [hudi]

2023-12-12 Thread via GitHub
the-other-tim-brown commented on code in PR #10307: URL: https://github.com/apache/hudi/pull/10307#discussion_r1424774805 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java: ## @@ -351,23 +351,15 @@ private Pair>

Re: [PR] [HUDI-7223] Cleaner KEEP_LATEST_BY_HOURS should retain latest commit before earliest commit to retain [hudi]

2023-12-12 Thread via GitHub
xushiyan commented on code in PR #10307: URL: https://github.com/apache/hudi/pull/10307#discussion_r1424773047 ## hudi-client/hudi-client-common/src/test/java/org/apache/hudi/table/action/TestCleanPlanner.java: ## @@ -0,0 +1,221 @@ +/* + * Licensed to the Apache Software

Re: [PR] [HUDI-7226] Fix clean-by-hour policy for race condition [hudi]

2023-12-12 Thread via GitHub
xushiyan closed pull request #10321: [HUDI-7226] Fix clean-by-hour policy for race condition URL: https://github.com/apache/hudi/pull/10321 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] [HUDI-7224] HoodieSparkSqlWriter metasync success or not show details messages log [hudi]

2023-12-12 Thread via GitHub
xuzifu666 closed pull request #10314: [HUDI-7224] HoodieSparkSqlWriter metasync success or not show details messages log URL: https://github.com/apache/hudi/pull/10314 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [PR] [HUDI-7190] Fix nested columns vectorized read for spark33+ legacy formats [hudi]

2023-12-12 Thread via GitHub
stream2000 commented on PR #10265: URL: https://github.com/apache/hudi/pull/10265#issuecomment-1853150157 > @stream2000 : Just checking if you are still working on the tests ? Sorry for the late reply, I was busy with other stuff. Will fix the test ASAP. -- This is an automated

Re: [PR] [HUDI-7223] Cleaner KEEP_LATEST_BY_HOURS should retain latest commit before earliest commit to retain [hudi]

2023-12-12 Thread via GitHub
hudi-bot commented on PR #10307: URL: https://github.com/apache/hudi/pull/10307#issuecomment-1853127784 ## CI report: * 50dd31f58f1a82b9252eb5d273d933acacb4fbfb Azure:

Re: [I] [SUPPORT] hoodie only support org.apache.spark.serializer.KryoSerializer as spark.serializer [hudi]

2023-12-12 Thread via GitHub
young138120 commented on issue #10320: URL: https://github.com/apache/hudi/issues/10320#issuecomment-1853107479 > @young138120 Somehow the config you passed `spark.serializer` is not being set to spark conf although your code looks okay. can you try spark.conf.get("spark.serializer")
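
The check suggested in the quoted reply can be sketched offline as follows. This is a minimal sketch, assuming the effective Spark settings have been collected into a plain dict (in a live session the value would come from `spark.conf.get("spark.serializer")`, as the comment suggests). The helper name `serializer_problem` is hypothetical and is not part of Hudi or Spark.

```python
from typing import Mapping, Optional

# Serializer class named in the issue title as the only one Hudi supports.
KRYO = "org.apache.spark.serializer.KryoSerializer"


def serializer_problem(conf: Mapping[str, str]) -> Optional[str]:
    """Return a description of the problem, or None if the setting looks right.

    `conf` is assumed to be a plain mapping of effective Spark settings.
    This helper is hypothetical; it is not part of Hudi or Spark.
    """
    value = conf.get("spark.serializer")
    if value is None:
        return "spark.serializer is not set on the session"
    if value != KRYO:
        return f"spark.serializer is {value!r}, expected {KRYO!r}"
    return None
```

A `None` result means the key is present with the Kryo class; any other result points at a setting that never reached the session, which is the situation the reply describes.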

Re: [PR] [HUDI-7215] Delete NewHoodieParquetFileFormat [hudi]

2023-12-12 Thread via GitHub
jonvex commented on PR #10304: URL: https://github.com/apache/hudi/pull/10304#issuecomment-1853105133 @yihua all tests passing, including azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] [HUDI-7226] Fix clean-by-hour policy for race condition [hudi]

2023-12-12 Thread via GitHub
hudi-bot commented on PR #10321: URL: https://github.com/apache/hudi/pull/10321#issuecomment-1853083718 ## CI report: * beb7d827c906c84624b1905000401d4494e622e6 Azure:

Re: [PR] [HUDI-7223] Cleaner KEEP_LATEST_BY_HOURS should retain latest commit before earliest commit to retain [hudi]

2023-12-12 Thread via GitHub
hudi-bot commented on PR #10307: URL: https://github.com/apache/hudi/pull/10307#issuecomment-1853077896 ## CI report: * 8b67cc8faf3a4e76866bed27c67ab8687eff5c40 Azure:

Re: [I] [SUPPORT] Assistance Needed with Hudi Delta Streamer (org.apache.hudi.utilities.exception.HoodieSchemaFetchException: Error reading source schema from registry) [hudi]

2023-12-12 Thread via GitHub
soumilshah1995 commented on issue #10174: URL: https://github.com/apache/hudi/issues/10174#issuecomment-1853066621 Youtube Video https://youtu.be/FSpt4jSH_O0 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [PR] [HUDI-7223] Cleaner KEEP_LATEST_BY_HOURS should retain latest commit before earliest commit to retain [hudi]

2023-12-12 Thread via GitHub
hudi-bot commented on PR #10307: URL: https://github.com/apache/hudi/pull/10307#issuecomment-1853043356 ## CI report: * 8b67cc8faf3a4e76866bed27c67ab8687eff5c40 Azure:

Re: [PR] [HUDI-7223] Cleaner KEEP_LATEST_BY_HOURS should retain latest commit before earliest commit to retain [hudi]

2023-12-12 Thread via GitHub
hudi-bot commented on PR #10307: URL: https://github.com/apache/hudi/pull/10307#issuecomment-1853037256 ## CI report: * 8b67cc8faf3a4e76866bed27c67ab8687eff5c40 Azure:

Re: [PR] [HUDI-7223] Cleaner KEEP_LATEST_BY_HOURS should retain latest commit before earliest commit to retain [hudi]

2023-12-12 Thread via GitHub
hudi-bot commented on PR #10307: URL: https://github.com/apache/hudi/pull/10307#issuecomment-1853018021 ## CI report: * 8b67cc8faf3a4e76866bed27c67ab8687eff5c40 Azure:

(hudi) branch master updated: [HUDI-7225] Correcting spelling errors or annotations with non-standard spelling (#10317)

2023-12-12 Thread vbalaji
This is an automated email from the ASF dual-hosted git repository. vbalaji pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 9d9a9a30207 [HUDI-7225] Correcting spelling

Re: [PR] [HUDI-7225] Correcting spelling errors or annotations with non-standa… [hudi]

2023-12-12 Thread via GitHub
bvaradar merged PR #10317: URL: https://github.com/apache/hudi/pull/10317 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

Re: [PR] [HUDI-7184] Add IncrementalQueryAnalyzer for completion time based in… [hudi]

2023-12-12 Thread via GitHub
vinothchandar commented on code in PR #10255: URL: https://github.com/apache/hudi/pull/10255#discussion_r1420410913 ## hudi-common/src/main/java/org/apache/hudi/common/table/log/InstantRange.java: ## @@ -22,9 +22,12 @@ import org.apache.hudi.common.util.ValidationUtils;

Re: [PR] [HUDI-7226] Fix clean-by-hour policy for race condition [hudi]

2023-12-12 Thread via GitHub
hudi-bot commented on PR #10321: URL: https://github.com/apache/hudi/pull/10321#issuecomment-1852938645 ## CI report: * beb7d827c906c84624b1905000401d4494e622e6 Azure:

Re: [PR] [HUDI-7223] Cleaner KEEP_LATEST_BY_HOURS should retain latest commit before earliest commit to retain [hudi]

2023-12-12 Thread via GitHub
hudi-bot commented on PR #10307: URL: https://github.com/apache/hudi/pull/10307#issuecomment-1852938565 ## CI report: * 8b67cc8faf3a4e76866bed27c67ab8687eff5c40 Azure:

Re: [PR] [HUDI-7226] Fix clean-by-hour policy for race condition [hudi]

2023-12-12 Thread via GitHub
the-other-tim-brown commented on PR #10321: URL: https://github.com/apache/hudi/pull/10321#issuecomment-1852932475 This is covered in this PR https://github.com/apache/hudi/pull/10307 which also adds test cases -- This is an automated message from the Apache Git Service. To respond to

Re: [PR] [HUDI-7226] Fix clean-by-hour policy for race condition [hudi]

2023-12-12 Thread via GitHub
hudi-bot commented on PR #10321: URL: https://github.com/apache/hudi/pull/10321#issuecomment-1852931090 ## CI report: * beb7d827c906c84624b1905000401d4494e622e6 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run

Re: [PR] [HUDI-7223] Cleaner KEEP_LATEST_BY_HOURS should retain latest commit before earliest commit to retain [hudi]

2023-12-12 Thread via GitHub
hudi-bot commented on PR #10307: URL: https://github.com/apache/hudi/pull/10307#issuecomment-1852923098 ## CI report: * 8b67cc8faf3a4e76866bed27c67ab8687eff5c40 Azure:

[jira] [Updated] (HUDI-7226) Clean by hour does not respect lastVersionBeforeEarliestCommitToRetain

2023-12-12 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7226: - Labels: pull-request-available (was: ) > Clean by hour does not respect

[PR] [HUDI-7226] Fix clean-by-hour policy for race condition [hudi]

2023-12-12 Thread via GitHub
xushiyan opened a new pull request, #10321: URL: https://github.com/apache/hudi/pull/10321 ### Change Logs `lastVersionBeforeEarliestCommitToRetain` is not honored by the `KEEP_LATEST_BY_HOURS` policy. This essentially causes the cleaner to remove an eligible file slice when it becomes
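
The behaviour described in this PR can be illustrated with a toy model. The sketch below is not Hudi's `CleanPlanner` code: it assumes a file group is just a list of file-slice instant times (strings that sort chronologically), and the function name `slices_to_clean` and its `honor_last_version_before` flag are hypothetical. It shows why the latest slice written before the earliest commit to retain should survive, since a reader positioned at that earliest retained commit may still need it.

```python
from typing import List


def slices_to_clean(slice_times: List[str],
                    earliest_commit_to_retain: str,
                    honor_last_version_before: bool = True) -> List[str]:
    """Toy model: pick the file slices of one file group to delete.

    Hypothetical helper for illustration only; not Hudi's implementation.
    """
    ordered = sorted(slice_times)
    # Slices written strictly before the retention boundary are candidates.
    older = [t for t in ordered if t < earliest_commit_to_retain]
    if honor_last_version_before and older:
        # Keep the newest of the older slices so that a reader at the
        # earliest retained commit still has a version to read.
        older = older[:-1]
    return older


slices = ["20231212080000000", "20231212090000000", "20231212110000000"]
boundary = "20231212100000000"
honored = slices_to_clean(slices, boundary)
not_honored = slices_to_clean(slices, boundary, honor_last_version_before=False)
```

With the flag on, only the 08:00 slice is selected for cleaning; with it off, the 09:00 slice is selected as well, which is the kind of removal the PR description objects to.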

[jira] [Assigned] (HUDI-7226) Clean by hour does not respect lastVersionBeforeEarliestCommitToRetain

2023-12-12 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu reassigned HUDI-7226: Assignee: Raymond Xu > Clean by hour does not respect lastVersionBeforeEarliestCommitToRetain >

[jira] [Updated] (HUDI-7226) Clean by hour does not respect lastVersionBeforeEarliestCommitToRetain

2023-12-12 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-7226: - Description:

[jira] [Updated] (HUDI-7226) Clean by hour does not respect lastVersionBeforeEarliestCommitToRetain

2023-12-12 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-7226: - Description:

Re: [I] [SUPPORT] Assistance Needed with Hudi Delta Streamer (org.apache.hudi.utilities.exception.HoodieSchemaFetchException: Error reading source schema from registry) [hudi]

2023-12-12 Thread via GitHub
soumilshah1995 closed issue #10174: [SUPPORT] Assistance Needed with Hudi Delta Streamer (org.apache.hudi.utilities.exception.HoodieSchemaFetchException: Error reading source schema from registry) URL: https://github.com/apache/hudi/issues/10174 -- This is an automated message from the

Re: [I] [SUPPORT] Assistance Needed with Hudi Delta Streamer (org.apache.hudi.utilities.exception.HoodieSchemaFetchException: Error reading source schema from registry) [hudi]

2023-12-12 Thread via GitHub
soumilshah1995 commented on issue #10174: URL: https://github.com/apache/hudi/issues/10174#issuecomment-1852893439 you are savior :D love you thanks a lot -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [PR] [HUDI-7223] Cleaner KEEP_LATEST_BY_HOURS should retain latest commit before earliest commit to retain [hudi]

2023-12-12 Thread via GitHub
hudi-bot commented on PR #10307: URL: https://github.com/apache/hudi/pull/10307#issuecomment-1852871432 ## CI report: * 8b67cc8faf3a4e76866bed27c67ab8687eff5c40 Azure:

(hudi) branch master updated (47ad41575de -> c007d642289)

2023-12-12 Thread yihua
This is an automated email from the ASF dual-hosted git repository. yihua pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from 47ad41575de [HUDI-7176] Add file group reader test framework (#10263) add c007d642289 [MINOR] Clean some imports

Re: [PR] [MINOR] Clean some imports for some files [hudi]

2023-12-12 Thread via GitHub
yihua merged PR #10305: URL: https://github.com/apache/hudi/pull/10305 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

Re: [PR] [MINOR] Clean some imports for some files [hudi]

2023-12-12 Thread via GitHub
yihua commented on PR #10305: URL: https://github.com/apache/hudi/pull/10305#issuecomment-1852849293 CI is green. https://github.com/apache/hudi/assets/2497195/809d0ace-aaf3-4d66-a398-7416ff5b52a8 -- This is an automated message from the Apache Git Service. To respond to the

Re: [I] [SUPPORT] Clean action failure triggers an exception while trying to check whether metadata is a table [hudi]

2023-12-12 Thread via GitHub
shubhamn21 commented on issue #10127: URL: https://github.com/apache/hudi/issues/10127#issuecomment-1852839698 Disabling the clean action by setting `hoodie.clean.automatic` to `false` has helped for now. I plan to create a daemon that can clean the cold metadata in parallel, but
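
As a sketch of the workaround described in this comment, the option can be merged into whatever write options a job already uses. Only the key `hoodie.clean.automatic` and the value `false` come from the comment; the helper `with_auto_clean_disabled` and the sample table name are hypothetical, and how the options reach the writer (for example through `.options(**opts)` on a Spark DataFrame writer) is an assumption about the reporter's setup.

```python
from typing import Dict


def with_auto_clean_disabled(base_options: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the write options with automatic cleaning turned off.

    Hypothetical helper for illustration; not part of Hudi.
    """
    options = dict(base_options)  # copy so the caller's dict is not mutated
    options["hoodie.clean.automatic"] = "false"
    return options


# Sample options; the table name is a placeholder.
opts = with_auto_clean_disabled({"hoodie.table.name": "example_table"})
```

Keeping the override in one helper makes it easy to remove once cleaning is handled elsewhere, such as by the separate daemon the comment mentions.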

(hudi) branch master updated: [HUDI-7176] Add file group reader test framework (#10263)

2023-12-12 Thread yihua
This is an automated email from the ASF dual-hosted git repository. yihua pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 47ad41575de [HUDI-7176] Add file group reader test

Re: [PR] [HUDI-7176] Add file group reader test framework [hudi]

2023-12-12 Thread via GitHub
yihua merged PR #10263: URL: https://github.com/apache/hudi/pull/10263 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

Re: [PR] [HUDI-7176] Add file group reader test framework [hudi]

2023-12-12 Thread via GitHub
yihua commented on code in PR #10263: URL: https://github.com/apache/hudi/pull/10263#discussion_r1424578340 ## hudi-common/src/test/java/org/apache/hudi/common/testutils/reader/DataGenerationPlan.java: ## @@ -0,0 +1,126 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] [HUDI-7200] Fix bugs of Avro record merger for event time merging [hudi]

2023-12-12 Thread via GitHub
linliu-code closed pull request #10286: [HUDI-7200] Fix bugs of Avro record merger for event time merging URL: https://github.com/apache/hudi/pull/10286 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] [HUDI-7200] Fix bugs of Avro record merger for event time merging [hudi]

2023-12-12 Thread via GitHub
linliu-code commented on PR #10286: URL: https://github.com/apache/hudi/pull/10286#issuecomment-1852793324 After talking with @yihua, we decided to use HoodieAvroRecord with EmptyHoodieRecordPayload to indicate a delete record, so this PR can be dropped for now. We may need to revisit this

Re: [PR] [HUDI-7215] Delete NewHoodieParquetFileFormat [hudi]

2023-12-12 Thread via GitHub
hudi-bot commented on PR #10304: URL: https://github.com/apache/hudi/pull/10304#issuecomment-1852765884 ## CI report: * 79c9af943ac8928216e8245752517f20893b0b42 UNKNOWN * 79da2586916d604900592995b283ce281b0ef2ae Azure:

Re: [PR] [HUDI-7200] Fix bugs of Avro record merger for event time merging [hudi]

2023-12-12 Thread via GitHub
linliu-code commented on code in PR #10286: URL: https://github.com/apache/hudi/pull/10286#discussion_r1424512510 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodieAvroRecordMerger.java: ## @@ -59,6 +59,13 @@ private Option combineAndGetUpdateValue(HoodieRecord

Re: [PR] [HUDI-7200] Fix bugs of Avro record merger for event time merging [hudi]

2023-12-12 Thread via GitHub
linliu-code commented on code in PR #10286: URL: https://github.com/apache/hudi/pull/10286#discussion_r1424512086 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodieAvroRecordMerger.java: ## @@ -59,6 +59,13 @@ private Option combineAndGetUpdateValue(HoodieRecord

Re: [PR] [HUDI-7200] Fix bugs of Avro record merger for event time merging [hudi]

2023-12-12 Thread via GitHub
linliu-code commented on code in PR #10286: URL: https://github.com/apache/hudi/pull/10286#discussion_r1424510797 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodieAvroRecordMerger.java: ## @@ -59,6 +59,13 @@ private Option combineAndGetUpdateValue(HoodieRecord

[jira] [Created] (HUDI-7227) Enable completion time for File Group Reader

2023-12-12 Thread Lin Liu (Jira)
Lin Liu created HUDI-7227: - Summary: Enable completion time for File Group Reader Key: HUDI-7227 URL: https://issues.apache.org/jira/browse/HUDI-7227 Project: Apache Hudi Issue Type: Task

Re: [PR] [HUDI-7176] Add file group reader test framework [hudi]

2023-12-12 Thread via GitHub
linliu-code commented on code in PR #10263: URL: https://github.com/apache/hudi/pull/10263#discussion_r1424492464 ## hudi-common/src/test/java/org/apache/hudi/common/testutils/reader/HoodieFileSliceTestUtils.java: ## @@ -0,0 +1,440 @@ +/* + * Licensed to the Apache Software

Re: [PR] [HUDI-7176] Add file group reader test framework [hudi]

2023-12-12 Thread via GitHub
linliu-code commented on PR #10263: URL: https://github.com/apache/hudi/pull/10263#issuecomment-1852689024 @yihua, I have addressed your comments. My overall plan is to land this PR, and use another PR to further simplify this framework and integrate more with other existing modules. How

Re: [PR] [HUDI-7176] Add file group reader test framework [hudi]

2023-12-12 Thread via GitHub
linliu-code commented on code in PR #10263: URL: https://github.com/apache/hudi/pull/10263#discussion_r1424490970 ## hudi-common/src/test/java/org/apache/hudi/common/testutils/reader/DataGenerationPlan.java: ## @@ -0,0 +1,126 @@ +/* + * Licensed to the Apache Software

Re: [PR] [HUDI-7176] Add file group reader test framework [hudi]

2023-12-12 Thread via GitHub
linliu-code commented on code in PR #10263: URL: https://github.com/apache/hudi/pull/10263#discussion_r1424490035 ## hudi-common/src/test/java/org/apache/hudi/common/testutils/reader/HoodieTestReaderContext.java: ## @@ -0,0 +1,163 @@ +/* + * Licensed to the Apache Software

Re: [PR] [HUDI-6154] Introduced retry while reading hoodie.properties to deal with parallel updates. [hudi]

2023-12-12 Thread via GitHub
CTTY commented on code in PR #8609: URL: https://github.com/apache/hudi/pull/8609#discussion_r1424474203 ## hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java: ## @@ -334,22 +337,43 @@ public HoodieTableConfig() { super(); } - private void

Re: [PR] [HUDI-7215] Delete NewHoodieParquetFileFormat [hudi]

2023-12-12 Thread via GitHub
hudi-bot commented on PR #10304: URL: https://github.com/apache/hudi/pull/10304#issuecomment-1852594104 ## CI report: * 79c9af943ac8928216e8245752517f20893b0b42 UNKNOWN * 7ce0e45df128a45407b2747a6d1004036e0d3ee8 Azure:

[jira] [Created] (HUDI-7226) Clean by hour does not respect lastVersionBeforeEarliestCommitToRetain

2023-12-12 Thread Raymond Xu (Jira)
Raymond Xu created HUDI-7226: Summary: Clean by hour does not respect lastVersionBeforeEarliestCommitToRetain Key: HUDI-7226 URL: https://issues.apache.org/jira/browse/HUDI-7226 Project: Apache Hudi

[jira] [Commented] (HUDI-7222) Fix the loose Scala style check

2023-12-12 Thread Lin Liu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795879#comment-17795879 ] Lin Liu commented on HUDI-7222: --- ScalaStyle is outdated: the library has had no updates since 2019.

Re: [I] [SUPPORT] how to config hudi table TTL in S3? The table_meta can be separated into a directory? [hudi]

2023-12-12 Thread via GitHub
ad1happy2go commented on issue #10316: URL: https://github.com/apache/hudi/issues/10316#issuecomment-1852546640 Maybe a good idea, but I guess we may have already explored it. Adding @nsivabalan @yihua @danny0405 @codope -- This is an automated message from the Apache Git Service. To

Re: [I] [SUPPORT] Reuse table configuration between Spark Writes and HoodieStreamer [hudi]

2023-12-12 Thread via GitHub
ad1happy2go commented on issue #10319: URL: https://github.com/apache/hudi/issues/10319#issuecomment-1852544773 Did you try with --hoodie-conf? Will try to dig into it more. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [I] [SUPPORT] - Issues after upgrading EMR & Hudi [hudi]

2023-12-12 Thread via GitHub
ad1happy2go commented on issue #10273: URL: https://github.com/apache/hudi/issues/10273#issuecomment-1852540846 Can you provide us with the logs so we can look into it more? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [PR] [HUDI-7215] Delete NewHoodieParquetFileFormat [hudi]

2023-12-12 Thread via GitHub
hudi-bot commented on PR #10304: URL: https://github.com/apache/hudi/pull/10304#issuecomment-1852532591 ## CI report: * 79c9af943ac8928216e8245752517f20893b0b42 UNKNOWN * 7ce0e45df128a45407b2747a6d1004036e0d3ee8 Azure:

Re: [PR] [HUDI-6613] implement inmemory file index to allow for glob paths [hudi]

2023-12-12 Thread via GitHub
hudi-bot commented on PR #10062: URL: https://github.com/apache/hudi/pull/10062#issuecomment-1852531827 ## CI report: * 601f0d68199d5ba31d441ca79c69f3ff3bdbb3a7 Azure:

Re: [I] [SUPPORT] - Issues after upgrading EMR & Hudi [hudi]

2023-12-12 Thread via GitHub
MikeMccree commented on issue #10273: URL: https://github.com/apache/hudi/issues/10273#issuecomment-1852522867 @ad1happy2go Something else interesting to note. If I manually create the DB ``` database_name = "michael_test" # Create the database spark.sql(f"CREATE DATABASE IF

[jira] [Commented] (HUDI-7222) Fix the loose Scala style check

2023-12-12 Thread Lin Liu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795862#comment-17795862 ] Lin Liu commented on HUDI-7222: --- Meanwhile, the unused import check is not implemented in the current Scala style

Re: [PR] [HUDI-7215] Delete NewHoodieParquetFileFormat [hudi]

2023-12-12 Thread via GitHub
hudi-bot commented on PR #10304: URL: https://github.com/apache/hudi/pull/10304#issuecomment-1852496889 ## CI report: * 79c9af943ac8928216e8245752517f20893b0b42 UNKNOWN * 7ce0e45df128a45407b2747a6d1004036e0d3ee8 Azure:

Re: [PR] Incoming batch schema is not compatible with the table's one #9980 [hudi]

2023-12-12 Thread via GitHub
hudi-bot commented on PR #10308: URL: https://github.com/apache/hudi/pull/10308#issuecomment-1852496996 ## CI report: * 14d5465e2e85b66ff4404a5c9b46f19e9c9a0e73 Azure:

Re: [PR] [HUDI-6613] implement inmemory file index to allow for glob paths [hudi]

2023-12-12 Thread via GitHub
hudi-bot commented on PR #10062: URL: https://github.com/apache/hudi/pull/10062#issuecomment-1852496140 ## CI report: * c246208dbaf417f6db99c48ee3f5d54c52ef89e8 Azure:

Re: [I] [SUPPORT] - Issues after upgrading EMR & Hudi [hudi]

2023-12-12 Thread via GitHub
MikeMccree commented on issue #10273: URL: https://github.com/apache/hudi/issues/10273#issuecomment-1852492519 Hi @ad1happy2go _Also, did you try explicitly defining the Glue Sync Tool?_ Yes, I had it in while running all my tests. I have added both of these

Re: [I] [SUPPORT] Data loss in MOR table after clustering partition [hudi]

2023-12-12 Thread via GitHub
mzheng-plaid commented on issue #9977: URL: https://github.com/apache/hudi/issues/9977#issuecomment-1852487662 Yes it does, I will validate on the full dataset but this seems to match exactly the symptoms we saw originally. I'm not following how

Re: [I] [SUPPORT] hoodie only support org.apache.spark.serializer.KryoSerializer as spark.serializer [hudi]

2023-12-12 Thread via GitHub
ad1happy2go commented on issue #10320: URL: https://github.com/apache/hudi/issues/10320#issuecomment-1852441714 @young138120 Somehow the config you passed, `spark.serializer`, is not being set in the Spark conf, although your code looks okay. Can you try `spark.conf.get("spark.serializer")` before

Re: [I] [SUPPORT] - Issues after upgrading EMR & Hudi [hudi]

2023-12-12 Thread via GitHub
ad1happy2go commented on issue #10273: URL: https://github.com/apache/hudi/issues/10273#issuecomment-1852436464 @MikeMccree Do you have this database in Glue? If yes, then your setup might not be accessing Glue at all. You can use

Re: [I] [SUPPORT] - Issues after upgrading EMR & Hudi [hudi]

2023-12-12 Thread via GitHub
MikeMccree commented on issue #10273: URL: https://github.com/apache/hudi/issues/10273#issuecomment-1852429920 @ad1happy2go After more toying around, I managed to get rid of the above exceptions by being specific about the JARs I am submitting along with my spark-submit. The problem now is

Re: [PR] [HUDI-6613] implement inmemory file index to allow for glob paths [hudi]

2023-12-12 Thread via GitHub
hudi-bot commented on PR #10062: URL: https://github.com/apache/hudi/pull/10062#issuecomment-1852428482 ## CI report: * c246208dbaf417f6db99c48ee3f5d54c52ef89e8 Azure:

Re: [I] [SUPPORT] How to skip some partitions in a table when readStreaming in Spark at the init stage [hudi]

2023-12-12 Thread via GitHub
ad1happy2go commented on issue #10315: URL: https://github.com/apache/hudi/issues/10315#issuecomment-1852418826 @lei-su-awx If the table is partitioned, then it should only read the files under that partition. Are you seeing different behaviour, i.e. is it reading all files? -- This is

Re: [PR] [HUDI-7215] Delete NewHoodieParquetFileFormat [hudi]

2023-12-12 Thread via GitHub
hudi-bot commented on PR #10304: URL: https://github.com/apache/hudi/pull/10304#issuecomment-1852401372 ## CI report: * d858eaac14b3de45d4066165622738d91ff603fe Azure:
