Re: [I] [SUPPORT] Compaction error [hudi]

2023-11-03 Thread via GitHub
watermelon12138 commented on issue #9885: URL: https://github.com/apache/hudi/issues/9885#issuecomment-1793357812 @fearlsgroove we will hit this problem only if oldSchema and writeSchema differ. The writeSchema comes from the input data, so I suggest you check the schema in delt

Re: [PR] [HUDI-6695] Use the AWS provider chain in Glue sync and add a new provider for STS assume role [hudi]

2023-11-03 Thread via GitHub
danny0405 commented on code in PR #9260: URL: https://github.com/apache/hudi/pull/9260#discussion_r1382340858 ## hudi-aws/src/main/java/org/apache/hudi/aws/credentials/HoodieConfigAWSAssumedRoleCredentialsProvider.java: ## @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software

Re: [I] [SUPPORT] Hudi Spark WEB UI [SQL / DataFrame] Details for Query, It does not display detailed indicator information [hudi]

2023-11-03 Thread via GitHub
watermelon12138 commented on issue #9944: URL: https://github.com/apache/hudi/issues/9944#issuecomment-1793351438 > but we still need to change SparkPlan -> DataSet @boneanxs Hi, how should I understand this step: "we still need to change SparkPlan -> DataSet"? Can

Re: [PR] [HUDI-7030] update containsInstant without containsOrBeforeTimelineStarts to fix data lost [hudi]

2023-11-03 Thread via GitHub
danny0405 commented on code in PR #9982: URL: https://github.com/apache/hudi/pull/9982#discussion_r1382338597 ## hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java: ## @@ -439,7 +439,7 @@ public boolean containsInstant(String ts) { //

Re: [PR] [HUDI-6382] support hoodie-table-type changing in hudi-cli [hudi]

2023-11-03 Thread via GitHub
danny0405 commented on PR #9937: URL: https://github.com/apache/hudi/pull/9937#issuecomment-1793350618 @waitingF Please also update the website to add this new function doc. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[jira] [Updated] (HUDI-6382) support hudi table type changing in hudi-cli module

2023-11-03 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-6382: - Fix Version/s: 1.0.0 > support hudi table type changing in hudi-cli module > -

Re: [PR] [HUDI-6382] support hoodie-table-type changing in hudi-cli [hudi]

2023-11-03 Thread via GitHub
danny0405 merged PR #9937: URL: https://github.com/apache/hudi/pull/9937

[jira] [Closed] (HUDI-6382) support hudi table type changing in hudi-cli module

2023-11-03 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen closed HUDI-6382. Resolution: Fixed Fixed via master branch: 105f947c4debe4372b2d1d249e0642d86fafc4d9 > support hudi table ty

(hudi) branch master updated: [HUDI-6382] Support hoodie-table-type changing in hudi-cli (#9937)

2023-11-03 Thread danny0405
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 105f947c4de [HUDI-6382] Support hoodie-table-ty

Re: [PR] [HUDI-6382] support hoodie-table-type changing in hudi-cli [hudi]

2023-11-03 Thread via GitHub
danny0405 commented on PR #9937: URL: https://github.com/apache/hudi/pull/9937#issuecomment-1793350351 Tests have passed: https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=20655&view=results

[jira] [Closed] (HUDI-6990) Spark clustering job reads records support control the parallelism

2023-11-03 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen closed HUDI-6990. Resolution: Fixed Fixed via master branch: 47687763102d4df43609577c08ce6d83ea94d297 > Spark clustering job

[jira] [Updated] (HUDI-6990) Spark clustering job reads records support control the parallelism

2023-11-03 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-6990: - Fix Version/s: 1.0.0 0.14.1 > Spark clustering job reads records support control the pa

Re: [I] [SUPPORT] Hudi Spark WEB UI [SQL / DataFrame] Details for Query, It does not display detailed indicator information [hudi]

2023-11-03 Thread via GitHub
watermelon12138 commented on issue #9944: URL: https://github.com/apache/hudi/issues/9944#issuecomment-1793350123 @chestnutqiang @boneanxs I have also been confused lately about how Spark obtains metrics such as inputBytes, inputRecords, and outputRecords when executing Hudi's insert comman

Re: [PR] [HUDI-6990] Configurable clustering task parallelism [hudi]

2023-11-03 Thread via GitHub
danny0405 merged PR #9925: URL: https://github.com/apache/hudi/pull/9925

(hudi) branch master updated: [HUDI-6990] Configurable clustering group read task parallelism (#9925)

2023-11-03 Thread danny0405
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 47687763102 [HUDI-6990] Configurable clustering

Re: [PR] [HUDI-6990] Configurable clustering task parallelism [hudi]

2023-11-03 Thread via GitHub
danny0405 commented on PR #9925: URL: https://github.com/apache/hudi/pull/9925#issuecomment-1793349832 Tests have passed: https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=20654&view=results

[jira] [Updated] (HUDI-7005) Flink SQL Queries on Hudi Table fail when using the hudi-aws-bundle jar

2023-11-03 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-7005: - Fix Version/s: 1.0.0 0.14.1 > Flink SQL Queries on Hudi Table fail when using the hudi-

[jira] [Closed] (HUDI-7005) Flink SQL Queries on Hudi Table fail when using the hudi-aws-bundle jar

2023-11-03 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen closed HUDI-7005. Resolution: Fixed Fixed via master branch: b14f9e48d3d81cb765e5b2fb355eb2c1e24ee582 > Flink SQL Queries on

Re: [PR] [HUDI-7005] Fix hudi-aws-bundle relocation issue with avro [hudi]

2023-11-03 Thread via GitHub
danny0405 merged PR #9946: URL: https://github.com/apache/hudi/pull/9946

(hudi) branch master updated: [HUDI-7005] Fix hudi-aws-bundle relocation issue with avro (#9946)

2023-11-03 Thread danny0405
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new b14f9e48d3d [HUDI-7005] Fix hudi-aws-bundle rel

Re: [PR] [HUDI-7005] Fix hudi-aws-bundle relocation issue with avro [hudi]

2023-11-03 Thread via GitHub
danny0405 commented on code in PR #9946: URL: https://github.com/apache/hudi/pull/9946#discussion_r1382337801 ## packaging/hudi-flink-bundle/pom.xml: ## @@ -84,7 +84,6 @@ org.apache.hudi:hudi-sync-common org.apache.hudi:hudi-hadoop-mr

Re: [PR] [HUDI-2461] Support out of order commits in MDT with completion time view [hudi]

2023-11-03 Thread via GitHub
danny0405 commented on PR #9871: URL: https://github.com/apache/hudi/pull/9871#issuecomment-1793343188 Thanks for the contribution @codope , I have reviewed and created a patch: [2461.patch.zip](https://github.com/apache/hudi/files/13255858/2461.patch.zip)

Re: [PR] [MINOR] Fix npe for get internal schema [hudi]

2023-11-03 Thread via GitHub
watermelon12138 commented on PR #9984: URL: https://github.com/apache/hudi/pull/9984#issuecomment-179108 @xiarixiaoyao

Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]

2023-11-03 Thread via GitHub
nsivabalan commented on code in PR #9743: URL: https://github.com/apache/hudi/pull/9743#discussion_r1382330934 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala: ## @@ -545,33 +552,37 @@ class HoodieSparkSqlWriterInternal {

[PR] [MINOR] Fix npe for get internal schema [hudi]

2023-11-03 Thread via GitHub
watermelon12138 opened a new pull request, #9984: URL: https://github.com/apache/hudi/pull/9984 ### Change Logs Getting the internal schema may hit an NPE when parsing avroSchema, so we need to return InternalSchema.getEmptyInternalSchema() when avroSchema is null or empty. ### Impact

Re: [I] [SUPPORT]: org.apache.hudi.exception.HoodieException: unable to read next record from parquet file [hudi]

2023-11-03 Thread via GitHub
watermelon12138 commented on issue #9918: URL: https://github.com/apache/hudi/issues/9918#issuecomment-1793317089 @Armelabdelkbir

Re: [I] [SUPPORT]: org.apache.hudi.exception.HoodieException: unable to read next record from parquet file [hudi]

2023-11-03 Thread via GitHub
watermelon12138 commented on issue #9918: URL: https://github.com/apache/hudi/issues/9918#issuecomment-1793316968 Maybe compaction produced a broken parquet file when it failed the first time and a normal parquet file when the retry succeeded. There will be two parquet f

Re: [I] UPSERTs are taking time [hudi]

2023-11-03 Thread via GitHub
darlatrade commented on issue #9976: URL: https://github.com/apache/hudi/issues/9976#issuecomment-1793304296 Here is how "id" is derived. df.withColumn("id", concat("evnt_cent_tz",lit("_"),md5(concat("key_col1","key_col2","key_col3","evnt_cent_tz" Sample values from table:
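The derivation quoted above can be illustrated outside Spark. The following is only a minimal sketch in plain Python, assuming all four columns are non-null strings and that the MD5 digest is rendered as a 32-character hex string; the function name `derive_id` is hypothetical and is not part of the original thread or of Hudi.

```python
import hashlib


def derive_id(evnt_cent_tz: str, key_col1: str, key_col2: str, key_col3: str) -> str:
    """Hypothetical helper mirroring the quoted expression:
    concat(evnt_cent_tz, "_", md5(concat(key_col1, key_col2, key_col3, evnt_cent_tz)))
    """
    # Inner concat: the three key columns followed by the timestamp, no separator.
    hashed_input = key_col1 + key_col2 + key_col3 + evnt_cent_tz
    # MD5 of the concatenated value, as a lowercase hex string.
    digest = hashlib.md5(hashed_input.encode("utf-8")).hexdigest()
    # Outer concat: the timestamp, an underscore, then the digest.
    return f"{evnt_cent_tz}_{digest}"
```

Under these assumptions the resulting key always begins with the `evnt_cent_tz` value, so it is timestamp-prefixed rather than fully random.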

(hudi) branch asf-site updated: [HUDI-7021] Add blog to introduce record level index (#9970)

2023-11-03 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository. xushiyan pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 99e573a9f39 [HUDI-7021] Add blog to introduc

Re: [PR] [HUDI-7021] Add blog to introduce record level index [hudi]

2023-11-03 Thread via GitHub
xushiyan merged PR #9970: URL: https://github.com/apache/hudi/pull/9970

Re: [PR] [HUDI-0000] DO NOT MERGE Fix incr errors new reader [hudi]

2023-11-03 Thread via GitHub
jonvex commented on PR #9954: URL: https://github.com/apache/hudi/pull/9954#issuecomment-1793295105 @hudi-bot run azure

Re: [I] UPSERTs are taking time [hudi]

2023-11-03 Thread via GitHub
nsivabalan commented on issue #9976: URL: https://github.com/apache/hudi/issues/9976#issuecomment-1793285715 Got it. May I know what your record key comprises? I see it as "id", but is it a random id, or does it refer to some timestamp-based keys? If it's timestamp-based values,

Re: [PR] [HUDI-7021] Add blog to introduce record level index [hudi]

2023-11-03 Thread via GitHub
xushiyan commented on PR #9970: URL: https://github.com/apache/hudi/pull/9970#issuecomment-1793272055 ## Latest version ![screencapture-localhost-3000-blog-2023-11-01-record-level-index-2023-11-03-19_35_21](https://github.com/apache/hudi/assets/2701446/bc51e836-345f-4d26-8d50-50be672d

Re: [PR] [HUDI-7021] Add blog to introduce record level index [hudi]

2023-11-03 Thread via GitHub
nsivabalan commented on code in PR #9970: URL: https://github.com/apache/hudi/pull/9970#discussion_r1382298778 ## website/blog/2023-11-01-record-level-index.md: ## @@ -0,0 +1,236 @@ +--- +title: "Record Level Index: Hudi's blazing fast indexing for large-scale datasets" +excerp

Re: [I] UPSERTs are taking time [hudi]

2023-11-03 Thread via GitHub
darlatrade commented on issue #9976: URL: https://github.com/apache/hudi/issues/9976#issuecomment-1793259099 @nsivabalan 1. Size of the table and no file objects in root folder of table. https://github.com/apache/hudi/assets/109939327/bdee4b13-f9d5-4edf-9e69-92d13a24fe79

Re: [I] UPSERTs are taking time [hudi]

2023-11-03 Thread via GitHub
nsivabalan commented on issue #9976: URL: https://github.com/apache/hudi/issues/9976#issuecomment-1793247811 If my understanding of your pipeline/workload is wrong, let's sync up in the Hudi OSS workspace. We can see what's going on.

Re: [PR] [HUDI-3304] Allow selective partial update [hudi]

2023-11-03 Thread via GitHub
CTTY commented on code in PR #7359: URL: https://github.com/apache/hudi/pull/7359#discussion_r1382291540 ## hudi-client/hudi-java-client/src/main/java/org/apache/hudi/table/action/commit/JavaWriteHelper.java: ## @@ -69,17 +70,26 @@ public List> deduplicateRecords( }).collec

Re: [I] UPSERTs are taking time [hudi]

2023-11-03 Thread via GitHub
nsivabalan commented on issue #9976: URL: https://github.com/apache/hudi/issues/9976#issuecomment-1793247596 Hey @darlatrade, can you help with some more info? 1. What's the size of the table? 2. I assume it's a COW table. 3. Based on your stats, it looks like we have 60 file groups m

Re: [PR] [HUDI-3304] Allow selective partial update [hudi]

2023-11-03 Thread via GitHub
CTTY commented on code in PR #7359: URL: https://github.com/apache/hudi/pull/7359#discussion_r1382290995 ## hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/table/action/commit/FlinkWriteHelper.java: ## @@ -99,24 +100,34 @@ public List> deduplicateRecords( // c

Re: [PR] [HUDI-0001] DO NOT MERGE Ci not working test [hudi]

2023-11-03 Thread via GitHub
jonvex closed pull request #9983: [HUDI-0001] DO NOT MERGE Ci not working test URL: https://github.com/apache/hudi/pull/9983

[jira] [Updated] (HUDI-1) Design and Implement embedded timeline service to cache filesystem view to reduce listStatus calls

2023-11-03 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1: -- Labels: pull-request-available (was: ) > Design and Implement embedded timeline service to cache filesyste

[PR] [HUDI-0001] DO NOT MERGE Ci not working test [hudi]

2023-11-03 Thread via GitHub
jonvex opened a new pull request, #9983: URL: https://github.com/apache/hudi/pull/9983 ### Change Logs testassdfa ### Impact sdafdsaf ### Risk level (write none, low medium or high below) none asdfasd ### Documentation Update asfdfad

Re: [PR] [HUDI-7021] Adding blog introducing RLI w/ Hudi [hudi]

2023-11-03 Thread via GitHub
xushiyan commented on PR #9970: URL: https://github.com/apache/hudi/pull/9970#issuecomment-1793148456 ![screencapture-localhost-3000-blog-2023-11-01-record-level-index-2023-11-03-16_55_03](https://github.com/apache/hudi/assets/2701446/ee18f61a--4fad-95f6-86b08f1e3b47) -- This is a

Re: [PR] [HUDI-6999] Adding row writer support to HoodieStreamer [hudi]

2023-11-03 Thread via GitHub
rmahindra123 commented on code in PR #9913: URL: https://github.com/apache/hudi/pull/9913#discussion_r1382227013 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/StreamSync.java: ## @@ -541,29 +579,37 @@ private Pair>> fetchFromSourc checkpointStr = data

Re: [PR] [HUDI-0000] DO NOT MERGE Fix incr errors new reader [hudi]

2023-11-03 Thread via GitHub
jonvex commented on PR #9954: URL: https://github.com/apache/hudi/pull/9954#issuecomment-1793069386 @hudi-bot run azure

Re: [I] UPSERTs are taking time [hudi]

2023-11-03 Thread via GitHub
darlatrade commented on issue #9976: URL: https://github.com/apache/hudi/issues/9976#issuecomment-1793007675 Thanks for quick reply @vinothchandar We completed most of the testing on 0.10.1. May not be able to upgrade soon. But at least I can try with 0.14 and test for this table if that

Re: [I] UPSERTs are taking time [hudi]

2023-11-03 Thread via GitHub
vinothchandar commented on issue #9976: URL: https://github.com/apache/hudi/issues/9976#issuecomment-1792986756 @darlatrade Just to rule anything out: is it easy for you to try this table on version 0.14 in a test/staging environment? Do you have the Spark Stages UI screenshot? We can

Re: [PR] [HUDI-0000] DO NOT MERGE Fix incr errors new reader [hudi]

2023-11-03 Thread via GitHub
hudi-bot commented on PR #9954: URL: https://github.com/apache/hudi/pull/9954#issuecomment-1792954189 ## CI report: * 18ae9b1b08a6e3c7fd5ea4daf6935a10ed3e0fff Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2058

Re: [I] [SUPPORT] Simple Bucket Index - discrepancy between Spark and Flink [hudi]

2023-11-03 Thread via GitHub
joeytman commented on issue #9971: URL: https://github.com/apache/hudi/issues/9971#issuecomment-1792943081 Makes sense, thanks for the quick help on this 🙇

Re: [I] [SUPPORT] Data loss in MOR table after clustering partition [hudi]

2023-11-03 Thread via GitHub
mzheng-plaid commented on issue #9977: URL: https://github.com/apache/hudi/issues/9977#issuecomment-1792906127 @ad1happy2go same issue with SIMPLE index

Re: [I] UPSERTs are taking time [hudi]

2023-11-03 Thread via GitHub
darlatrade commented on issue #9976: URL: https://github.com/apache/hudi/issues/9976#issuecomment-1792891849 The commit file has 16745 lines. I have month-level partitions, and the last commit touched almost 1 year (12) of partitions. We are maintaining 3 years, 36 partitions (12 per year). Looks

Re: [I] [SUPPORT] Data loss in MOR table after clustering partition [hudi]

2023-11-03 Thread via GitHub
mzheng-plaid commented on issue #9977: URL: https://github.com/apache/hudi/issues/9977#issuecomment-1792820698 @ad1happy2go No failures in the Spark UI. I will check the SIMPLE index instead of BLOOM.

Re: [I] UPSERTs are taking time [hudi]

2023-11-03 Thread via GitHub
ad1happy2go commented on issue #9976: URL: https://github.com/apache/hudi/issues/9976#issuecomment-1792819209 You can try to open this commit file and see how many file groups are being updated as part of this commit. How many partitions do you have in your table?

Re: [PR] [HUDI-3304] Add support for selective partial update [hudi]

2023-11-03 Thread via GitHub
hudi-bot commented on PR #9979: URL: https://github.com/apache/hudi/pull/9979#issuecomment-1792774679 ## CI report: * b9e26b3d425f88f0599283a0e834e4581a8b1b64 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2066

Re: [PR] [HUDI-7009] Filtering out null values from avro kafka source [hudi]

2023-11-03 Thread via GitHub
hudi-bot commented on PR #9955: URL: https://github.com/apache/hudi/pull/9955#issuecomment-1792749446 ## CI report: * 11a355c59b6c14ce8ba03cfbefcc5b6ab8ca422c Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2066

Re: [I] UPSERTs are taking time [hudi]

2023-11-03 Thread via GitHub
darlatrade commented on issue #9976: URL: https://github.com/apache/hudi/issues/9976#issuecomment-1792681479 Thanks for the reply. Here is the stage detail. Not sure where to look for the exact size. https://github.com/apache/hudi/assets/109939327/14db67f3-aebd-4ed4-b2f0-48ab5398171d

Re: [PR] [HUDI-6695] Use the AWS provider chain in Glue sync and add a new provider for STS assume role [hudi]

2023-11-03 Thread via GitHub
hussein-awala commented on code in PR #9260: URL: https://github.com/apache/hudi/pull/9260#discussion_r1381880071 ## hudi-aws/src/main/java/org/apache/hudi/aws/credentials/HoodieConfigAWSAssumedRoleCredentialsProvider.java: ## @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Soft

Re: [I] UPSERTs are taking time [hudi]

2023-11-03 Thread via GitHub
ad1happy2go commented on issue #9976: URL: https://github.com/apache/hudi/issues/9976#issuecomment-1792587002 @darlatrade As I see, it is taking time in "Doing partition and writing data", which probably means your incremental may be touching a lot of file groups, so it had to rewrite a lot of pa

Re: [I] Incoming batch schema is not compatible with the table's one [hudi]

2023-11-03 Thread via GitHub
ad1happy2go commented on issue #9980: URL: https://github.com/apache/hudi/issues/9980#issuecomment-1792583323 @njalan It happens when the source schema is not backward compatible with the Hudi table schema. Can you give us more insight into what schema changes you are getting?

(hudi) branch master updated: [HUDI-7002] Fixing initializing RLI MDT partition for non-partitioned dataset (#9938)

2023-11-03 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository. sivabalan pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new cc95b03b5ef [HUDI-7002] Fixing initializing RLI

Re: [PR] [HUDI-7002] Fixing initializing RLI MDT partition for non-partitioned dataset [hudi]

2023-11-03 Thread via GitHub
nsivabalan merged PR #9938: URL: https://github.com/apache/hudi/pull/9938

Re: [PR] [HUDI-2461] Support out of order commits in MDT with completion time view [hudi]

2023-11-03 Thread via GitHub
hudi-bot commented on PR #9871: URL: https://github.com/apache/hudi/pull/9871#issuecomment-1792464812 ## CI report: * a6b85794428dcd7a7b45f28430bfb5f6c42fc910 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2062

Re: [PR] [HUDI-2461] Support out of order commits in MDT with completion time view [hudi]

2023-11-03 Thread via GitHub
hudi-bot commented on PR #9871: URL: https://github.com/apache/hudi/pull/9871#issuecomment-1792447778 ## CI report: * a6b85794428dcd7a7b45f28430bfb5f6c42fc910 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2062

Re: [PR] [HUDI-7030] update containsInstant without containsOrBeforeTimelineStarts to fix data lost [hudi]

2023-11-03 Thread via GitHub
hudi-bot commented on PR #9982: URL: https://github.com/apache/hudi/pull/9982#issuecomment-1792434092 ## CI report: * c7e24ccd6b3f9ab6821758ac3aeec77c2f47afe9 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2066

Re: [PR] [HUDI-7001] ComplexAvroKeyGenerator should represent single record key as the value string without composing the key field name [hudi]

2023-11-03 Thread via GitHub
hudi-bot commented on PR #9936: URL: https://github.com/apache/hudi/pull/9936#issuecomment-1792433704 ## CI report: * 92501c8473c95562c5158daebe08e3787282e6eb Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2066

Re: [PR] [HUDI-6999] Adding row writer support to HoodieStreamer [hudi]

2023-11-03 Thread via GitHub
hudi-bot commented on PR #9913: URL: https://github.com/apache/hudi/pull/9913#issuecomment-1792433230 ## CI report: * caefe9891b1eda36c04dfe6003b071bb813db7d7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2065

Re: [PR] [HUDI-2461] Support out of order commits in MDT with completion time view [hudi]

2023-11-03 Thread via GitHub
codope commented on code in PR #9871: URL: https://github.com/apache/hudi/pull/9871#discussion_r1381672056 ## hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/io/TestHoodieTimelineArchiver.java: ## @@ -1221,35 +1222,6 @@ public void testArchiveTableWithMetadataTableC

Re: [PR] [HUDI-7030] update containsInstant without containsOrBeforeTimelineStarts to fix data lost [hudi]

2023-11-03 Thread via GitHub
hudi-bot commented on PR #9982: URL: https://github.com/apache/hudi/pull/9982#issuecomment-1792368407 ## CI report: * c7e24ccd6b3f9ab6821758ac3aeec77c2f47afe9 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[jira] [Updated] (HUDI-7030) Log reader data lost as that not consistent behavior in timeline's containsInstant

2023-11-03 Thread ann (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ann updated HUDI-7030: -- Description: Log reader filtered all log data blocks which come from inflight instant. !image-2023-11-03-19-49-22-894.p

[jira] [Updated] (HUDI-7030) Log reader data lost as that not consistent behavior in timeline's containsInstant

2023-11-03 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7030: - Labels: pull-request-available (was: ) > Log reader data lost as that not consistent behavior in

[PR] [HUDI-7030] update containsInstant without containsOrBeforeTimelineStarts to fix data lost [hudi]

2023-11-03 Thread via GitHub
Xoln opened a new pull request, #9982: URL: https://github.com/apache/hudi/pull/9982 …data lost ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or a

[jira] [Created] (HUDI-7030) Log reader data lost as that not consistent behavior in timeline's containsInstant

2023-11-03 Thread ann (Jira)
ann created HUDI-7030: - Summary: Log reader data lost as that not consistent behavior in timeline's containsInstant Key: HUDI-7030 URL: https://issues.apache.org/jira/browse/HUDI-7030 Project: Apache Hudi

Re: [I] [SUPPORT] Trino queries failing when hudi.metadata_enabled is set to true. [hudi]

2023-11-03 Thread via GitHub
BalaMahesh commented on issue #9758: URL: https://github.com/apache/hudi/issues/9758#issuecomment-1792328313 hoodie.clean.async=false After setting this to false, compaction is being triggered for the metadata table; earlier there were always pendingInstants of delta commits because async

Re: [PR] [HUDI-7009] Filtering out null values from avro kafka source [hudi]

2023-11-03 Thread via GitHub
hudi-bot commented on PR #9955: URL: https://github.com/apache/hudi/pull/9955#issuecomment-1792286698 ## CI report: * 7a24b91b83fef2b8b2bf278a1fafd9d1bb2a7d03 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2065

Re: [PR] [HUDI-6382] support hoodie-table-type changing in hudi-cli [hudi]

2023-11-03 Thread via GitHub
hudi-bot commented on PR #9937: URL: https://github.com/apache/hudi/pull/9937#issuecomment-1792275376 ## CI report: * 1d5de86d295233edff138e9bfb8e9151a5b7ecae Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2065

Re: [PR] [HUDI-7014] Optimize the code of BoundedPartitionAwareCompactionStrategy [hudi]

2023-11-03 Thread via GitHub
hudi-bot commented on PR #9961: URL: https://github.com/apache/hudi/pull/9961#issuecomment-1792275664 ## CI report: * 75c27611b7a2ceacf43aa903f5b542fffc7b27cf Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=205

Re: [PR] [HUDI-7010] Build clustering group reduces redundant traversals [hudi]

2023-11-03 Thread via GitHub
hudi-bot commented on PR #9957: URL: https://github.com/apache/hudi/pull/9957#issuecomment-1792275601 ## CI report: * 0c34110238584b4ec7862d8849e5ca5353769051 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=205

Re: [PR] [HUDI-7005] Fix hudi-aws-bundle relocation issue with avro [hudi]

2023-11-03 Thread via GitHub
PrabhuJoseph commented on code in PR #9946: URL: https://github.com/apache/hudi/pull/9946#discussion_r1381508234 ## packaging/hudi-flink-bundle/pom.xml: ## @@ -84,7 +84,6 @@ org.apache.hudi:hudi-sync-common org.apache.hudi:hudi-hadoop-mr

Re: [PR] [MINOR] fix checkpoint loss issue in deltastreamer after changing table type from mor to cow case [hudi]

2023-11-03 Thread via GitHub
hudi-bot commented on PR #9981: URL: https://github.com/apache/hudi/pull/9981#issuecomment-1792219072 ## CI report: * 1f6ab73fa18ddf7eaa3440f8c34bb79cac5c1835 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2066

Re: [PR] [HUDI-7010] Build clustering group reduces redundant traversals [hudi]

2023-11-03 Thread via GitHub
hudi-bot commented on PR #9957: URL: https://github.com/apache/hudi/pull/9957#issuecomment-1792218875 ## CI report: * 0c34110238584b4ec7862d8849e5ca5353769051 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=205

Re: [PR] [HUDI-7014] Optimize the code of BoundedPartitionAwareCompactionStrategy [hudi]

2023-11-03 Thread via GitHub
hudi-bot commented on PR #9961: URL: https://github.com/apache/hudi/pull/9961#issuecomment-1792218972 ## CI report: * 75c27611b7a2ceacf43aa903f5b542fffc7b27cf Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=205

Re: [PR] [HUDI-6990] Configurable clustering task parallelism [hudi]

2023-11-03 Thread via GitHub
hudi-bot commented on PR #9925: URL: https://github.com/apache/hudi/pull/9925#issuecomment-1792218520 ## CI report: * abd9807817eb49458b1f8dd9f9d31157ba2b5a81 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2065

Re: [PR] [HUDI-6990] Configurable clustering task parallelism [hudi]

2023-11-03 Thread via GitHub
hudi-bot commented on PR #9925: URL: https://github.com/apache/hudi/pull/9925#issuecomment-1792203948 ## CI report: * abd9807817eb49458b1f8dd9f9d31157ba2b5a81 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2065

Re: [PR] [MINOR] fix checkpoint loss issue in deltastreamer after changing table type from mor to cow case [hudi]

2023-11-03 Thread via GitHub
hudi-bot commented on PR #9981: URL: https://github.com/apache/hudi/pull/9981#issuecomment-1792204326 ## CI report: * 1f6ab73fa18ddf7eaa3440f8c34bb79cac5c1835 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

Re: [PR] [HUDI-6990] Configurable clustering task parallelism [hudi]

2023-11-03 Thread via GitHub
ksmou commented on PR #9925: URL: https://github.com/apache/hudi/pull/9925#issuecomment-1792202863 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To un

Re: [PR] [HUDI-6990] Configurable clustering task parallelism [hudi]

2023-11-03 Thread via GitHub
hudi-bot commented on PR #9925: URL: https://github.com/apache/hudi/pull/9925#issuecomment-1792188675 ## CI report: * abd9807817eb49458b1f8dd9f9d31157ba2b5a81 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2065

Re: [PR] [HUDI-7005] Fix hudi-aws-bundle relocation issue with avro [hudi]

2023-11-03 Thread via GitHub
hudi-bot commented on PR #9946: URL: https://github.com/apache/hudi/pull/9946#issuecomment-1792189127 ## CI report: * 5daa002dfd75ec233a9ad045ad0c32cfa673a933 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2065

(hudi) branch master updated: [MINOR] Re-enable a test that got fixed (#9978)

2023-11-03 Thread codope
This is an automated email from the ASF dual-hosted git repository. codope pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new e4c217412fb [MINOR] Re-enable a test that got fixe

Re: [PR] [MINOR] Re-enable a test that got fixed [hudi]

2023-11-03 Thread via GitHub
codope merged PR #9978: URL: https://github.com/apache/hudi/pull/9978 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.or

Re: [PR] [HUDI-7022] RunClusteringProcedure support limit parameter [hudi]

2023-11-03 Thread via GitHub
ksmou commented on PR #9975: URL: https://github.com/apache/hudi/pull/9975#issuecomment-1792155673 > @ksmou Can you update the website to reflect with the new parameter? okay. I update it later. -- This is an automated message from the Apache Git Service. To respond to the message,

[PR] [MINOR] fix checkpoint loss issue in deltastreamer after changing table type from mor to cow case [hudi]

2023-11-03 Thread via GitHub
waitingF opened a new pull request, #9981: URL: https://github.com/apache/hudi/pull/9981 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ Before changing MOR to COW, we will do a full compaction. But in the .commit meta file gen

Re: [PR] [HUDI-7005] Fix hudi-aws-bundle relocation issue with avro [hudi]

2023-11-03 Thread via GitHub
hudi-bot commented on PR #9946: URL: https://github.com/apache/hudi/pull/9946#issuecomment-1792129144 ## CI report: * 5daa002dfd75ec233a9ad045ad0c32cfa673a933 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2065

Re: [PR] [HUDI-6992] IncrementalInputSplits incorrectly set the latestCommit attr [hudi]

2023-11-03 Thread via GitHub
hudi-bot commented on PR #9923: URL: https://github.com/apache/hudi/pull/9923#issuecomment-1792128923 ## CI report: * ff11f10133f07427df3d13df8393362a75004807 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2065

Re: [PR] [HUDI-6992] IncrementalInputSplits incorrectly set the latestCommit attr [hudi]

2023-11-03 Thread via GitHub
hudi-bot commented on PR #9923: URL: https://github.com/apache/hudi/pull/9923#issuecomment-1792116889 ## CI report: * ff11f10133f07427df3d13df8393362a75004807 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2065

Re: [PR] [MINOR] Re-enable a test that got fixed [hudi]

2023-11-03 Thread via GitHub
hudi-bot commented on PR #9978: URL: https://github.com/apache/hudi/pull/9978#issuecomment-1792117294 ## CI report: * 3b3a9f61789da9d0f6ac569e5c2a9b7c7be8961c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2065

Re: [PR] [HUDI-6992] IncrementalInputSplits incorrectly set the latestCommit attr [hudi]

2023-11-03 Thread via GitHub
zhuanshenbsj1 commented on PR #9923: URL: https://github.com/apache/hudi/pull/9923#issuecomment-1792110933 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment

Re: [I] [SUPPORT] Simple Bucket Index - discrepancy between Spark and Flink [hudi]

2023-11-03 Thread via GitHub
danny0405 closed issue #9971: [SUPPORT] Simple Bucket Index - discrepancy between Spark and Flink URL: https://github.com/apache/hudi/issues/9971 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [I] [SUPPORT]flink-sql write hudi use TIMESTAMP, when hive query, it get time+8h question, use TIMESTAMP_LTZ, the hive schema is bigint but timestamp [hudi]

2023-11-03 Thread via GitHub
danny0405 commented on issue #9864: URL: https://github.com/apache/hudi/issues/9864#issuecomment-1792084335 @xicm can you help with this issue? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] [HUDI-7005] Fix hudi-aws-bundle relocation issue with avro [hudi]

2023-11-03 Thread via GitHub
danny0405 commented on code in PR #9946: URL: https://github.com/apache/hudi/pull/9946#discussion_r1381325717 ## packaging/hudi-flink-bundle/pom.xml: ## @@ -84,7 +84,6 @@ org.apache.hudi:hudi-sync-common org.apache.hudi:hudi-hadoop-mr
