Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]

2023-10-16 Thread via GitHub
hudi-bot commented on PR #9743: URL: https://github.com/apache/hudi/pull/9743#issuecomment-1765713334 ## CI report: * 097ef6176650413eef2a4c3581ca6e48ea43788f UNKNOWN * 7aa5e4061162b22c77dabfd9cfa85a70b5c1636b Azure:

Re: [I] [SUPPORT] Deep integration flink cdc? [hudi]

2023-10-16 Thread via GitHub
danny0405 commented on issue #9873: URL: https://github.com/apache/hudi/issues/9873#issuecomment-1765709080 We have no plan for that now. Are you interested in contributing? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]

2023-10-16 Thread via GitHub
hudi-bot commented on PR #9743: URL: https://github.com/apache/hudi/pull/9743#issuecomment-1765706391 ## CI report: * 097ef6176650413eef2a4c3581ca6e48ea43788f UNKNOWN * 7aa5e4061162b22c77dabfd9cfa85a70b5c1636b Azure:

[jira] [Commented] (HUDI-6941) sparksql query perfermance cost more in hudi 0.14-rc

2023-10-16 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17776015#comment-17776015 ] Danny Chen commented on HUDI-6941: -- Fixed via master branch: d7d321544644b9e599004beddd9a3c202bc05e7d >

[hudi] branch master updated: [HUDI-6941] Add unit test for HUDI-6941 for stages number check (#9866)

2023-10-16 Thread danny0405
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new d7d32154464 [HUDI-6941] Add unit test for

Re: [PR] [HUDI-6941] Add unit test for HUDI-6941 for stages number check [hudi]

2023-10-16 Thread via GitHub
danny0405 merged PR #9866: URL: https://github.com/apache/hudi/pull/9866

Re: [PR] [HUDI-6941] Add unit test for HUDI-6941 for stages number check [hudi]

2023-10-16 Thread via GitHub
hudi-bot commented on PR #9866: URL: https://github.com/apache/hudi/pull/9866#issuecomment-1765649524 ## CI report: * 5b7b0e7e19b4b81122128cc86d127646abb46e32 Azure:

Re: [PR] Row writer optimization for bulk insert [hudi]

2023-10-16 Thread via GitHub
hudi-bot commented on PR #9852: URL: https://github.com/apache/hudi/pull/9852#issuecomment-1765612416 ## CI report: * 00714535a8643da2ff4fadf9d4044f9db6671d2d Azure:

[jira] [Assigned] (HUDI-6924) Fix hoodie table config not wok in table properties

2023-10-16 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen reassigned HUDI-6924: Assignee: Wechar (was: Danny Chen) > Fix hoodie table config not wok in table properties >

[jira] [Assigned] (HUDI-6924) Fix hoodie table config not wok in table properties

2023-10-16 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen reassigned HUDI-6924: Assignee: Danny Chen > Fix hoodie table config not wok in table properties >

[jira] [Updated] (HUDI-6924) Fix hoodie table config not wok in table properties

2023-10-16 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-6924: - Fix Version/s: 1.0.0 0.14.1 > Fix hoodie table config not wok in table properties >

[jira] [Closed] (HUDI-6924) Fix hoodie table config not wok in table properties

2023-10-16 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen closed HUDI-6924. Resolution: Fixed Fixed via master branch: 7c79ebee1ff1c9a0f5117252cb12fa2f03ac4b24 > Fix hoodie table

[hudi] branch master updated: [HUDI-6924] Fix hoodie table config not wok in table properties (#9836)

2023-10-16 Thread danny0405
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 7c79ebee1ff [HUDI-6924] Fix hoodie table

Re: [PR] [HUDI-6924] Fix hoodie table config not wok in table properties [hudi]

2023-10-16 Thread via GitHub
danny0405 merged PR #9836: URL: https://github.com/apache/hudi/pull/9836

Re: [PR] [HUDI-6924] Fix hoodie table config not wok in table properties [hudi]

2023-10-16 Thread via GitHub
wecharyu commented on PR #9836: URL: https://github.com/apache/hudi/pull/9836#issuecomment-1765583468 @hudi-bot run azure

[jira] [Closed] (HUDI-2141) Integration flink metric in flink stream

2023-10-16 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen closed HUDI-2141. Resolution: Fixed Fixed via master branch: 566e22b0a3d7c3e67686122b732dbbc4ecdf0025 > Integration flink

[hudi] branch master updated: [HUDI-2141] Support flink stream write metrics (#9118)

2023-10-16 Thread danny0405
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 566e22b0a3d [HUDI-2141] Support flink stream

Re: [PR] [HUDI-2141] Support flink stream write metrics [hudi]

2023-10-16 Thread via GitHub
danny0405 merged PR #9118: URL: https://github.com/apache/hudi/pull/9118

Re: [I] [SUPPORT] Is `hoodie.datasource.hive_sync.filter_pushdown_enabled` can enable default? [hudi]

2023-10-16 Thread via GitHub
KnightChess commented on issue #9784: URL: https://github.com/apache/hudi/issues/9784#issuecomment-1765572894 @yihua @boneanxs thanks for the advice. As @boneanxs says, `hoodie.datasource.hive_sync.filter_pushdown_max_size` can control whether the range expression is used. And in our case, if list too
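
The trade-off discussed in this thread (an explicit list of changed partitions versus a range expression once the list grows too large) can be sketched as a toy in Python. This is only an illustration of the idea, not Hudi's implementation: the function `build_partition_filter`, the column name `dt`, and the reading of the size limit as a simple count of values are all assumptions made for the example.

```python
# Toy sketch only: NOT Hudi's implementation of hive sync filter pushdown.
# `build_partition_filter` and the count-based `max_size` are hypothetical.

def build_partition_filter(column, changed_values, max_size):
    """Return a filter string for the changed partition values.

    Uses an OR of equality predicates while the list is small, and falls
    back to a single min/max range once it exceeds `max_size`.
    """
    values = sorted(set(changed_values))
    if not values:
        return ""
    if len(values) <= max_size:
        # Exact match on every changed partition value.
        return " OR ".join(f"{column} = '{v}'" for v in values)
    # Range fallback: shorter expression, but may also match
    # partitions that did not change.
    return f"{column} >= '{values[0]}' AND {column} <= '{values[-1]}'"


small = build_partition_filter("dt", ["2023-10-15", "2023-10-16"], max_size=3)
large = build_partition_filter(
    "dt",
    ["2023-10-13", "2023-10-14", "2023-10-15", "2023-10-16"],
    max_size=3,
)
print(small)
print(large)
```

The range form keeps the pushed-down expression short at the cost of selecting a superset of the changed partitions, which matches the concern raised in the thread that a very large list makes the pushdown less effective.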

[jira] [Created] (HUDI-6949) Spark support non-blocking concurrency control

2023-10-16 Thread Jing Zhang (Jira)
Jing Zhang created HUDI-6949: Summary: Spark support non-blocking concurrency control Key: HUDI-6949 URL: https://issues.apache.org/jira/browse/HUDI-6949 Project: Apache Hudi Issue Type: New

Re: [PR] [HUDI-6941] Add unit test for HUDI-6941 for stages number check [hudi]

2023-10-16 Thread via GitHub
hudi-bot commented on PR #9866: URL: https://github.com/apache/hudi/pull/9866#issuecomment-1765561486 ## CI report: * 7beb83025fba43331259a7b85601d342a7cd4376 Azure:

Re: [PR] Row writer optimization for bulk insert [hudi]

2023-10-16 Thread via GitHub
hudi-bot commented on PR #9852: URL: https://github.com/apache/hudi/pull/9852#issuecomment-1765561450 ## CI report: * 09d2e02d34a1fb02596925aa6a0b1a04ec959d8e Azure:

Re: [PR] [HUDI-2141] Support flink stream write metrics [hudi]

2023-10-16 Thread via GitHub
stream2000 commented on code in PR #9118: URL: https://github.com/apache/hudi/pull/9118#discussion_r1361435492 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/bulk/BulkInsertWriterHelper.java: ## @@ -69,14 +71,21 @@ public class BulkInsertWriterHelper {

Re: [PR] [HUDI-5832] add relocated prefix for hbase classes in hbase-site.xml [hudi]

2023-10-16 Thread via GitHub
yihua commented on PR #8029: URL: https://github.com/apache/hudi/pull/8029#issuecomment-1765536251 > > > Another question, what if some one just use hudi-common-*.jar? > > > > > > This is a good question, unfortunately we have no good way to solve it unless we publish our own

Re: [I] [SUPPORT] Is `hoodie.datasource.hive_sync.filter_pushdown_enabled` can enable default? [hudi]

2023-10-16 Thread via GitHub
boneanxs commented on issue #9784: URL: https://github.com/apache/hudi/issues/9784#issuecomment-1765534763 > If the list of changed partitions is too large, the filter pushdown may not be effective Yea, if the list of changed partitions is too large, then the expression built to match

Re: [PR] [HUDI-1623] Solid completion time on timeline [hudi]

2023-10-16 Thread via GitHub
boneanxs commented on code in PR #9617: URL: https://github.com/apache/hudi/pull/9617#discussion_r1361422542 ## hudi-common/src/main/java/org/apache/hudi/common/config/LockConfiguration.java: ## @@ -35,19 +35,23 @@ public class LockConfiguration implements Serializable {

Re: [PR] [HUDI-6941] Add unit test for HUDI-6941 for stages number check [hudi]

2023-10-16 Thread via GitHub
hudi-bot commented on PR #9866: URL: https://github.com/apache/hudi/pull/9866#issuecomment-1765529218 ## CI report: * 7beb83025fba43331259a7b85601d342a7cd4376 Azure:

Re: [PR] Row writer optimization for bulk insert [hudi]

2023-10-16 Thread via GitHub
hudi-bot commented on PR #9852: URL: https://github.com/apache/hudi/pull/9852#issuecomment-1765529172 ## CI report: * 09d2e02d34a1fb02596925aa6a0b1a04ec959d8e Azure:

Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]

2023-10-16 Thread via GitHub
hudi-bot commented on PR #9743: URL: https://github.com/apache/hudi/pull/9743#issuecomment-1765528970 ## CI report: * 097ef6176650413eef2a4c3581ca6e48ea43788f UNKNOWN * 7aa5e4061162b22c77dabfd9cfa85a70b5c1636b Azure:

Re: [PR] [HUDI-2461] Support out of order commits in MDT with completion time view [hudi]

2023-10-16 Thread via GitHub
danny0405 commented on code in PR #9871: URL: https://github.com/apache/hudi/pull/9871#discussion_r1361415820 ## hudi-common/src/main/java/org/apache/hudi/common/table/timeline/CompletionTimeQueryView.java: ## @@ -126,14 +127,18 @@ public Option getCompletionTime(String

Re: [PR] [HUDI-6495][RFC-66] Non-blocking Concurrency Control [hudi]

2023-10-16 Thread via GitHub
vinothchandar commented on code in PR #7907: URL: https://github.com/apache/hudi/pull/7907#discussion_r1361415450 ## rfc/rfc-66/rfc-66.md: ## @@ -0,0 +1,318 @@ +# RFC-66: Non-blocking Concurrency Control + +## Proposers +- @danny0405 +- @ForwardXu + +## Approvers +- + +##

Re: [PR] [HUDI-6495][RFC-66] Non-blocking Concurrency Control [hudi]

2023-10-16 Thread via GitHub
vinothchandar commented on code in PR #7907: URL: https://github.com/apache/hudi/pull/7907#discussion_r1361412028 ## rfc/rfc-66/rfc-66.md: ## @@ -0,0 +1,318 @@ +# RFC-66: Non-blocking Concurrency Control + +## Proposers +- @danny0405 +- @ForwardXu + +## Approvers +- + +##

Re: [I] [SUPPORT] java.lang.ClassCastException with incremental query [hudi]

2023-10-16 Thread via GitHub
yihua commented on issue #9172: URL: https://github.com/apache/hudi/issues/9172#issuecomment-1765517547 For existing Hudi releases on Spark 3.3.2, the mitigation is to set `spark.sql.parquet.enableVectorizedReader=false`. @rahil-c @CTTY I remember the vectorized reader is turned off in
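
As a minimal sketch of how the mitigation quoted in the comment above might be passed to a job, the snippet below renders Spark settings as `spark-submit` arguments. The setting name and value come from the comment; the helper `to_submit_args` is hypothetical and exists only for this illustration.

```python
# Sketch: render Spark settings as spark-submit "--conf key=value" arguments.
# `to_submit_args` is a hypothetical helper, not part of Spark or Hudi.

def to_submit_args(conf):
    """Turn a dict of Spark settings into a flat list of CLI arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


# Setting taken from the comment: disable the vectorized Parquet reader.
mitigation = {"spark.sql.parquet.enableVectorizedReader": "false"}
submit_args = to_submit_args(mitigation)
print(" ".join(submit_args))
```

The same key and value could presumably also be set on an existing session instead of the command line; which route fits depends on how the job is launched.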

Re: [PR] [HUDI-6941] Add ut for HUDI-6941 for stages number check [hudi]

2023-10-16 Thread via GitHub
xuzifu666 commented on code in PR #9866: URL: https://github.com/apache/hudi/pull/9866#discussion_r1361413342 ## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestInsertTable.scala: ## @@ -1968,6 +1968,110 @@ class TestInsertTable extends

Re: [PR] [HUDI-6941] Add ut for HUDI-6941 for stages number check [hudi]

2023-10-16 Thread via GitHub
xuzifu666 commented on code in PR #9866: URL: https://github.com/apache/hudi/pull/9866#discussion_r1361413249 ## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestInsertTable.scala: ## @@ -1968,6 +1968,110 @@ class TestInsertTable extends

Re: [PR] [HUDI-6941] Add ut for HUDI-6941 for stages number check [hudi]

2023-10-16 Thread via GitHub
xuzifu666 commented on code in PR #9866: URL: https://github.com/apache/hudi/pull/9866#discussion_r1361412447 ## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestInsertTable.scala: ## @@ -1968,6 +1968,110 @@ class TestInsertTable extends

Re: [PR] [HUDI-6724] - Defaulting previous Instant time to init time to enable full read of initial commit [hudi]

2023-10-16 Thread via GitHub
yihua commented on code in PR #9473: URL: https://github.com/apache/hudi/pull/9473#discussion_r1323686261 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/IncrSourceHelper.java: ## @@ -130,7 +130,7 @@ public static QueryInfo

Re: [I] [SUPPORT] project hudi-common: Compilation failure: Compilation failure [hudi]

2023-10-16 Thread via GitHub
danny0405 commented on issue #9744: URL: https://github.com/apache/hudi/issues/9744#issuecomment-1765504755 @ad1happy2go Do you have some thoughts here?

Re: [I] [SUPPORT] Is `hoodie.datasource.hive_sync.filter_pushdown_enabled` can enable default? [hudi]

2023-10-16 Thread via GitHub
yihua commented on issue #9784: URL: https://github.com/apache/hudi/issues/9784#issuecomment-1765490984 @KnightChess by default, incremental meta sync is enabled (`hoodie.meta.sync.incremental=true`), which means that the table partitions since the last time the table is synced are fetched

Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]

2023-10-16 Thread via GitHub
hudi-bot commented on PR #9743: URL: https://github.com/apache/hudi/pull/9743#issuecomment-1765484025 ## CI report: * 097ef6176650413eef2a4c3581ca6e48ea43788f UNKNOWN * a5aae57a0e10f6d304921f3c17bace8cc55f02eb Azure:

Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]

2023-10-16 Thread via GitHub
hudi-bot commented on PR #9743: URL: https://github.com/apache/hudi/pull/9743#issuecomment-1765478734 ## CI report: * 097ef6176650413eef2a4c3581ca6e48ea43788f UNKNOWN * 51c94f46f9f13ea8c1726daa988cf3f78892e5ec Azure:

Re: [PR] [HUDI-1623] Solid completion time on timeline [hudi]

2023-10-16 Thread via GitHub
yihua commented on code in PR #9617: URL: https://github.com/apache/hudi/pull/9617#discussion_r1361379762 ## hudi-common/src/main/java/org/apache/hudi/common/config/LockConfiguration.java: ## @@ -35,19 +35,23 @@ public class LockConfiguration implements Serializable { public

Re: [PR] [HUDI-6941] add ut for HUDI-6941 for stages number check [hudi]

2023-10-16 Thread via GitHub
yihua commented on code in PR #9866: URL: https://github.com/apache/hudi/pull/9866#discussion_r1361376081 ## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestInsertTable.scala: ## @@ -1968,6 +1968,110 @@ class TestInsertTable extends

Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]

2023-10-16 Thread via GitHub
hudi-bot commented on PR #9743: URL: https://github.com/apache/hudi/pull/9743#issuecomment-1765439491 ## CI report: * 097ef6176650413eef2a4c3581ca6e48ea43788f UNKNOWN * 51c94f46f9f13ea8c1726daa988cf3f78892e5ec Azure:

Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]

2023-10-16 Thread via GitHub
hudi-bot commented on PR #9743: URL: https://github.com/apache/hudi/pull/9743#issuecomment-1765429810 ## CI report: * 097ef6176650413eef2a4c3581ca6e48ea43788f UNKNOWN * 51c94f46f9f13ea8c1726daa988cf3f78892e5ec Azure:

Re: [PR] [HUDI-6786] HoodieFileGroupReader integration [hudi]

2023-10-16 Thread via GitHub
hudi-bot commented on PR #9819: URL: https://github.com/apache/hudi/pull/9819#issuecomment-1765422820 ## CI report: * 278f01d15aab6f91354e418829b0c765afd32fcf Azure:

[jira] [Created] (HUDI-6948) HoodieAvroParquetReader sets configs wrong

2023-10-16 Thread Jonathan Vexler (Jira)
Jonathan Vexler created HUDI-6948: - Summary: HoodieAvroParquetReader sets configs wrong Key: HUDI-6948 URL: https://issues.apache.org/jira/browse/HUDI-6948 Project: Apache Hudi Issue Type:

Re: [PR] [HUDI-6786] HoodieFileGroupReader integration [hudi]

2023-10-16 Thread via GitHub
hudi-bot commented on PR #9819: URL: https://github.com/apache/hudi/pull/9819#issuecomment-1765325092 ## CI report: * b74be62a4473e68b385c3125c5b513a0f102f53f Azure:

Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]

2023-10-16 Thread via GitHub
jonvex commented on code in PR #9743: URL: https://github.com/apache/hudi/pull/9743#discussion_r1361295085 ## hudi-common/src/main/java/org/apache/hudi/avro/AvroSchemaUtils.java: ## @@ -116,9 +116,26 @@ public static String getAvroRecordQualifiedName(String tableName) {

Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]

2023-10-16 Thread via GitHub
jonvex commented on code in PR #9743: URL: https://github.com/apache/hudi/pull/9743#discussion_r1361283440 ## hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/StreamSync.java: ## @@ -661,6 +652,35 @@ private Pair>> fetchFromSourc return

Re: [I] Caused by: org.apache.avro.SchemaParseException: Cannot parse schema [hudi]

2023-10-16 Thread via GitHub
SamarthRaval closed issue #9358: Caused by: org.apache.avro.SchemaParseException: Cannot parse schema URL: https://github.com/apache/hudi/issues/9358

Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]

2023-10-16 Thread via GitHub
jonvex commented on code in PR #9743: URL: https://github.com/apache/hudi/pull/9743#discussion_r1361267147 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala: ## @@ -499,14 +500,16 @@ object HoodieSparkSqlWriter {

Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]

2023-10-16 Thread via GitHub
jonvex commented on code in PR #9743: URL: https://github.com/apache/hudi/pull/9743#discussion_r1361268093 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala: ## @@ -578,17 +582,25 @@ object HoodieSparkSqlWriter { }

[jira] [Created] (HUDI-6947) Clean up HoodieSparkSqlWriter.deduceWriterSchema

2023-10-16 Thread Jonathan Vexler (Jira)
Jonathan Vexler created HUDI-6947: - Summary: Clean up HoodieSparkSqlWriter.deduceWriterSchema Key: HUDI-6947 URL: https://issues.apache.org/jira/browse/HUDI-6947 Project: Apache Hudi Issue

Re: [I] [SUPPORT] Hudi Job fails fast in concurrent write even with high retries and long wait time [hudi]

2023-10-16 Thread via GitHub
SamarthRaval commented on issue #9728: URL: https://github.com/apache/hudi/issues/9728#issuecomment-1765275004 > Thanks. We've tried the newest update from `DynamoDBBasedLockProvider` and `DynamoDbBasedLockConfig` but we are still seeing jobs fail pretty soon if encountering conflicting

Re: [I] [SUPPORT]java.util.ConcurrentModificationException: Cannot resolve conflicts for overlapping writes [hudi]

2023-10-16 Thread via GitHub
SamarthRaval commented on issue #7653: URL: https://github.com/apache/hudi/issues/7653#issuecomment-1765269597 I heard that the DynamoDB lock provider doesn't work with retries, but ZooKeeper does? If anyone has knowledge about this, would you mind sharing it here?

Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]

2023-10-16 Thread via GitHub
jonvex commented on code in PR #9743: URL: https://github.com/apache/hudi/pull/9743#discussion_r1361260400 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSchemaUtils.scala: ## @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation

Re: [I] [SUPPORT]java.util.ConcurrentModificationException: Cannot resolve conflicts for overlapping writes [hudi]

2023-10-16 Thread via GitHub
SamarthRaval commented on issue #7653: URL: https://github.com/apache/hudi/issues/7653#issuecomment-1765267904 @Jason-liujc Can we just increase yarn.resourcemanager.am.max-attempts to higher number? so we can retry to run hudi again if somehow it fails on

Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]

2023-10-16 Thread via GitHub
jonvex commented on code in PR #9743: URL: https://github.com/apache/hudi/pull/9743#discussion_r1361254056 ## hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieAvroParquetReader.java: ## @@ -165,7 +165,10 @@ private ClosableIterator

Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]

2023-10-16 Thread via GitHub
jonvex commented on code in PR #9743: URL: https://github.com/apache/hudi/pull/9743#discussion_r1361251777 ## hudi-common/src/main/java/org/apache/hudi/internal/schema/utils/AvroSchemaEvolutionUtils.java: ## @@ -111,17 +111,21 @@ public static InternalSchema

Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]

2023-10-16 Thread via GitHub
jonvex commented on code in PR #9743: URL: https://github.com/apache/hudi/pull/9743#discussion_r1361251001 ## hudi-common/src/main/java/org/apache/hudi/internal/schema/convert/AvroInternalSchemaConverter.java: ## @@ -68,6 +68,17 @@ public static Schema convert(InternalSchema

Re: [PR] [HUDI-6786] HoodieFileGroupReader integration [hudi]

2023-10-16 Thread via GitHub
hudi-bot commented on PR #9819: URL: https://github.com/apache/hudi/pull/9819#issuecomment-1765254005 ## CI report: * fc7f7a902c900daa5ad3244df64813f5b47ff07e Azure:

Re: [PR] [HUDI-6786] HoodieFileGroupReader integration [hudi]

2023-10-16 Thread via GitHub
hudi-bot commented on PR #9819: URL: https://github.com/apache/hudi/pull/9819#issuecomment-1765239624 ## CI report: * fc7f7a902c900daa5ad3244df64813f5b47ff07e Azure:

Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]

2023-10-16 Thread via GitHub
jonvex commented on code in PR #9743: URL: https://github.com/apache/hudi/pull/9743#discussion_r1361222515 ## hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieAvroDataBlock.java: ## @@ -206,6 +213,9 @@ public IndexedRecord next() { IndexedRecord

Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]

2023-10-16 Thread via GitHub
jonvex commented on code in PR #9743: URL: https://github.com/apache/hudi/pull/9743#discussion_r1361213712 ## hudi-common/src/main/java/org/apache/hudi/common/config/HoodieCommonConfig.java: ## @@ -71,6 +71,14 @@ public class HoodieCommonConfig extends HoodieConfig {

Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]

2023-10-16 Thread via GitHub
jonvex commented on code in PR #9743: URL: https://github.com/apache/hudi/pull/9743#discussion_r1361211234 ## hudi-common/src/main/java/org/apache/hudi/avro/AvroSchemaUtils.java: ## @@ -116,9 +116,24 @@ public static String getAvroRecordQualifiedName(String tableName) {

Re: [PR] [HUDI-6786] HoodieFileGroupReader integration [hudi]

2023-10-16 Thread via GitHub
hudi-bot commented on PR #9819: URL: https://github.com/apache/hudi/pull/9819#issuecomment-1765085018 ## CI report: * fc7f7a902c900daa5ad3244df64813f5b47ff07e Azure:

Re: [PR] [HUDI-6786] HoodieFileGroupReader integration [hudi]

2023-10-16 Thread via GitHub
hudi-bot commented on PR #9819: URL: https://github.com/apache/hudi/pull/9819#issuecomment-1765072178 ## CI report: * fc7f7a902c900daa5ad3244df64813f5b47ff07e Azure:

Re: [PR] [HUDI-2461] Support out of order commits in MDT with completion time view [hudi]

2023-10-16 Thread via GitHub
hudi-bot commented on PR #9871: URL: https://github.com/apache/hudi/pull/9871#issuecomment-1765058691 ## CI report: * f4087ad6ee9f83d314874de0703eb2232abfcd7e Azure:

Re: [PR] [HUDI-6786] HoodieFileGroupReader integration [hudi]

2023-10-16 Thread via GitHub
linliu-code commented on code in PR #9819: URL: https://github.com/apache/hudi/pull/9819#discussion_r1361090448 ## hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java: ## @@ -73,14 +71,13 @@ public final class HoodieFileGroupReader implements

Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]

2023-10-16 Thread via GitHub
hudi-bot commented on PR #9743: URL: https://github.com/apache/hudi/pull/9743#issuecomment-1764968732 ## CI report: * 097ef6176650413eef2a4c3581ca6e48ea43788f UNKNOWN * 51c94f46f9f13ea8c1726daa988cf3f78892e5ec Azure:

Re: [I] [SUPPORT] Upsert operations end up with duplicate data. Range Pruning not working properly with column statistics [hudi]

2023-10-16 Thread via GitHub
ssandona commented on issue #9870: URL: https://github.com/apache/hudi/issues/9870#issuecomment-1764962815 Great. If you query the column_stats table as you mentioned, do you see statistics properly computed? I am currently on 0.13.1, so I cannot test that.

[jira] [Updated] (HUDI-6946) Data Duplicates with range pruning while using hoodie.bloom.index.use.metadata

2023-10-16 Thread Aditya Goenka (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aditya Goenka updated HUDI-6946: Description: GitHub Issue - [https://github.com/apache/hudi/issues/9870] Code to Reproduce -

Re: [I] [SUPPORT] Upsert operations end up with duplicate data. Range Pruning not working properly with column statistics [hudi]

2023-10-16 Thread via GitHub
ad1happy2go commented on issue #9870: URL: https://github.com/apache/hudi/issues/9870#issuecomment-1764889217 Simplified version of Reproducible Code - ``` COW_TABLE_NAME="table_duplicates" PARTITION_FIELD = "year,month" PRECOMBINE_FIELD = "timestamp"

Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]

2023-10-16 Thread via GitHub
hudi-bot commented on PR #9743: URL: https://github.com/apache/hudi/pull/9743#issuecomment-1764886605 ## CI report: * 097ef6176650413eef2a4c3581ca6e48ea43788f UNKNOWN * c81e51eff4c688f4bfe63dc10bdbee4f0e340978 Azure:

Re: [PR] [HUDI-2461] Support out of order commits in MDT with completion time view [hudi]

2023-10-16 Thread via GitHub
hudi-bot commented on PR #9871: URL: https://github.com/apache/hudi/pull/9871#issuecomment-1764887056 ## CI report: * f4087ad6ee9f83d314874de0703eb2232abfcd7e Azure:

Re: [PR] [HUDI-2461] Support out of order commits in MDT with completion time view [hudi]

2023-10-16 Thread via GitHub
hudi-bot commented on PR #9871: URL: https://github.com/apache/hudi/pull/9871#issuecomment-1764872147 ## CI report: * f4087ad6ee9f83d314874de0703eb2232abfcd7e UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]

2023-10-16 Thread via GitHub
hudi-bot commented on PR #9743: URL: https://github.com/apache/hudi/pull/9743#issuecomment-1764871754 ## CI report: * 097ef6176650413eef2a4c3581ca6e48ea43788f UNKNOWN * 8b312ef45b84d8ed08dde719e60a01e4b7c44cb6 Azure:

Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]

2023-10-16 Thread via GitHub
hudi-bot commented on PR #9743: URL: https://github.com/apache/hudi/pull/9743#issuecomment-1764857409 ## CI report: * 097ef6176650413eef2a4c3581ca6e48ea43788f UNKNOWN * 8b312ef45b84d8ed08dde719e60a01e4b7c44cb6 Azure:

[jira] [Updated] (HUDI-5210) End-to-end PoC of functional indexes

2023-10-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5210: - Labels: pull-request-available (was: ) > End-to-end PoC of functional indexes >

[PR] [HUDI-5210][WIP] Implement functional indexes [hudi]

2023-10-16 Thread via GitHub
codope opened a new pull request, #9872: URL: https://github.com/apache/hudi/pull/9872 ### Change Logs Support functional indexes. ### Impact Users can now create indexes on not just columns but also functions based on columns, e.g. `DATE_FORMAT(ts, '%Y-%m-%d')`

[jira] [Updated] (HUDI-2461) Support lock free multi-writer for metadata table

2023-10-16 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-2461: - Labels: pull-request-available (was: ) > Support lock free multi-writer for metadata table >

[PR] [HUDI-2461] Support out of order commits in MDT with completion time view [hudi]

2023-10-16 Thread via GitHub
codope opened a new pull request, #9871: URL: https://github.com/apache/hudi/pull/9871 ### Change Logs Metadata table (MDT) has special handling for compaction. This PR ensures MDT compaction is handled in completion time based filesystem view. Previously, out-of-order commit tests

[jira] [Updated] (HUDI-6946) Data Duplicates with range pruning while using hoodie.bloom.index.use.metadata

2023-10-16 Thread Aditya Goenka (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aditya Goenka updated HUDI-6946: Description: GitHub Issue - [https://github.com/apache/hudi/issues/9870] Code to Reproduce -

[jira] [Updated] (HUDI-2461) Support lock free multi-writer for metadata table

2023-10-16 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-2461: -- Status: In Progress (was: Open) > Support lock free multi-writer for metadata table >

Re: [I] [SUPPORT] Duplicates upserting into large partitioned table with bloom index metadata enabled [hudi]

2023-10-16 Thread via GitHub
ad1happy2go commented on issue #9271: URL: https://github.com/apache/hudi/issues/9271#issuecomment-1764814154 Thanks @jspaine . We were able to reproduce this issue as part of https://github.com/apache/hudi/issues/9870 We will be working on a fix here. Tracking JIRA -

Re: [I] [SUPPORT] Duplicates upserting into large partitioned table with bloom index metadata enabled [hudi]

2023-10-16 Thread via GitHub
codope closed issue #9271: [SUPPORT] Duplicates upserting into large partitioned table with bloom index metadata enabled URL: https://github.com/apache/hudi/issues/9271

Re: [I] [SUPPORT] Upsert operations end up with duplicate data. Range Pruning not working properly with column statistics [hudi]

2023-10-16 Thread via GitHub
ad1happy2go commented on issue #9870: URL: https://github.com/apache/hudi/issues/9870#issuecomment-1764808778 Was able to reproduce with an even smaller dataset involving boundaries. Created JIRA - https://issues.apache.org/jira/browse/HUDI-6946

[jira] [Created] (HUDI-6946) Data Duplicates with range pruning while using hoodie.bloom.index.use.metadata

2023-10-16 Thread Aditya Goenka (Jira)
Aditya Goenka created HUDI-6946: --- Summary: Data Duplicates with range pruning while using hoodie.bloom.index.use.metadata Key: HUDI-6946 URL: https://issues.apache.org/jira/browse/HUDI-6946 Project:

Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]

2023-10-16 Thread via GitHub
hudi-bot commented on PR #9743: URL: https://github.com/apache/hudi/pull/9743#issuecomment-1764782350 ## CI report: * 097ef6176650413eef2a4c3581ca6e48ea43788f UNKNOWN * 8b312ef45b84d8ed08dde719e60a01e4b7c44cb6 Azure:

Re: [I] [SUPPORT] Upsert operations end up with duplicate data. Range Pruning not working properly with column statistics [hudi]

2023-10-16 Thread via GitHub
ad1happy2go commented on issue #9870: URL: https://github.com/apache/hudi/issues/9870#issuecomment-1764782255 Perfect. Thanks @ssandona for the effort. I was able to reproduce with both versions, i.e. 0.13.1 and 0.14.0. Analysing the code further. -- This is an

Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]

2023-10-16 Thread via GitHub
jonvex commented on code in PR #9743: URL: https://github.com/apache/hudi/pull/9743#discussion_r1360860550 ## hudi-common/src/main/java/org/apache/hudi/common/table/log/block/HoodieAvroDataBlock.java: ## @@ -206,6 +213,9 @@ public IndexedRecord next() { IndexedRecord

Re: [I] [SUPPORT] Upsert operations end up with duplicate data. Range Pruning not working properly with column statistics [hudi]

2023-10-16 Thread via GitHub
ssandona commented on issue #9870: URL: https://github.com/apache/hudi/issues/9870#issuecomment-1764673796 @ad1happy2go I was able to create a reproducible example. With Hudi 0.13.1, this creates duplicates. Here is the code: ``` from pyspark.sql.functions import col, concat, lit, max,

Re: [PR] [HUDI-6941] add ut for HUDI-6941 for stages number check [hudi]

2023-10-16 Thread via GitHub
hudi-bot commented on PR #9866: URL: https://github.com/apache/hudi/pull/9866#issuecomment-1764605000 ## CI report: * 7beb83025fba43331259a7b85601d342a7cd4376 Azure:

Re: [PR] [HUDI-2141] Support flink stream write metrics [hudi]

2023-10-16 Thread via GitHub
hudi-bot commented on PR #9118: URL: https://github.com/apache/hudi/pull/9118#issuecomment-1764497532 ## CI report: * f6d7dd97c73898206da91b17144326a7dbbffae8 UNKNOWN * c62db1fdf94ee2c1f9b9e539f7a4b1bb866beb7e UNKNOWN * 6ee9b9fcbbf2ed709f2a5c12829ce43dee92f0e2 Azure:

Re: [I] [SUPPORT] Upsert operations end up with duplicate data. Range Pruning not working properly with column statistics [hudi]

2023-10-16 Thread via GitHub
ad1happy2go commented on issue #9870: URL: https://github.com/apache/hudi/issues/9870#issuecomment-1764495406 @ssandona Thanks. I will try that out. Also, if you have the reproducible setup, can you try once with 0.14.0 to make sure? You can read column statistics by reading the

Re: [PR] [HUDI-6786] HoodieFileGroupReader integration [hudi]

2023-10-16 Thread via GitHub
codope commented on code in PR #9819: URL: https://github.com/apache/hudi/pull/9819#discussion_r1360640228 ## hudi-common/src/main/java/org/apache/hudi/common/table/read/HoodieFileGroupReader.java: ## @@ -73,14 +71,13 @@ public final class HoodieFileGroupReader implements

Re: [I] [SUPPORT] Upsert operations end up with duplicate data. Range Pruning not working properly with column statistics [hudi]

2023-10-16 Thread via GitHub
ssandona commented on issue #9870: URL: https://github.com/apache/hudi/issues/9870#issuecomment-1764416380 Hi, for **OptionC** I did not specify any value for `hoodie.metadata.index.column.stats.column.list` so according to the [Hudi doc](https://hudi.apache.org/docs/0.13.1/configurations)

Re: [I] [SUPPORT] Upsert operations end up with duplicate data. Range Pruning not working properly with column statistics [hudi]

2023-10-16 Thread via GitHub
ad1happy2go commented on issue #9870: URL: https://github.com/apache/hudi/issues/9870#issuecomment-1764370678 @ssandona Thanks for raising this. Any reason why you have not added the partition column to the column list? I am not able to reproduce the error on my end. Can you just see the code

Re: [PR] [HUDI-6495][RFC-66] Non-blocking Concurrency Control [hudi]

2023-10-16 Thread via GitHub
beyond1920 commented on code in PR #7907: URL: https://github.com/apache/hudi/pull/7907#discussion_r1360569304 ## rfc/rfc-66/rfc-66.md: ## @@ -0,0 +1,318 @@ +# RFC-66: Non-blocking Concurrency Control + +## Proposers +- @danny0405 +- @ForwardXu + +## Approvers +- + +## Status +
