Re: [I] Add in-code docs for Rust [hudi-rs]

2024-08-13 Thread via GitHub
KnightChess commented on issue #71: URL: https://github.com/apache/hudi-rs/issues/71#issuecomment-2287857121 does it mean adding docs for every `Module`, `Struct`, `Method` in code, like this? ```rust /// Table interface for engine integrate #[derive(Clone, Debug)] pub struct Tab
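The comment above asks about the usual rustdoc convention, which can be sketched as follows. `//!` documents the enclosing module (or crate), and `///` documents the item that follows it, whether that is a struct, a field, or a method. The `table` module, the `Table` struct with its `base_uri` field, and the `new` constructor are hypothetical simplifications for illustration only. They are not the actual hudi-rs API.

```rust
// Hypothetical, simplified module used only to illustrate doc-comment placement;
// not the real hudi-rs `Table`.
pub mod table {
    //! Table abstractions exposed to query engines.
    //!
    //! Inner `//!` comments document this module; outer `///` comments
    //! document the item that immediately follows them.

    /// Table interface for engine integration.
    #[derive(Clone, Debug)]
    pub struct Table {
        /// Base URI of the table, e.g. `s3://bucket/path/to/table`.
        pub base_uri: String,
    }

    impl Table {
        /// Creates a handle to the table rooted at `base_uri`.
        ///
        /// # Errors
        ///
        /// Returns an error if `base_uri` is empty.
        pub fn new(base_uri: &str) -> Result<Self, String> {
            // Reject an empty URI so the documented error case is real.
            if base_uri.is_empty() {
                return Err("base_uri must not be empty".to_string());
            }
            Ok(Self {
                base_uri: base_uri.to_string(),
            })
        }
    }
}
```

By convention (Rust API Guidelines), headings such as `# Errors`, `# Panics`, and `# Examples` inside doc comments mark standard sections that rustdoc renders as headings. To enforce a "document every public module, struct, and method" policy, one option is adding `#![warn(missing_docs)]` at the crate root, which makes the compiler flag public items left undocumented.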

Re: [PR] [HUDI-5807] read partition values from file and create infra to support reading only a subset of columns [hudi]

2024-08-13 Thread via GitHub
hudi-bot commented on PR #11770: URL: https://github.com/apache/hudi/pull/11770#issuecomment-2287822287 ## CI report: * 1c9c75fe46a149abe2490da22599f8a946a8e610 UNKNOWN * a912f9a615e44fa205bf5cb20544b382d530bd84 UNKNOWN * a6c39c6554eb1296077ae73b53f12f27d3085c38 UNKNOWN *

Re: [I] [SUPPORT] Hudi table created with dataframe API becomes unwritable to INSERT queries due to config conflict [hudi]

2024-08-13 Thread via GitHub
xicm commented on issue #11772: URL: https://github.com/apache/hudi/issues/11772#issuecomment-2287813444 https://github.com/apache/hudi/blob/35c00daaf871a6c1b87d6a440832d60f9b26ee14/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala#L1

Re: [PR] [HUDI-5807] read partition values from file and create infra to support reading only a subset of columns [hudi]

2024-08-13 Thread via GitHub
jonvex commented on code in PR #11770: URL: https://github.com/apache/hudi/pull/11770#discussion_r1716256917 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/vectorized/ColumnarBatchUtils.java: ## @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Soft

Re: [PR] [HUDI-5807] read partition values from file and create infra to support reading only a subset of columns [hudi]

2024-08-13 Thread via GitHub
hudi-bot commented on PR #11770: URL: https://github.com/apache/hudi/pull/11770#issuecomment-2287756290 ## CI report: * 1c9c75fe46a149abe2490da22599f8a946a8e610 UNKNOWN * a912f9a615e44fa205bf5cb20544b382d530bd84 UNKNOWN * a6c39c6554eb1296077ae73b53f12f27d3085c38 UNKNOWN *

Re: [I] [SUPPORT] using Flink to write to Hudi in upsert mode and syncing to Hive, querying the external table in Hive gives an error:Caused by: org.apache.hudi.org.apache.avro.AvroRuntimeException: D

2024-08-13 Thread via GitHub
qw2qw2 commented on issue #10779: URL: https://github.com/apache/hudi/issues/10779#issuecomment-228773 > Can you share the content of your parquet file? `spark.parquet("file_name").show()` or other tools. scala> val in = spark.read.parquet("/user/xxx/table1/TRANS_YEAR=2019/TRANS_

Re: [PR] [HUDI-5807] read partition values from file and create infra to support reading only a subset of columns [hudi]

2024-08-13 Thread via GitHub
hudi-bot commented on PR #11770: URL: https://github.com/apache/hudi/pull/11770#issuecomment-2287731997 ## CI report: * 1c9c75fe46a149abe2490da22599f8a946a8e610 UNKNOWN * a912f9a615e44fa205bf5cb20544b382d530bd84 UNKNOWN * a6c39c6554eb1296077ae73b53f12f27d3085c38 UNKNOWN *

Re: [PR] [HUDI-8073] Add hosts to storage path info and use it if present [hudi]

2024-08-13 Thread via GitHub
hudi-bot commented on PR #11761: URL: https://github.com/apache/hudi/pull/11761#issuecomment-2287723710 ## CI report: * cbbb18db6c31a406624c48127adb4ea865b454ab Azure: [SUCCESS](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=33)

Re: [PR] [HUDI-5807] read partition values from file and create infra to support reading only a subset of columns [hudi]

2024-08-13 Thread via GitHub
hudi-bot commented on PR #11770: URL: https://github.com/apache/hudi/pull/11770#issuecomment-2287719371 ## CI report: * 1c9c75fe46a149abe2490da22599f8a946a8e610 UNKNOWN * a912f9a615e44fa205bf5cb20544b382d530bd84 UNKNOWN * a6c39c6554eb1296077ae73b53f12f27d3085c38 UNKNOWN *

Re: [PR] [HUDI-5807] read partition values from file and create infra to support reading only a subset of columns [hudi]

2024-08-13 Thread via GitHub
hudi-bot commented on PR #11770: URL: https://github.com/apache/hudi/pull/11770#issuecomment-2287714696 ## CI report: * 1c9c75fe46a149abe2490da22599f8a946a8e610 UNKNOWN * a912f9a615e44fa205bf5cb20544b382d530bd84 UNKNOWN * a6c39c6554eb1296077ae73b53f12f27d3085c38 UNKNOWN *

Re: [PR] [HUDI-5807] read partition values from file and create infra to support reading only a subset of columns [hudi]

2024-08-13 Thread via GitHub
jonvex commented on code in PR #11770: URL: https://github.com/apache/hudi/pull/11770#discussion_r1716215312 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieHadoopFsRelationFactory.scala: ## @@ -229,7 +229,7 @@ class HoodieMergeOnReadSnapshotHado

[jira] [Updated] (HUDI-8077) Fix the incremental cleaning to base on completion time

2024-08-13 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-8077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-8077: - Description: Currently, the incremental cleaning will remember a marker instant of last retained in the c

[jira] [Created] (HUDI-8077) Fix the incremental cleaning to base on completion time

2024-08-13 Thread Danny Chen (Jira)
Danny Chen created HUDI-8077: Summary: Fix the incremental cleaning to base on completion time Key: HUDI-8077 URL: https://issues.apache.org/jira/browse/HUDI-8077 Project: Apache Hudi Issue Type:

[jira] [Updated] (HUDI-7947) [Umbrella] RFC-80 : Support column families for wide tables

2024-08-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7947: Fix Version/s: 1.0.0 > [Umbrella] RFC-80 : Support column families for wide tables > ---

[jira] [Updated] (HUDI-7919) Make integration tests run on Spark 3.5

2024-08-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7919: Status: In Progress (was: Open) > Make integration tests run on Spark 3.5 > ---

[jira] [Updated] (HUDI-7918) Remove support of Spark 2, 3.0, 3.1, and 3.2

2024-08-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7918: Status: Patch Available (was: In Progress) > Remove support of Spark 2, 3.0, 3.1, and 3.2 > ---

[jira] [Updated] (HUDI-7918) Remove support of Spark 2, 3.0, 3.1, and 3.2

2024-08-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7918: Reviewers: Ethan Guo > Remove support of Spark 2, 3.0, 3.1, and 3.2 > --

[jira] [Closed] (HUDI-8012) Update checkstyle.xml based on the new release

2024-08-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo closed HUDI-8012. --- Resolution: Fixed > Update checkstyle.xml based on the new release > -

Re: [PR] [HUDI-8073] Add hosts to storage path info and use it if present [hudi]

2024-08-13 Thread via GitHub
hudi-bot commented on PR #11761: URL: https://github.com/apache/hudi/pull/11761#issuecomment-2287665040 ## CI report: * b8add4578188716a4f9e43d3719d65ef9aa1e973 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=25

Re: [PR] [HUDI-8073] Add hosts to storage path info and use it if present [hudi]

2024-08-13 Thread via GitHub
hudi-bot commented on PR #11761: URL: https://github.com/apache/hudi/pull/11761#issuecomment-2287664057 ## CI report: * b8add4578188716a4f9e43d3719d65ef9aa1e973 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=25

[jira] [Updated] (HUDI-7982) [Umbrella] Issues found with 1.0.0-beta2 multi-modal indexing

2024-08-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7982: Story Points: 16 > [Umbrella] Issues found with 1.0.0-beta2 multi-modal indexing > -

[jira] [Updated] (HUDI-8026) Test multiple indexes creation and updates together

2024-08-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-8026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-8026: Sprint: Hudi 1.0 Sprint 2024/08/12-18 > Test multiple indexes creation and updates together > --

[jira] [Updated] (HUDI-7958) Create partition stats index for all columns when no columns specified

2024-08-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7958: Story Points: 4 > Create partition stats index for all columns when no columns specified > -

[jira] [Updated] (HUDI-8025) Test all indexes with compaction and cleaning

2024-08-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-8025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-8025: Sprint: Hudi 1.0 Sprint 2024/08/12-18 > Test all indexes with compaction and cleaning >

[jira] [Updated] (HUDI-8022) All positive index tests should validate data skipping

2024-08-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-8022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-8022: Sprint: Hudi 1.0 Sprint 2024/08/12-18 > All positive index tests should validate data skipping > ---

[jira] [Updated] (HUDI-8024) Test index updates and rollback

2024-08-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-8024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-8024: Sprint: Hudi 1.0 Sprint 2024/08/12-18 > Test index updates and rollback >

[jira] [Updated] (HUDI-7958) Create partition stats index for all columns when no columns specified

2024-08-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7958: Sprint: Hudi 1.0 Sprint 2024/08/12-18 > Create partition stats index for all columns when no columns specifi

[jira] [Updated] (HUDI-7982) [Umbrella] Issues found with 1.0.0-beta2 multi-modal indexing

2024-08-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7982: Sprint: Hudi 1.0 Sprint 2024/08/12-18 > [Umbrella] Issues found with 1.0.0-beta2 multi-modal indexing >

[jira] [Updated] (HUDI-7994) Support secondary index on nested fields

2024-08-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7994: Fix Version/s: 1.1.0 > Support secondary index on nested fields > >

[jira] [Assigned] (HUDI-8076) RFC for backwards compatible writer mode in Hudi 1.0

2024-08-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-8076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-8076: --- Assignee: Ethan Guo > RFC for backwards compatible writer mode in Hudi 1.0 >

[jira] [Created] (HUDI-8076) RFC for backwards compatible writer mode in Hudi 1.0

2024-08-13 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-8076: --- Summary: RFC for backwards compatible writer mode in Hudi 1.0 Key: HUDI-8076 URL: https://issues.apache.org/jira/browse/HUDI-8076 Project: Apache Hudi Issue Type: New

[jira] [Updated] (HUDI-8076) RFC for backwards compatible writer mode in Hudi 1.0

2024-08-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-8076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-8076: Epic Link: HUDI-7856 Story Points: 10 > RFC for backwards compatible writer mode in Hudi 1.0 > --

[jira] [Updated] (HUDI-8076) RFC for backwards compatible writer mode in Hudi 1.0

2024-08-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-8076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-8076: Fix Version/s: 1.0.0 > RFC for backwards compatible writer mode in Hudi 1.0 > --

[jira] [Updated] (HUDI-2955) Upgrade Hadoop to 3.3.x

2024-08-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-2955: Sprint: Hudi-Sprint-Feb-14, Hudi-Sprint-Mar-14, Hudi-Sprint-Mar-21, Hudi-Sprint-Mar-22, Hudi-Sprint-Apr-05,

[jira] [Updated] (HUDI-7928) Fix shared HFile reader in HoodieNativeAvroHFileReader

2024-08-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7928: Sprint: 2024/06/17-30, Hudi 1.0 Sprint 2024/08/12-18 (was: 2024/06/17-30) > Fix shared HFile reader in Hood

[jira] [Updated] (HUDI-7823) Simplify dependency management on exclusions

2024-08-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7823: Sprint: 2024/06/17-30, 2024/06/03-16, Hudi 1.0 Sprint 2024/08/12-18 (was: 2024/06/17-30, 2024/06/03-16) >

[jira] [Updated] (HUDI-2955) Upgrade Hadoop to 3.3.x

2024-08-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-2955: Story Points: 10 (was: 5) > Upgrade Hadoop to 3.3.x > --- > > Key: HUDI

[jira] [Updated] (HUDI-7695) Add docs on Spark 3.5 and Scala 2.13

2024-08-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7695: Story Points: 0.5 (was: 1) > Add docs on Spark 3.5 and Scala 2.13 > >

[jira] [Updated] (HUDI-7695) Add docs on Spark 3.5 and Scala 2.13

2024-08-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7695: Story Points: 1 > Add docs on Spark 3.5 and Scala 2.13 > > >

[jira] [Updated] (HUDI-8012) Update checkstyle.xml based on the new release

2024-08-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-8012: Story Points: 4 > Update checkstyle.xml based on the new release > -

[jira] [Assigned] (HUDI-7964) Partitions not created correctly with SQL when multiple partitions specified out of order

2024-08-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-7964: --- Assignee: Sagar Sumit > Partitions not created correctly with SQL when multiple partitions specified

[jira] [Updated] (HUDI-7920) Make Spark 3.5 the default build profile for Spark integration

2024-08-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7920: Sprint: 2024/06/17-30, Hudi 1.0 Sprint 2024/08/12-18 (was: 2024/06/17-30) > Make Spark 3.5 the default buil

[jira] [Closed] (HUDI-7964) Partitions not created correctly with SQL when multiple partitions specified out of order

2024-08-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo closed HUDI-7964. --- Resolution: Fixed > Partitions not created correctly with SQL when multiple partitions specified > out of ord

[jira] [Updated] (HUDI-7695) Add docs on Spark 3.5 and Scala 2.13

2024-08-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7695: Sprint: Sprint 2024-04-26, 2024/06/17-30, 2024/06/03-16, Hudi 1.0 Sprint 2024/08/12-18 (was: Sprint 2024-04

[jira] [Closed] (HUDI-7978) Update docs for older versions to state that partitions should be ordered when creating multiple partitions

2024-08-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo closed HUDI-7978. --- Resolution: Fixed > Update docs for older versions to state that partitions should be ordered > when creating

[jira] [Updated] (HUDI-8034) KeyGenUtils#inferKeyGeneratorTypeForAutoKeyGen should support custom key generator

2024-08-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-8034: Sprint: Hudi 1.0 Sprint 2024/08/12-18 > KeyGenUtils#inferKeyGeneratorTypeForAutoKeyGen should support custom

[jira] [Updated] (HUDI-7902) Partition fields in Table config should store partition field types for custom key generator

2024-08-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7902: Sprint: Hudi 1.0 Sprint 2024/08/12-18 > Partition fields in Table config should store partition field types

[jira] [Updated] (HUDI-7916) Add tests on the integration of new file group reader with Hive

2024-08-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7916: Sprint: 2024/06/17-30, Hudi 1.0 Sprint 2024/08/12-18 (was: 2024/06/17-30) > Add tests on the integration of

[jira] [Updated] (HUDI-8075) Revisit table service scheduling and execution with the completion time

2024-08-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-8075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-8075: Fix Version/s: 1.0.0 > Revisit table service scheduling and execution with the completion time > ---

[jira] [Updated] (HUDI-8075) Revisit table service scheduling and execution with the completion time

2024-08-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-8075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-8075: Story Points: 12 > Revisit table service scheduling and execution with the completion time > ---

[jira] [Assigned] (HUDI-8075) Revisit table service scheduling and execution with the completion time

2024-08-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-8075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-8075: --- Assignee: Danny Chen > Revisit table service scheduling and execution with the completion time >

[jira] [Created] (HUDI-8075) Revisit table service scheduling and execution with the completion time

2024-08-13 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-8075: --- Summary: Revisit table service scheduling and execution with the completion time Key: HUDI-8075 URL: https://issues.apache.org/jira/browse/HUDI-8075 Project: Apache Hudi

[jira] [Updated] (HUDI-8033) RFC-81: Log Compaction with Merge Sort

2024-08-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-8033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-8033: Sprint: Hudi 1.0 Sprint 2024/08/12-18 > RFC-81: Log Compaction with Merge Sort > ---

[jira] [Updated] (HUDI-7930) Flink Support for Array of Row and Map of Row value

2024-08-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-7930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-7930: Sprint: Hudi 1.0 Sprint 2024/08/12-18 > Flink Support for Array of Row and Map of Row value > --

[jira] [Closed] (HUDI-8074) Improve comaction operator shuffle rebanlance

2024-08-13 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-8074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen closed HUDI-8074. Resolution: Fixed Fixed via master branch: 35c00daaf871a6c1b87d6a440832d60f9b26ee14 > Improve comaction ope

[jira] [Created] (HUDI-8074) Improve comaction operator shuffle rebanlance

2024-08-13 Thread Danny Chen (Jira)
Danny Chen created HUDI-8074: Summary: Improve comaction operator shuffle rebanlance Key: HUDI-8074 URL: https://issues.apache.org/jira/browse/HUDI-8074 Project: Apache Hudi Issue Type: Improvem

(hudi) branch master updated: [HUDI-8074] Improve comaction operator shuffle rebanlance (#11757)

2024-08-13 Thread danny0405
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 35c00daaf87 [HUDI-8074] Improve comaction opera

Re: [PR] [MINOR] improve comaction operator shuffle [hudi]

2024-08-13 Thread via GitHub
danny0405 merged PR #11757: URL: https://github.com/apache/hudi/pull/11757 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apac

Re: [PR] [HUDI-8073] Add hosts to storage path info and use it if present [hudi]

2024-08-13 Thread via GitHub
CTTY commented on code in PR #11761: URL: https://github.com/apache/hudi/pull/11761#discussion_r1716129382 ## hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HiveHoodieReaderContext.java: ## @@ -148,14 +146,27 @@ public HoodieStorage getStorage(String path, StorageConfigura

Re: [PR] [HUDI-5807] read partition values from file and create infra to support reading only a subset of columns [hudi]

2024-08-13 Thread via GitHub
hudi-bot commented on PR #11770: URL: https://github.com/apache/hudi/pull/11770#issuecomment-2287447599 ## CI report: * 1c9c75fe46a149abe2490da22599f8a946a8e610 UNKNOWN * a912f9a615e44fa205bf5cb20544b382d530bd84 UNKNOWN * a6c39c6554eb1296077ae73b53f12f27d3085c38 UNKNOWN *

Re: [PR] [HUDI-5807] read partition values from file and create infra to support reading only a subset of columns [hudi]

2024-08-13 Thread via GitHub
hudi-bot commented on PR #11770: URL: https://github.com/apache/hudi/pull/11770#issuecomment-2287355491 ## CI report: * 1c9c75fe46a149abe2490da22599f8a946a8e610 UNKNOWN * a912f9a615e44fa205bf5cb20544b382d530bd84 UNKNOWN * a6c39c6554eb1296077ae73b53f12f27d3085c38 UNKNOWN *

[jira] [Assigned] (HUDI-3204) Allow original partition column value to be retrieved when using TimestampBasedKeyGen

2024-08-13 Thread Jonathan Vexler (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Vexler reassigned HUDI-3204: - Assignee: Jonathan Vexler (was: Alexey Kudinkin) > Allow original partition column value

Re: [PR] [HUDI-5807] read partition values from file and create infra to support reading only a subset of columns [hudi]

2024-08-13 Thread via GitHub
hudi-bot commented on PR #11770: URL: https://github.com/apache/hudi/pull/11770#issuecomment-2287367900 ## CI report: * 1c9c75fe46a149abe2490da22599f8a946a8e610 UNKNOWN * a912f9a615e44fa205bf5cb20544b382d530bd84 UNKNOWN * a6c39c6554eb1296077ae73b53f12f27d3085c38 UNKNOWN *

Re: [PR] [HUDI-5807] read partition values from file and create infra to support reading only a subset of columns [hudi]

2024-08-13 Thread via GitHub
jonvex commented on code in PR #11770: URL: https://github.com/apache/hudi/pull/11770#discussion_r1716074438 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/vectorized/ColumnarBatchUtils.java: ## @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Soft

Re: [PR] [HUDI-5807] read partition values from file and create infra to support reading only a subset of columns [hudi]

2024-08-13 Thread via GitHub
hudi-bot commented on PR #11770: URL: https://github.com/apache/hudi/pull/11770#issuecomment-2287359723 ## CI report: * 1c9c75fe46a149abe2490da22599f8a946a8e610 UNKNOWN * a912f9a615e44fa205bf5cb20544b382d530bd84 UNKNOWN * a6c39c6554eb1296077ae73b53f12f27d3085c38 UNKNOWN *

Re: [PR] [HUDI-5807] read partition values from file and create infra to support reading only a subset of columns [hudi]

2024-08-13 Thread via GitHub
jonvex commented on code in PR #11770: URL: https://github.com/apache/hudi/pull/11770#discussion_r1716017548 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/vectorized/ColumnarBatchUtils.java: ## @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Soft

Re: [PR] [HUDI-5807] read partition values from file and create infra to support reading only a subset of columns [hudi]

2024-08-13 Thread via GitHub
hudi-bot commented on PR #11770: URL: https://github.com/apache/hudi/pull/11770#issuecomment-2287231776 ## CI report: * 1c9c75fe46a149abe2490da22599f8a946a8e610 UNKNOWN * a912f9a615e44fa205bf5cb20544b382d530bd84 UNKNOWN * a6c39c6554eb1296077ae73b53f12f27d3085c38 UNKNOWN *

Re: [PR] [HUDI-5807] read partition values from file and create infra to support reading only a subset of columns [hudi]

2024-08-13 Thread via GitHub
hudi-bot commented on PR #11770: URL: https://github.com/apache/hudi/pull/11770#issuecomment-2287212812 ## CI report: * 1c9c75fe46a149abe2490da22599f8a946a8e610 UNKNOWN * a912f9a615e44fa205bf5cb20544b382d530bd84 UNKNOWN * a6c39c6554eb1296077ae73b53f12f27d3085c38 UNKNOWN *

[I] [SUPPORT] Hudi table created with dataframe API becomes unwritable to INSERT queries due to config conflict [hudi]

2024-08-13 Thread via GitHub
CTTY opened a new issue, #11772: URL: https://github.com/apache/hudi/issues/11772 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? - Join the mailing list to engage in conversations and get faster support at dev-subscr...

Re: [PR] test: add integration test with minio [hudi-rs]

2024-08-13 Thread via GitHub
codecov[bot] commented on PR #112: URL: https://github.com/apache/hudi-rs/pull/112#issuecomment-2287193522 ## [Codecov](https://app.codecov.io/gh/apache/hudi-rs/pull/112?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term

Re: [PR] [HUDI-5807] read partition values from file and create infra to support reading only a subset of columns [hudi]

2024-08-13 Thread via GitHub
hudi-bot commented on PR #11770: URL: https://github.com/apache/hudi/pull/11770#issuecomment-2287189388 ## CI report: * 1c9c75fe46a149abe2490da22599f8a946a8e610 UNKNOWN * a912f9a615e44fa205bf5cb20544b382d530bd84 UNKNOWN * a6c39c6554eb1296077ae73b53f12f27d3085c38 UNKNOWN *

Re: [PR] [HUDI-5807] read partition values from file and create infra to support reading only a subset of columns [hudi]

2024-08-13 Thread via GitHub
hudi-bot commented on PR #11770: URL: https://github.com/apache/hudi/pull/11770#issuecomment-2287187583 ## CI report: * 1c9c75fe46a149abe2490da22599f8a946a8e610 UNKNOWN * a912f9a615e44fa205bf5cb20544b382d530bd84 UNKNOWN * a6c39c6554eb1296077ae73b53f12f27d3085c38 UNKNOWN *

Re: [PR] [HUDI-5807] read partition values from file and create infra to support reading only a subset of columns [hudi]

2024-08-13 Thread via GitHub
hudi-bot commented on PR #11770: URL: https://github.com/apache/hudi/pull/11770#issuecomment-2287148289 ## CI report: * 1c9c75fe46a149abe2490da22599f8a946a8e610 UNKNOWN * a912f9a615e44fa205bf5cb20544b382d530bd84 UNKNOWN * a6c39c6554eb1296077ae73b53f12f27d3085c38 UNKNOWN *

[PR] test: add integration test with minio [hudi-rs]

2024-08-13 Thread via GitHub
abyssnlp opened a new pull request, #112: URL: https://github.com/apache/hudi-rs/pull/112 ## Description Adds integration tests with s3-compatible MinIO. - `docker/docker-compose.yaml` spins up the minio container - `docker/copy_tables.sh` creates the bucket; unzips, copies and c

Re: [PR] [HUDI-5807] read partition values from file and create infra to support reading only a subset of columns [hudi]

2024-08-13 Thread via GitHub
hudi-bot commented on PR #11770: URL: https://github.com/apache/hudi/pull/11770#issuecomment-2287054359 ## CI report: * 1c9c75fe46a149abe2490da22599f8a946a8e610 UNKNOWN * a912f9a615e44fa205bf5cb20544b382d530bd84 UNKNOWN * a6c39c6554eb1296077ae73b53f12f27d3085c38 UNKNOWN *

Re: [PR] [HUDI-5807] read partition values from file and create infra to support reading only a subset of columns [hudi]

2024-08-13 Thread via GitHub
hudi-bot commented on PR #11770: URL: https://github.com/apache/hudi/pull/11770#issuecomment-2286965170 ## CI report: * 1c9c75fe46a149abe2490da22599f8a946a8e610 UNKNOWN * a912f9a615e44fa205bf5cb20544b382d530bd84 UNKNOWN * a6c39c6554eb1296077ae73b53f12f27d3085c38 UNKNOWN *

Re: [I] [SUPPORT] s3 list cost increases exponentially when using COW table [hudi]

2024-08-13 Thread via GitHub
ankit0811 commented on issue #11742: URL: https://github.com/apache/hudi/issues/11742#issuecomment-2286907919 Sure. Please find the schema (due to security reasons, I am not allowed to share the actual schema but this is the closest representation without giving much details) The wor

Re: [PR] [MINOR] Update Azure CI links in README [hudi]

2024-08-13 Thread via GitHub
yihua merged PR #11771: URL: https://github.com/apache/hudi/pull/11771

(hudi) branch master updated: [MINOR] Update Azure CI links in README (#11771)

2024-08-13 Thread yihua
yihua pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 9db0a60e677 [MINOR] Update Azure CI links in README

Re: [PR] [MINOR] Update Azure CI links in README [hudi]

2024-08-13 Thread via GitHub
hudi-bot commented on PR #11771: URL: https://github.com/apache/hudi/pull/11771#issuecomment-2286839754 ## CI report: * 262cae05ab5c1abe17676b57f16b8a2b18158a87 Azure: [PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=27)

Re: [PR] [HUDI-5807] read partition values from file and create infra to support reading only a subset of columns [hudi]

2024-08-13 Thread via GitHub
hudi-bot commented on PR #11770: URL: https://github.com/apache/hudi/pull/11770#issuecomment-2286837447 ## CI report: * 9afa794f3849bb09fe20accdf625a3974449716b Azure: [CANCELED](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=25)

Re: [PR] [MINOR] Update Azure CI links in README [hudi]

2024-08-13 Thread via GitHub
hudi-bot commented on PR #11771: URL: https://github.com/apache/hudi/pull/11771#issuecomment-2286837512 ## CI report: * 262cae05ab5c1abe17676b57f16b8a2b18158a87 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run th

[PR] [MINOR] Update Azure CI links in README [hudi]

2024-08-13 Thread via GitHub
yihua opened a new pull request, #11771: URL: https://github.com/apache/hudi/pull/11771 ### Change Logs As above, after we migrated the Azure CI to the new organization and project in Azure DevOps (`apachehudi/hudi-oss-ci`). ### Impact Reflects the correct links.

Re: [PR] [HUDI-5807] read partition values from file and create infra to support reading only a subset of columns [hudi]

2024-08-13 Thread via GitHub
hudi-bot commented on PR #11770: URL: https://github.com/apache/hudi/pull/11770#issuecomment-2286835381 ## CI report: * 9afa794f3849bb09fe20accdf625a3974449716b Azure: [CANCELED](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=25)

Re: [PR] [DNM][MINOR] Test Azure trigger again [hudi]

2024-08-13 Thread via GitHub
yihua closed pull request #11768: [DNM][MINOR] Test Azure trigger again URL: https://github.com/apache/hudi/pull/11768 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubs

Re: [PR] [HUDI-5807] read partition values from file and create infra to support reading only a subset of columns [hudi]

2024-08-13 Thread via GitHub
jonvex commented on code in PR #11770: URL: https://github.com/apache/hudi/pull/11770#discussion_r1715692436 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieHadoopFsRelationFactory.scala: ## @@ -75,6 +75,9 @@ abstract class HoodieBaseHadoopFsRelat

Re: [PR] [HUDI-5807] read partition values from file and create infra to support reading only a subset of columns [hudi]

2024-08-13 Thread via GitHub
hudi-bot commented on PR #11770: URL: https://github.com/apache/hudi/pull/11770#issuecomment-2286786258 ## CI report: * 9afa794f3849bb09fe20accdf625a3974449716b Azure: [CANCELED](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=25)

Re: [PR] [HUDI-5807] read partition values from file and create infra to support reading only a subset of columns [hudi]

2024-08-13 Thread via GitHub
hudi-bot commented on PR #11770: URL: https://github.com/apache/hudi/pull/11770#issuecomment-2286780587 ## CI report: * 9afa794f3849bb09fe20accdf625a3974449716b Azure: [CANCELED](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=25)

Re: [PR] [HUDI-5807] read partition values from file and create infra to support reading only a subset of columns [hudi]

2024-08-13 Thread via GitHub
hudi-bot commented on PR #11770: URL: https://github.com/apache/hudi/pull/11770#issuecomment-2286750725 ## CI report: * 9afa794f3849bb09fe20accdf625a3974449716b Azure: [CANCELED](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=25)

Re: [PR] [HUDI-5807] read partition values from file and create infra to support reading only a subset of columns [hudi]

2024-08-13 Thread via GitHub
hudi-bot commented on PR #11770: URL: https://github.com/apache/hudi/pull/11770#issuecomment-2286748514 ## CI report: * 9afa794f3849bb09fe20accdf625a3974449716b Azure: [PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=25) *

[jira] [Updated] (HUDI-5807) HoodieSparkParquetReader is not appending partition-path values

2024-08-13 Thread Jonathan Vexler (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Vexler updated HUDI-5807: -- Status: In Progress (was: Open) > HoodieSparkParquetReader is not appending partition-path valu

[jira] [Updated] (HUDI-5807) HoodieSparkParquetReader is not appending partition-path values

2024-08-13 Thread Jonathan Vexler (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Vexler updated HUDI-5807: -- Status: Patch Available (was: In Progress) > HoodieSparkParquetReader is not appending partitio

Re: [PR] [HUDI-5807] read partition values from file and create infra to support reading only a subset of columns [hudi]

2024-08-13 Thread via GitHub
jonvex commented on code in PR #11770: URL: https://github.com/apache/hudi/pull/11770#discussion_r1715655134 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieFileGroupReaderBasedParquetFileFormat.scala: ## @@ -251

Re: [PR] [HUDI-5807] read partition values from file and create infra to support reading only a subset of columns [hudi]

2024-08-13 Thread via GitHub
jonvex commented on code in PR #11770: URL: https://github.com/apache/hudi/pull/11770#discussion_r1715652270 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieFileGroupReaderBasedParquetFileFormat.scala: ## @@ -251

Re: [PR] [HUDI-5807] read partition values from file and create infra to support reading only a subset of columns [hudi]

2024-08-13 Thread via GitHub
jonvex commented on code in PR #11770: URL: https://github.com/apache/hudi/pull/11770#discussion_r1715651297 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieFileGroupReaderBasedParquetFileFormat.scala: ## @@ -127

Re: [PR] [HUDI-5807] read partition values from file and create infra to support reading only a subset of columns [hudi]

2024-08-13 Thread via GitHub
jonvex commented on code in PR #11770: URL: https://github.com/apache/hudi/pull/11770#discussion_r1715649002 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieFileGroupReaderBasedParquetFileFormat.scala: ## @@ -108

Re: [PR] [HUDI-5807] read partition values from file and create infra to support reading only a subset of columns [hudi]

2024-08-13 Thread via GitHub
hudi-bot commented on PR #11770: URL: https://github.com/apache/hudi/pull/11770#issuecomment-2286730198 ## CI report: * 9afa794f3849bb09fe20accdf625a3974449716b Azure: [PENDING](https://dev.azure.com/apachehudi/a1a51da7-8592-47d4-88dc-fd67bed336bb/_build/results?buildId=25)

Re: [PR] [HUDI-5807] read partition values from file and create infra to support reading only a subset of columns [hudi]

2024-08-13 Thread via GitHub
jonvex commented on code in PR #11770: URL: https://github.com/apache/hudi/pull/11770#discussion_r1715647039 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieHadoopFsRelationFactory.scala: ## @@ -168,9 +171,6 @@ abstract class HoodieBaseHadoopFsRel

[jira] [Updated] (HUDI-5807) HoodieSparkParquetReader is not appending partition-path values

2024-08-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5807: - Labels: hudi-1.0.0-beta2 pull-request-available (was: hudi-1.0.0-beta2) > HoodieSparkParquetReade
