[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7413: [HUDI-5321] Fix inconsistencies in arePartitionRecordsSorted and try to limit lots of small files during bulk insert

2022-12-20 Thread GitBox
alexeykudinkin commented on code in PR #7413: URL: https://github.com/apache/hudi/pull/7413#discussion_r1053766388 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/execution/bulkinsert/RDDCustomColumnsSortPartitioner.java: ## @@ -43,14 +43,14 @@ public RDDCust

[jira] [Updated] (HUDI-4586) Address S3 timeouts in Bloom Index with metadata table

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-4586: -- Sprint: 2022/08/08, 2022/08/22, 2022/09/05, 0.13.0 Final Sprint (was: 2022/08/08, 2022/08/22, 2

[jira] [Updated] (HUDI-3777) Optimize column stats storage

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-3777: -- Fix Version/s: (was: 0.13.0) > Optimize column stats storage > -

[jira] [Updated] (HUDI-3777) Optimize column stats storage

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-3777: -- Priority: Critical (was: Blocker) > Optimize column stats storage > ---

[jira] [Updated] (HUDI-4076) Optimize metadata payload conversion before writing to metadata table

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-4076: -- Priority: Major (was: Blocker) > Optimize metadata payload conversion before writing to metadat

[jira] [Updated] (HUDI-4033) Aggregated cols stats at partition level in col stats partition in MDT

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-4033: -- Fix Version/s: (was: 0.13.0) > Aggregated cols stats at partition level in col stats partiti

[jira] [Updated] (HUDI-4076) Optimize metadata payload conversion before writing to metadata table

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-4076: -- Fix Version/s: (was: 0.13.0) > Optimize metadata payload conversion before writing to metada

[jira] [Updated] (HUDI-3166) Implement new HoodieIndex based on metadata indices

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-3166: -- Priority: Critical (was: Blocker) > Implement new HoodieIndex based on metadata indices >

[jira] [Closed] (HUDI-4035) Improve point lookup in Metadata Table

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin closed HUDI-4035. - Resolution: Duplicate > Improve point lookup in Metadata Table > -

[jira] [Updated] (HUDI-3166) Implement new HoodieIndex based on metadata indices

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-3166: -- Fix Version/s: (was: 0.13.0) > Implement new HoodieIndex based on metadata indices > --

[jira] [Updated] (HUDI-5364) Make sure Hudi's Column Stats are wired into Spark's relation stats

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5364: -- Sprint: 0.13.0 Final Sprint > Make sure Hudi's Column Stats are wired into Spark's relation stat

[jira] [Updated] (HUDI-3794) Rebase HoodieBackedTableMetadata API to return HoodieData

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-3794: -- Fix Version/s: (was: 0.13.0) > Rebase HoodieBackedTableMetadata API to return HoodieData > -

[jira] [Updated] (HUDI-3794) Rebase HoodieBackedTableMetadata API to return HoodieData

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-3794: -- Priority: Critical (was: Blocker) > Rebase HoodieBackedTableMetadata API to return HoodieData >

[jira] [Updated] (HUDI-3715) Make sure Hoodie Max File Configs are respected for all Formats

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-3715: -- Fix Version/s: (was: 0.13.0) > Make sure Hoodie Max File Configs are respected for all Forma

[jira] [Updated] (HUDI-3715) Make sure Hoodie Max File Configs are respected for all Formats

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-3715: -- Priority: Critical (was: Blocker) > Make sure Hoodie Max File Configs are respected for all For

[GitHub] [hudi] hemanth-gowda-12 commented on pull request #7474: [HUDI-5246] Added validation for Partition Path to not begin with "/"

2022-12-20 Thread GitBox
hemanth-gowda-12 commented on PR #7474: URL: https://github.com/apache/hudi/pull/7474#issuecomment-1360359747 @nsivabalan thanks for looking, but looking at this again only from a Java client usage perspective, independent from Spark, Flink or Delta Streamer perspective, would the validati

[GitHub] [hudi] nsivabalan commented on pull request #7517: [HUDI-5420] Fix metadata table validator to exclude uncommitted log files due to retry

2022-12-20 Thread GitBox
nsivabalan commented on PR #7517: URL: https://github.com/apache/hudi/pull/7517#issuecomment-1360405299 Let's ignore MDT for now. I have some basic doubts about the inner workings of MOR tables. So, how are extraneous log files ignored while reading committed data from DT? i.e. let's s

[GitHub] [hudi] nsivabalan commented on pull request #7517: [HUDI-5420] Fix metadata table validator to exclude uncommitted log files due to retry

2022-12-20 Thread GitBox
nsivabalan commented on PR #7517: URL: https://github.com/apache/hudi/pull/7517#issuecomment-1360405548 Patch looks good from MDT validation standpoint. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [hudi] yihua commented on issue #7507: [SUPPORT] how to use flink offline with occ

2022-12-20 Thread GitBox
yihua commented on issue #7507: URL: https://github.com/apache/hudi/issues/7507#issuecomment-1360554125 @danny0405 @yuzhaojing could any of you help the user on the Flink related issue? -- This is an automated message from the Apache Git Service. To respond to the message, please log on t

[jira] [Assigned] (HUDI-5434) Fix archival in MDT to not rely on rollbacks/clean in DT

2022-12-20 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-5434: --- Assignee: Ethan Guo > Fix archival in MDT to not rely on rollbacks/clean in DT >

[jira] [Created] (HUDI-5441) different buckets for different partitions

2022-12-20 Thread loukey_j (Jira)
loukey_j created HUDI-5441: -- Summary: different buckets for different partitions Key: HUDI-5441 URL: https://issues.apache.org/jira/browse/HUDI-5441 Project: Apache Hudi Issue Type: Improvement

[GitHub] [hudi] leesf commented on a diff in pull request #4966: [HUDI-3572]support DAY_ROLLING strategy in ClusteringPlanPartitionFilterMode

2022-12-20 Thread GitBox
leesf commented on code in PR #4966: URL: https://github.com/apache/hudi/pull/4966#discussion_r1053917616 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/cluster/ClusteringPlanPartitionFilter.java: ## @@ -31,6 +34,11 @@ * NONE: skip filter * RE

[GitHub] [hudi] leesf commented on a diff in pull request #4966: [HUDI-3572]support DAY_ROLLING strategy in ClusteringPlanPartitionFilterMode

2022-12-20 Thread GitBox
leesf commented on code in PR #4966: URL: https://github.com/apache/hudi/pull/4966#discussion_r1053918382 ## hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/action/cluster/strategy/TestSparkClusteringPlanPartitionFilter.java: ## @@ -69,10 +70,10 @@ public void

[GitHub] [hudi] leesf commented on a diff in pull request #4966: [HUDI-3572]support DAY_ROLLING strategy in ClusteringPlanPartitionFilterMode

2022-12-20 Thread GitBox
leesf commented on code in PR #4966: URL: https://github.com/apache/hudi/pull/4966#discussion_r1053918304 ## hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/action/cluster/strategy/TestSparkClusteringPlanPartitionFilter.java: ## @@ -46,15 +47,15 @@ public class

[jira] [Updated] (HUDI-5364) Make sure Hudi's Column Stats are wired into Spark's relation stats

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5364: -- Priority: Critical (was: Blocker) > Make sure Hudi's Column Stats are wired into Spark's relati

[GitHub] [hudi] minihippo commented on a diff in pull request #5064: [HUDI-3654] Add new module `hudi-metaserver`

2022-12-20 Thread GitBox
minihippo commented on code in PR #5064: URL: https://github.com/apache/hudi/pull/5064#discussion_r1038000435 ## hudi-platform-service/hudi-metaserver/src/main/java/org/apache/hudi/metaserver/client/HoodieMetaserverClient.java: ## @@ -0,0 +1,54 @@ +/* + * Licensed to the Apache

[jira] [Updated] (HUDI-4688) Decouple lazy cleaning of failed writes from clean action in multi-writer

2022-12-20 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-4688: -- Story Points: 8 (was: 10) > Decouple lazy cleaning of failed writes from clean action in multi-writer >

[jira] [Updated] (HUDI-5364) Make sure Hudi's Column Stats are wired into Spark's relation stats

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5364: -- Sprint: (was: 0.13.0 Final Sprint) > Make sure Hudi's Column Stats are wired into Spark's rela

[jira] [Updated] (HUDI-5364) Make sure Hudi's Column Stats are wired into Spark's relation stats

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5364: -- Sprint: 2023-01-09 > Make sure Hudi's Column Stats are wired into Spark's relation stats > -

[GitHub] [hudi] zhangyue19921010 commented on pull request #7519: [HUDI-5422] Control KEPP_LATEST_VERSIONS clean replaced files immediately or delete after a while

2022-12-20 Thread GitBox
zhangyue19921010 commented on PR #7519: URL: https://github.com/apache/hudi/pull/7519#issuecomment-1360738251 @hudi-bot run azure -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comm

[GitHub] [hudi] leesf commented on a diff in pull request #4966: [HUDI-3572]support DAY_ROLLING strategy in ClusteringPlanPartitionFilterMode

2022-12-20 Thread GitBox
leesf commented on code in PR #4966: URL: https://github.com/apache/hudi/pull/4966#discussion_r1053921116 ## hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/action/cluster/strategy/TestSparkClusteringPlanPartitionFilter.java: ## @@ -104,4 +105,20 @@ public void

[GitHub] [hudi] hechao-ustc commented on a diff in pull request #7499: [HUDI-5413] Add record count payload to support pv/uv

2022-12-20 Thread GitBox
hechao-ustc commented on code in PR #7499: URL: https://github.com/apache/hudi/pull/7499#discussion_r1053921310 ## hudi-common/src/main/java/org/apache/hudi/common/model/RecordCountAvroPayload.java: ## @@ -0,0 +1,200 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

[GitHub] [hudi] leesf commented on a diff in pull request #4966: [HUDI-3572]support DAY_ROLLING strategy in ClusteringPlanPartitionFilterMode

2022-12-20 Thread GitBox
leesf commented on code in PR #4966: URL: https://github.com/apache/hudi/pull/4966#discussion_r1053921440 ## hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/action/cluster/strategy/TestSparkClusteringPlanPartitionFilter.java: ## @@ -104,4 +105,20 @@ public void

[GitHub] [hudi] hudi-bot commented on pull request #5064: [HUDI-3654] Add new module `hudi-metaserver`

2022-12-20 Thread GitBox
hudi-bot commented on PR #5064: URL: https://github.com/apache/hudi/pull/5064#issuecomment-1360756998 ## CI report: * 53aa21bf23d2f8b0404743e6d016cfb2fac444f7 UNKNOWN * 07a3ea3956e5ce02a33a55eae4a0339796275f9d UNKNOWN * 810af96ee856bd94cfc82b01b67765a735f29c44 UNKNOWN * 01

[GitHub] [hudi] hechao-ustc closed pull request #7499: [HUDI-5413] Add record count payload to support pv/uv

2022-12-20 Thread GitBox
hechao-ustc closed pull request #7499: [HUDI-5413] Add record count payload to support pv/uv URL: https://github.com/apache/hudi/pull/7499 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specifi

[GitHub] [hudi] XuQianJin-Stars commented on a diff in pull request #4966: [HUDI-3572]support DAY_ROLLING strategy in ClusteringPlanPartitionFilterMode

2022-12-20 Thread GitBox
XuQianJin-Stars commented on code in PR #4966: URL: https://github.com/apache/hudi/pull/4966#discussion_r1053927752 ## hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/action/cluster/strategy/TestSparkClusteringPlanPartitionFilter.java: ## @@ -104,4 +105,20 @@ p

[GitHub] [hudi] loukey-lj opened a new pull request, #7525: [HUDI-5441] different buckets for different partitions

2022-12-20 Thread GitBox
loukey-lj opened a new pull request, #7525: URL: https://github.com/apache/hudi/pull/7525 ### Change Logs In the current bucket mechanism, the number of buckets in each partition is the same. If the partition data is skewed, a larger bucket will be allocated to the smaller partition
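The idea stated in this pull request description (letting the bucket count vary per partition instead of being one table-wide number) can be illustrated with a minimal sketch. This is not the code from PR #7525, whose design is not visible in this listing; the names `DEFAULT_NUM_BUCKETS`, `PARTITION_BUCKETS`, `num_buckets_for` and `bucket_id`, the partition values, and the use of CRC32 as the hash are all assumptions made for illustration only.

```python
import zlib

# Hypothetical table-wide default, used when a partition has no override.
DEFAULT_NUM_BUCKETS = 4

# Hypothetical per-partition overrides: a large (skewed) partition gets more
# buckets, a small one gets fewer, so bucket sizes stay comparable.
PARTITION_BUCKETS = {
    "dt=2022-12-20": 16,
    "dt=2022-12-19": 2,
}


def num_buckets_for(partition: str) -> int:
    """Return the bucket count for a partition, falling back to the default."""
    return PARTITION_BUCKETS.get(partition, DEFAULT_NUM_BUCKETS)


def bucket_id(partition: str, record_key: str) -> int:
    """Map a record key to a bucket within its partition.

    CRC32 is used because it is stable across processes, unlike Python's
    built-in hash() for strings. It is an illustrative choice, not Hudi's.
    """
    return zlib.crc32(record_key.encode("utf-8")) % num_buckets_for(partition)
```

With a single table-wide count, every partition would use the same modulus; here the modulus depends on the partition, so a key in a small partition is spread over fewer buckets than a key in a large one.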

[jira] [Updated] (HUDI-5441) different buckets for different partitions

2022-12-20 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5441: - Labels: pull-request-available (was: ) > different buckets for different partitions > ---

[GitHub] [hudi] nfarah86 commented on a diff in pull request #7516: New blog 12 19

2022-12-20 Thread GitBox
nfarah86 commented on code in PR #7516: URL: https://github.com/apache/hudi/pull/7516#discussion_r1053930695 ## website/blog/2022-12-19-Build-Your-First-Hudi-Lakehouse-with-AWS-Glue-and-AWS-S3.md: ## @@ -0,0 +1,49 @@ +--- +title: "Build Your First Hudi Lakehouse with AWS S3 and

[GitHub] [hudi] voonhous commented on a diff in pull request #7480: [HUDI-5400] Fix read issues when Hudi-FULL schema evolution is not enabled

2022-12-20 Thread GitBox
voonhous commented on code in PR #7480: URL: https://github.com/apache/hudi/pull/7480#discussion_r1053930850 ## hudi-spark-datasource/hudi-spark3.2plus-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/Spark32PlusHoodieParquetFileFormat.scala: ## @@ -228,7

[GitHub] [hudi] voonhous commented on pull request #7480: [HUDI-5400] Fix read issues when Hudi-FULL schema evolution is not enabled

2022-12-20 Thread GitBox
voonhous commented on PR #7480: URL: https://github.com/apache/hudi/pull/7480#issuecomment-1360786269 > > @voonhous Maybe we need a parameter to control this feature, not all tables need to follow this logic > > Hmmm, CMIIW, Hudi has been relying on ASR for schema resolution since `h

[GitHub] [hudi] hudi-bot commented on pull request #7525: [HUDI-5441] different buckets for different partitions

2022-12-20 Thread GitBox
hudi-bot commented on PR #7525: URL: https://github.com/apache/hudi/pull/7525#issuecomment-1360786595 ## CI report: * e3afe78a10ac81fc72c7ff8bc04507dbf516b324 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] leesf commented on a diff in pull request #7499: [HUDI-5413] Add record count payload to support pv/uv

2022-12-20 Thread GitBox
leesf commented on code in PR #7499: URL: https://github.com/apache/hudi/pull/7499#discussion_r1053932648 ## hudi-common/src/main/java/org/apache/hudi/common/model/RecordCountAvroPayload.java: ## @@ -0,0 +1,200 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under on

[GitHub] [hudi] nfarah86 commented on pull request #7516: New blog 12 19

2022-12-20 Thread GitBox
nfarah86 commented on PR #7516: URL: https://github.com/apache/hudi/pull/7516#issuecomment-1360788460 cc @bhasudha updated author. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific co

[GitHub] [hudi] hechao-ustc commented on a diff in pull request #7499: [HUDI-5413] Add record count payload to support pv/uv

2022-12-20 Thread GitBox
hechao-ustc commented on code in PR #7499: URL: https://github.com/apache/hudi/pull/7499#discussion_r1053933284 ## hudi-common/src/main/java/org/apache/hudi/common/model/RecordCountAvroPayload.java: ## @@ -0,0 +1,200 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

[GitHub] [hudi] voonhous commented on a diff in pull request #7480: [HUDI-5400] Fix read issues when Hudi-FULL schema evolution is not enabled

2022-12-20 Thread GitBox
voonhous commented on code in PR #7480: URL: https://github.com/apache/hudi/pull/7480#discussion_r1053934368 ## hudi-spark-datasource/hudi-spark3.2plus-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/Spark32PlusHoodieParquetFileFormat.scala: ## @@ -228,7

[jira] [Updated] (HUDI-5420) Fix metadata table validator to exclude uncommitted log files in successful deltacommits

2022-12-20 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5420: Status: Patch Available (was: In Progress) > Fix metadata table validator to exclude uncommitted log files

[jira] [Updated] (HUDI-5434) Fix archival in MDT to not rely on rollbacks/clean in DT

2022-12-20 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5434: Status: In Progress (was: Open) > Fix archival in MDT to not rely on rollbacks/clean in DT > --

[jira] [Updated] (HUDI-5408) Partially failed commits in MDT is not rolledback in all cases

2022-12-20 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5408: -- Epic Link: HUDI-1292 Story Points: 2 > Partially failed commits in MDT is not rol

[jira] [Updated] (HUDI-4586) Address S3 timeouts in Bloom Index with metadata table

2022-12-20 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-4586: Story Points: 6 (was: 3) > Address S3 timeouts in Bloom Index with metadata table > ---

[jira] [Updated] (HUDI-5323) Decouple virtual key with writing bloom filters to parquet files

2022-12-20 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5323: Story Points: 4 (was: 3) > Decouple virtual key with writing bloom filters to parquet files > -

[jira] [Updated] (HUDI-2608) Support JSON schema in schema registry provider

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-2608: -- Status: In Progress (was: Open) > Support JSON schema in schema registry provider > ---

[GitHub] [hudi] Zouxxyy commented on pull request #7481: [DOCS] Fix dataframe write option in schema_evolution docs

2022-12-20 Thread GitBox
Zouxxyy commented on PR #7481: URL: https://github.com/apache/hudi/pull/7481#issuecomment-1360800718 @nsivabalan Can you help with a review? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

[GitHub] [hudi] voonhous commented on pull request #5830: [HUDI-3981] Flink engine support for comprehensive schema evolution

2022-12-20 Thread GitBox
voonhous commented on PR #5830: URL: https://github.com/apache/hudi/pull/5830#issuecomment-1360801005 @trushev Yes, this is what I intend to work on. What you described is operations made entirely on FlinkSQL. I was thinking of cross-engine operations. i.e. Tables tha

[jira] [Updated] (HUDI-4625) Clean up KafkaOffsetGen

2022-12-20 Thread satish (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4625: - Fix Version/s: (was: 0.12.2) > Clean up KafkaOffsetGen > --- > > Key: HUDI

[jira] [Updated] (HUDI-4613) Avoid the use of regex expressions when call hoodieFileGroup#addLogFile function

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-4613: -- Priority: Critical (was: Blocker) > Avoid the use of regex expressions when call hoodieFileGrou

[jira] [Updated] (HUDI-4852) Incremental sync not updating pending file groups under clustering

2022-12-20 Thread satish (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4852: - Fix Version/s: (was: 0.12.2) > Incremental sync not updating pending file groups under clustering > --

[jira] [Updated] (HUDI-4629) Create hive table from existing hoodie Table failed when the table schema is not defined

2022-12-20 Thread satish (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4629: - Fix Version/s: (was: 0.12.2) > Create hive table from existing hoodie Table failed when the table schema is >

[jira] [Updated] (HUDI-5438) Benchmark calls w/ metadata enabled and ensure no calls to direct FS

2022-12-20 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5438: -- Fix Version/s: 0.13.0 > Benchmark calls w/ metadata enabled and ensure no calls to direc

[jira] [Assigned] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing

2022-12-20 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-5442: --- Assignee: Ethan Guo > Fix HiveHoodieTableFileIndex to use lazy listing >

[jira] [Updated] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing

2022-12-20 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5442: Fix Version/s: 0.13.0 > Fix HiveHoodieTableFileIndex to use lazy listing > -

[jira] [Created] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing

2022-12-20 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-5442: --- Summary: Fix HiveHoodieTableFileIndex to use lazy listing Key: HUDI-5442 URL: https://issues.apache.org/jira/browse/HUDI-5442 Project: Apache Hudi Issue Type: Bug

[jira] [Updated] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing

2022-12-20 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5442: Component/s: reader-core trino-presto > Fix HiveHoodieTableFileIndex to use lazy listing >

[jira] [Updated] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing

2022-12-20 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5442: Story Points: 5 > Fix HiveHoodieTableFileIndex to use lazy listing > ---

[jira] [Updated] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing

2022-12-20 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5442: Sprint: 0.13.0 Final Sprint > Fix HiveHoodieTableFileIndex to use lazy listing > ---

[jira] [Updated] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing

2022-12-20 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5442: Priority: Blocker (was: Critical) > Fix HiveHoodieTableFileIndex to use lazy listing >

[jira] [Updated] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing

2022-12-20 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-5442: -- Priority: Critical (was: Major) > Fix HiveHoodieTableFileIndex to use lazy listing > --

[jira] [Assigned] (HUDI-3517) Unicode in partition path causes it to be resolved wrongly

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin reassigned HUDI-3517: - Assignee: Sagar Sumit (was: sivabalan narayanan) > Unicode in partition path causes it t

[jira] [Updated] (HUDI-4876) DT archival is blocked by MDT compaction

2022-12-20 Thread satish (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-4876: - Fix Version/s: (was: 0.12.2) > DT archival is blocked by MDT compaction >

[jira] [Updated] (HUDI-5078) When applying changes to MDT, any replace commit is considered a table service

2022-12-20 Thread satish (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5078: - Fix Version/s: (was: 0.12.2) > When applying changes to MDT, any replace commit is considered a table service

[jira] [Assigned] (HUDI-4991) Make sure DeltaStreamer passes SSL key/truststore configs connecting to Schema Registry

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin reassigned HUDI-4991: - Assignee: Jonathan Vexler (was: sivabalan narayanan) > Make sure DeltaStreamer passes SS

[jira] [Updated] (HUDI-5172) Handle empty or corrupted timeline files in data table while constructing valid instants for metadata table read

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5172: -- Priority: Critical (was: Blocker) > Handle empty or corrupted timeline files in data table whil

[jira] [Assigned] (HUDI-5172) Handle empty or corrupted timeline files in data table while constructing valid instants for metadata table read

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin reassigned HUDI-5172: - Assignee: sivabalan narayanan > Handle empty or corrupted timeline files in data table wh

[jira] [Updated] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing

2022-12-20 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5442: Description: Currently, HiveHoodieTableFileIndex hard-codes the shouldListLazily to false, using eager list
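The difference between eager and lazy listing that this description refers to can be sketched generically. The class below is not `HiveHoodieTableFileIndex` and does not reflect Hudi's actual API; `FileIndexSketch`, `list_files`, and `files_in` are hypothetical names, and only the flag name `should_list_lazily` echoes the `shouldListLazily` setting mentioned in the issue.

```python
from typing import Callable, Dict, Iterable, List


class FileIndexSketch:
    """Toy file index: eager mode lists every partition at construction,
    lazy mode lists a partition only the first time it is requested."""

    def __init__(
        self,
        list_files: Callable[[str], List[str]],  # hypothetical listing callback
        partitions: Iterable[str],
        should_list_lazily: bool,
    ) -> None:
        self._list_files = list_files
        self._partitions = list(partitions)
        self._cache: Dict[str, List[str]] = {}
        if not should_list_lazily:
            # Eager: pay the full listing cost up front, even for
            # partitions that a query may never touch.
            for partition in self._partitions:
                self._cache[partition] = self._list_files(partition)

    def files_in(self, partition: str) -> List[str]:
        # Lazy path: list on first access, then reuse the cached result.
        if partition not in self._cache:
            self._cache[partition] = self._list_files(partition)
        return self._cache[partition]
```

In this sketch, a query that touches one partition out of many triggers one listing call in lazy mode, while eager mode has already listed all of them before the query runs.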

[jira] [Updated] (HUDI-3407) Make sure Restore operation is Not Concurrent w/ Writes in Multi-Writer scenario

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-3407: -- Priority: Major (was: Blocker) > Make sure Restore operation is Not Concurrent w/ Writes in Mul

[jira] [Updated] (HUDI-3407) Make sure Restore operation is Not Concurrent w/ Writes in Multi-Writer scenario

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-3407: -- Sprint: (was: 0.13.0 Final Sprint) > Make sure Restore operation is Not Concurrent w/ Writes i

[jira] [Assigned] (HUDI-5423) Flaky test: ColumnStatsTestCase(MERGE_ON_READ,true,true)

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin reassigned HUDI-5423: - Assignee: Alexey Kudinkin > Flaky test: ColumnStatsTestCase(MERGE_ON_READ,true,true) > --

[jira] [Updated] (HUDI-3407) Make sure Restore operation is Not Concurrent w/ Writes in Multi-Writer scenario

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-3407: -- Sprint: 2023-01-09 > Make sure Restore operation is Not Concurrent w/ Writes in Multi-Writer >

[jira] [Assigned] (HUDI-5429) Investigate lot of head requests in MDT

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin reassigned HUDI-5429: - Assignee: Sagar Sumit > Investigate lot of head requests in MDT > ---

[jira] [Updated] (HUDI-5428) Investigate S3 connection leaks w/ MDT

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5428: -- Story Points: 4 (was: 3) > Investigate S3 connection leaks w/ MDT > --

[jira] [Assigned] (HUDI-5428) Investigate S3 connection leaks w/ MDT

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin reassigned HUDI-5428: - Assignee: Sagar Sumit > Investigate S3 connection leaks w/ MDT > ---

[GitHub] [hudi] xiarixiaoyao commented on pull request #7480: [HUDI-5400] Fix read issues when Hudi-FULL schema evolution is not enabled

2022-12-20 Thread GitBox
xiarixiaoyao commented on PR #7480: URL: https://github.com/apache/hudi/pull/7480#issuecomment-1360811258 > > > @voonhous Maybe we need a parameter to control this feature, not all tables need to follow this logic > > > > > > Hmmm, CMIIW, Hudi has been relying on ASR for schema re

[jira] [Assigned] (HUDI-5430) Fix multi-writer handling w/ rollback blocks in MOR table (log record reader)

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin reassigned HUDI-5430: - Assignee: sivabalan narayanan > Fix multi-writer handling w/ rollback blocks in MOR table

[jira] [Updated] (HUDI-5017) Modify the logic of defaultMode in BootstrapRegexModeSelector

2022-12-20 Thread satish (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5017: - Fix Version/s: (was: 0.12.2) > Modify the logic of defaultMode in BootstrapRegexModeSelector > ---

[jira] [Assigned] (HUDI-5432) Fix adding back a log block w/ same commit time as previously rolled back one

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin reassigned HUDI-5432: - Assignee: sivabalan narayanan > Fix adding back a log block w/ same commit time as previo

[GitHub] [hudi] xiarixiaoyao commented on pull request #7480: [HUDI-5400] Fix read issues when Hudi-FULL schema evolution is not enabled

2022-12-20 Thread GitBox
xiarixiaoyao commented on PR #7480: URL: https://github.com/apache/hudi/pull/7480#issuecomment-1360813325 @voonhous pls rebase code, once ci pass,we can merge it . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[jira] [Assigned] (HUDI-5433) Fix the way we deduce the pending instants for MDT writes

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin reassigned HUDI-5433: - Assignee: sivabalan narayanan > Fix the way we deduce the pending instants for MDT writes

[jira] [Updated] (HUDI-5069) TestInlineCompaction.testSuccessfulCompactionBasedOnNumAndTime is flaky

2022-12-20 Thread satish (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5069: - Fix Version/s: (was: 0.12.2) > TestInlineCompaction.testSuccessfulCompactionBasedOnNumAndTime is flaky > -

[jira] [Updated] (HUDI-5107) Fix hadoop config in DirectWriteMarkers, HoodieFlinkEngineContext and StreamerUtil are not consistent issue

2022-12-20 Thread satish (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-5107: - Fix Version/s: (was: 0.12.2) > Fix hadoop config in DirectWriteMarkers, HoodieFlinkEngineContext and > Stream

[GitHub] [hudi] voonhous commented on pull request #7480: [HUDI-5400] Fix read issues when Hudi-FULL schema evolution is not enabled

2022-12-20 Thread GitBox
voonhous commented on PR #7480: URL: https://github.com/apache/hudi/pull/7480#issuecomment-1360814665 > @voonhous pls rebase code, once ci pass,we can merge it . Done! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[jira] [Updated] (HUDI-5238) Hudi throwing "PipeBroken" exception during Merging on GCS

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5238: -- Story Points: 1 (was: 4) > Hudi throwing "PipeBroken" exception during Merging on GCS > ---

[jira] [Updated] (HUDI-3204) Allow original partition column value to be retrieved when using TimestampBasedKeyGen

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-3204: -- Sprint: (was: 0.13.0 Final Sprint) > Allow original partition column value to be retrieved whe

[jira] [Updated] (HUDI-5423) Flaky test: ColumnStatsTestCase(MERGE_ON_READ,true,true)

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5423: -- Story Points: 2 (was: 3) > Flaky test: ColumnStatsTestCase(MERGE_ON_READ,true,true) > -

[jira] [Updated] (HUDI-5392) Fix Bootstrap files reader to configure arrays to be read in the new format

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5392: -- Story Points: 4 (was: 8) > Fix Bootstrap files reader to configure arrays to be read in the new

[jira] [Updated] (HUDI-3204) Allow original partition column value to be retrieved when using TimestampBasedKeyGen

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-3204: -- Sprint: 2023-01-09 > Allow original partition column value to be retrieved when using > Timesta

[GitHub] [hudi] hudi-bot commented on pull request #4966: [HUDI-3572]support DAY_ROLLING strategy in ClusteringPlanPartitionFilterMode

2022-12-20 Thread GitBox
hudi-bot commented on PR #4966: URL: https://github.com/apache/hudi/pull/4966#issuecomment-1360830585 ## CI report: * 0883581b2202f3a0389749d09dbfa0a79f158da8 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6786

[jira] [Updated] (HUDI-4503) Support table identifier with explicit catalog

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-4503: -- Story Points: 1 (was: 2) > Support table identifier with explicit catalog > ---

[jira] [Updated] (HUDI-4690) Remove code duplicated over from Spark

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-4690: -- Story Points: 4 (was: 12) > Remove code duplicated over from Spark > --

[jira] [Updated] (HUDI-4489) Break down HoodieAnalysis rules into Spark-specific components

2022-12-20 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-4489: -- Story Points: 1 (was: 4) > Break down HoodieAnalysis rules into Spark-specific components > ---
