[jira] [Updated] (HUDI-5498) Update docs for reading Hudi tables on Databricks runtime

2023-01-17 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5498: Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2,

[jira] [Updated] (HUDI-4991) Make sure DeltaStreamer passes SSL key/truststore configs connecting to Schema Registry

2023-01-17 Thread Jonathan Vexler (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Vexler updated HUDI-4991: -- Story Points: 1 (was: 2) > Make sure DeltaStreamer passes SSL key/truststore configs

[jira] [Updated] (HUDI-3673) Add a common hudi-hbase-shaded for shaded hbase dependencies

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3673: - Story Points: 0 (was: 1) > Add a common hudi-hbase-shaded for shaded hbase dependencies >

[jira] [Updated] (HUDI-4586) Address S3 timeouts in Bloom Index with metadata table

2023-01-17 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-4586: -- Story Points: 3 (was: 6) > Address S3 timeouts in Bloom Index with metadata table >

[jira] [Updated] (HUDI-5238) Hudi throwing "PipeBroken" exception during Merging on GCS

2023-01-17 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5238: -- Priority: Major (was: Blocker) > Hudi throwing "PipeBroken" exception during Merging on GCS >

[jira] [Updated] (HUDI-5323) Decouple virtual key with writing bloom filters to parquet files

2023-01-17 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5323: Story Points: 0.5 (was: 2) > Decouple virtual key with writing bloom filters to parquet files >

[jira] [Updated] (HUDI-5238) Hudi throwing "PipeBroken" exception during Merging on GCS

2023-01-17 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5238: -- Priority: Critical (was: Major) > Hudi throwing "PipeBroken" exception during Merging on GCS >

[jira] [Updated] (HUDI-5276) Hudi getAllQueryPartitionPaths use regular match caused Invalid input path add

2023-01-17 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5276: Story Points: 1 (was: 3) > Hudi getAllQueryPartitionPaths use regular match caused Invalid input path >

[jira] [Updated] (HUDI-5319) NPE in Bloom Filter Index

2023-01-17 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5319: Story Points: 0 (was: 1) > NPE in Bloom Filter Index > - > > Key:

[jira] [Updated] (HUDI-5520) Fail MDT when list of log files grows unboundedly

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5520: -- Story Points: 1 (was: 3) > Fail MDT when list of log files grows unboundedly >

[jira] [Updated] (HUDI-5485) Improve performance of savepoint with MDT

2023-01-17 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5485: Story Points: 0.5 (was: 2) > Improve performance of savepoint with MDT >

[jira] [Updated] (HUDI-3636) Clustering fails due to marker creation failure

2023-01-17 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-3636: Story Points: 0 (was: 1) > Clustering fails due to marker creation failure >

[jira] [Assigned] (HUDI-5464) Fix instantiation of a new partition in MDT re-using the same instant time as a regular commit

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-5464: - Assignee: Raymond Xu (was: Alexey Kudinkin) > Fix instantiation of a new

[jira] [Updated] (HUDI-4937) Fix HoodieTable injecting HoodieBackedTableMetadata not reusing underlying MT readers

2023-01-17 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-4937: -- Story Points: 2 (was: 4) > Fix HoodieTable injecting HoodieBackedTableMetadata not reusing

[jira] [Updated] (HUDI-5464) Fix instantiation of a new partition in MDT re-using the same instant time as a regular commit

2023-01-17 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5464: -- Reporter: sivabalan narayanan (was: Alexey Kudinkin) > Fix instantiation of a new partition in

[GitHub] [hudi] LinMingQiang opened a new issue, #7691: [SUPPORT] Flink's schema conflicts with spark's schema.

2023-01-17 Thread GitBox
LinMingQiang opened a new issue, #7691: URL: https://github.com/apache/hudi/issues/7691 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? - Join the mailing list to engage in conversations and get faster support at

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #6782: [HUDI-4911][HUDI-3301] Fixing `HoodieMetadataLogRecordReader` to avoid flushing cache for every lookup

2023-01-17 Thread GitBox
alexeykudinkin commented on code in PR #6782: URL: https://github.com/apache/hudi/pull/6782#discussion_r1073032227 ## hudi-common/src/test/java/org/apache/hudi/common/functional/TestHoodieLogFormat.java: ## @@ -671,11 +658,188 @@ public void

[GitHub] [hudi] hudi-bot commented on pull request #7612: [HUDI-5336] Fixing log file pattern match to ignore extraneous files

2023-01-17 Thread GitBox
hudi-bot commented on PR #7612: URL: https://github.com/apache/hudi/pull/7612#issuecomment-1386378924 ## CI report: * 66370e1d4085619050625bf32e08dc9c8cef8f76 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #6815: [HUDI-4937] Fix `HoodieTable` injecting non-reusable `HoodieBackedTableMetadata` aggressively flushing MT readers

2023-01-17 Thread GitBox
hudi-bot commented on PR #6815: URL: https://github.com/apache/hudi/pull/6815#issuecomment-1386377763 ## CI report: * 0025243644c03672360497938474031048a254cf Azure:

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #6782: [HUDI-4911][HUDI-3301] Fixing `HoodieMetadataLogRecordReader` to avoid flushing cache for every lookup

2023-01-17 Thread GitBox
alexeykudinkin commented on code in PR #6782: URL: https://github.com/apache/hudi/pull/6782#discussion_r1073025366 ## hudi-common/src/test/java/org/apache/hudi/common/functional/TestHoodieLogFormat.java: ## @@ -671,11 +658,188 @@ public void

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #6782: [HUDI-4911][HUDI-3301] Fixing `HoodieMetadataLogRecordReader` to avoid flushing cache for every lookup

2023-01-17 Thread GitBox
alexeykudinkin commented on code in PR #6782: URL: https://github.com/apache/hudi/pull/6782#discussion_r1073025051 ## hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataLogRecordReader.java: ## @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [hudi] hudi-bot commented on pull request #7690: [HUDI-5485] Add File System View API for batch listing and improve savepoint performance with metadata table

2023-01-17 Thread GitBox
hudi-bot commented on PR #7690: URL: https://github.com/apache/hudi/pull/7690#issuecomment-1386373072 ## CI report: * ca9fb1e21c08ac0eb7dc6305934f1c58803070e3 Azure:

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #6782: [HUDI-4911][HUDI-3301] Fixing `HoodieMetadataLogRecordReader` to avoid flushing cache for every lookup

2023-01-17 Thread GitBox
alexeykudinkin commented on code in PR #6782: URL: https://github.com/apache/hudi/pull/6782#discussion_r1073024689 ## hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieMergedLogRecordScanner.java: ## @@ -108,30 +116,94 @@ protected

[GitHub] [hudi] hudi-bot commented on pull request #7612: [HUDI-5336] Fixing log file pattern match to ignore extraneous files

2023-01-17 Thread GitBox
hudi-bot commented on PR #7612: URL: https://github.com/apache/hudi/pull/7612#issuecomment-1386372802 ## CI report: * 66370e1d4085619050625bf32e08dc9c8cef8f76 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #6815: [HUDI-4937] Fix `HoodieTable` injecting non-reusable `HoodieBackedTableMetadata` aggressively flushing MT readers

2023-01-17 Thread GitBox
hudi-bot commented on PR #6815: URL: https://github.com/apache/hudi/pull/6815#issuecomment-1386371867 ## CI report: * 0025243644c03672360497938474031048a254cf Azure:

[jira] [Updated] (HUDI-5570) Write tests for failed compaction retried w/ MDT able to serve just the required data

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5570: -- Sprint: 0.13.0 Final Sprint 3 > Write tests for failed compaction retried w/ MDT able

[jira] [Updated] (HUDI-5570) Write tests for failed compaction retried w/ MDT able to serve just the required data

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5570: -- Epic Link: HUDI-1292 > Write tests for failed compaction retried w/ MDT able to serve

[jira] [Updated] (HUDI-5570) Write tests for failed compaction retried w/ MDT able to serve just the required data

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5570: -- Story Points: 2 > Write tests for failed compaction retried w/ MDT able to serve just

[jira] [Updated] (HUDI-3775) Allow for offline compaction of MOR tables via spark streaming

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3775: -- Story Points: 0 (was: 1) > Allow for offline compaction of MOR tables via spark

[jira] [Updated] (HUDI-5570) Write tests for failed compaction retried w/ MDT able to serve just the required data

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5570: -- Fix Version/s: 0.13.0 > Write tests for failed compaction retried w/ MDT able to serve

[jira] [Updated] (HUDI-5570) Write tests for failed compaction retried w/ MDT able to serve just the required data

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5570: -- Priority: Blocker (was: Major) > Write tests for failed compaction retried w/ MDT able

[jira] [Created] (HUDI-5570) Write tests for failed compaction retried w/ MDT able to serve just the required data

2023-01-17 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-5570: - Summary: Write tests for failed compaction retried w/ MDT able to serve just the required data Key: HUDI-5570 URL: https://issues.apache.org/jira/browse/HUDI-5570

[jira] [Assigned] (HUDI-5570) Write tests for failed compaction retried w/ MDT able to serve just the required data

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-5570: - Assignee: sivabalan narayanan > Write tests for failed compaction retried w/ MDT

[jira] [Updated] (HUDI-4911) Make sure LogRecordReader doesn't flush the cache before each lookup

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-4911: -- Story Points: 1 (was: 4) > Make sure LogRecordReader doesn't flush the cache before

[jira] [Updated] (HUDI-5408) Partially failed commits in MDT have to be rolled back in all cases

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5408: -- Story Points: 0 (was: 1) > Partially failed commits in MDT have to be rolled back in

[jira] [Updated] (HUDI-5407) Rollbacks in MDT is not effective

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5407: -- Story Points: 0 (was: 1) > Rollbacks in MDT is not effective >

[jira] [Updated] (HUDI-5433) Fix the way we deduce the pending instants for MDT writes

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5433: -- Story Points: 0 (was: 1) > Fix the way we deduce the pending instants for MDT writes >

[jira] [Updated] (HUDI-5463) Apply rollback commits from data table as rollbacks in MDT instead of Delta commit

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5463: -- Sprint: 0.13.0 Final Sprint (was: 0.13.0 Final Sprint, 0.13.0 Final Sprint 3) > Apply

[GitHub] [hudi] hudi-bot commented on pull request #7690: [HUDI-5485] Add File System View API for batch listing and improve savepoint performance with metadata table

2023-01-17 Thread GitBox
hudi-bot commented on PR #7690: URL: https://github.com/apache/hudi/pull/7690#issuecomment-1386365026 ## CI report: * ca9fb1e21c08ac0eb7dc6305934f1c58803070e3 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[jira] [Updated] (HUDI-5536) Support writing to hudi w/o any options

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5536: - Sprint: 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 0.13.0 Final Sprint 2) > Support writing to

[jira] [Updated] (HUDI-5537) Support partitionBy with dataframe apis

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5537: - Sprint: 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 0.13.0 Final Sprint 2) > Support partitionBy

[jira] [Updated] (HUDI-5475) not able to generate utilities-slim bundle dependency tree

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5475: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 0.13.0 Final Sprint,

[jira] [Updated] (HUDI-5238) Hudi throwing "PipeBroken" exception during Merging on GCS

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5238: - Sprint: 2022/11/15, 2022/11/29, 2022/12/12, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final

[jira] [Updated] (HUDI-5516) Reduce memory footprint on workload with thousand active partitions

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5516: - Sprint: 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 0.13.0 Final Sprint 2) > Reduce memory

[jira] [Updated] (HUDI-5534) Optimize Bloom Index lookup DAG

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5534: - Sprint: 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 0.13.0 Final Sprint 2) > Optimize Bloom

[jira] [Updated] (HUDI-5401) Hivemetastore URI set in hudi conf not respected.

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5401: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 0.13.0 Final Sprint,

[jira] [Updated] (HUDI-5384) Make sure predicates are appropriately pushed down to HoodieFileIndex when lazy listing

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5384: - Sprint: 2022/12/12, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 2022/12/12,

[jira] [Updated] (HUDI-5535) Add support for keyless for all keygens(non partitioned, timestamp based key gen)

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5535: - Sprint: 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 0.13.0 Final Sprint 2) > Add support for

[jira] [Updated] (HUDI-3636) Clustering fails due to marker creation failure

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3636: - Sprint: 2022/08/22, 2022/09/05, 2022/09/19, 2022/10/04, 2022/10/18, 2022/11/01, 2022/11/29, 2022/12/12,

[jira] [Updated] (HUDI-5569) Files written by first commit/delta commit if it failed is detected as valid data files

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5569: - Sprint: 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 0.13.0 Final Sprint 2) > Files written by

[jira] [Updated] (HUDI-2608) Support JSON schema in schema registry provider

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2608: - Sprint: 2022/11/29, 2022/12/12, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was:

[jira] [Updated] (HUDI-3775) Allow for offline compaction of MOR tables via spark streaming

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3775: - Sprint: 2022/09/05, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 2022/09/05,

[jira] [Updated] (HUDI-5559) Support CDC for flink bounded source

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5559: - Sprint: 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 0.13.0 Final Sprint 2) > Support CDC for

[jira] [Updated] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5442: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 0.13.0 Final Sprint,

[jira] [Updated] (HUDI-5552) Too slow while using trino-hudi connector while querying partitioned tables.

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5552: - Sprint: 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 0.13.0 Final Sprint 2) > Too slow while

[jira] [Updated] (HUDI-5276) Hudi getAllQueryPartitionPaths use regular match caused Invalid input path add

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5276: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 0.13.0 Final Sprint,

[jira] [Updated] (HUDI-5555) Set class loader for parquet data block

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5555: - Sprint: 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 0.13.0 Final Sprint 2) > Set class loader

[jira] [Updated] (HUDI-5498) Update docs for reading Hudi tables on Databricks runtime

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5498: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 0.13.0 Final Sprint,

[jira] [Updated] (HUDI-5443) Fix exception when querying MOR table after applying NestedSchemaPruning optimization

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5443: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 0.13.0 Final Sprint,

[jira] [Updated] (HUDI-5499) Make sure CTAS always uses Bulk Insert

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5499: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 0.13.0 Final Sprint,

[jira] [Updated] (HUDI-3249) Performance Improvements

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3249: - Sprint: 2022/08/22, 2022/09/05, 2022/09/19, 2022/10/04, 2022/10/18, 2022/11/01, 2022/11/15, 2022/11/29,

[jira] [Updated] (HUDI-4700) RFC for primary key-less data model

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-4700: - Sprint: 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 0.13.0 Final Sprint 2) > RFC for primary

[jira] [Updated] (HUDI-2681) Make hoodie record_key and preCombine_key optional

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2681: - Sprint: 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 0.13.0 Final Sprint 2) > Make hoodie

[jira] [Updated] (HUDI-3517) Unicode in partition path causes it to be resolved wrongly

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3517: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 0.13.0 Final Sprint,

[jira] [Updated] (HUDI-4991) Make sure DeltaStreamer passes SSL key/truststore configs connecting to Schema Registry

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-4991: - Sprint: 2022/10/04, 2022/10/18, 2022/11/01, 2022/12/12, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2,

[jira] [Updated] (HUDI-5321) Fix Bulk Insert ColumnSortPartitioners

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5321: - Sprint: 2022/12/12, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 2022/12/12,

[jira] [Updated] (HUDI-83) Map Timestamp type in spark to corresponding Timestamp type in Hive during Hive sync

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-83?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-83: --- Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 0.13.0 Final Sprint, 0.13.0

[jira] [Updated] (HUDI-4701) Support bulk insert without primary key and precombine field

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-4701: - Sprint: 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 0.13.0 Final Sprint 2) > Support bulk insert

[jira] [Updated] (HUDI-5503) Optimize flink table factory option check

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5503: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 0.13.0 Final Sprint,

[jira] [Updated] (HUDI-4911) Make sure LogRecordReader doesn't flush the cache before each lookup

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-4911: - Sprint: 2022/11/15, 2022/11/29, 2022/12/12, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final

[jira] [Updated] (HUDI-5407) Rollbacks in MDT is not effective

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5407: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 0.13.0 Final Sprint,

[jira] [Updated] (HUDI-5485) Improve performance of savepoint with MDT

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5485: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 0.13.0 Final Sprint,

[jira] [Updated] (HUDI-1574) Trim existing unit tests to finish in much shorter amount of time

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-1574: - Sprint: 2022/08/22, 2022/09/05, 2022/09/19, 2022/10/04, 2022/10/18, 2022/11/01, 2022/11/15, 2022/11/29,

[jira] [Updated] (HUDI-5323) Decouple virtual key with writing bloom filters to parquet files

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5323: - Sprint: 2022/12/12, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 2022/12/12,

[jira] [Updated] (HUDI-4613) Avoid the use of regex expressions when call hoodieFileGroup#addLogFile function

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-4613: - Sprint: 2022/09/05, 2022/12/12, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was:

[jira] [Updated] (HUDI-5160) Spark df saveAsTable failed with CTAS

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5160: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 0.13.0 Final Sprint,

[jira] [Updated] (HUDI-3529) Improve dependency management and bundling

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3529: - Sprint: 2022/08/22, 2022/09/05, 2022/09/19, 2022/10/04, 2022/10/18, 2022/11/01, 2022/11/15, 2022/11/29,

[jira] [Updated] (HUDI-5392) Fix Bootstrap files reader to configure arrays to be read in the new format

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5392: - Sprint: 2022/12/12, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 2022/12/12,

[jira] [Updated] (HUDI-4586) Address S3 timeouts in Bloom Index with metadata table

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-4586: - Sprint: 2022/08/08, 2022/08/22, 2022/09/05, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final

[jira] [Updated] (HUDI-5520) Fail MDT when list of log files grows unboundedly

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5520: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 0.13.0 Final Sprint,

[jira] [Updated] (HUDI-3673) Add a common hudi-hbase-shaded for shaded hbase dependencies

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3673: - Sprint: 2022/11/29, 2022/12/12, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was:

[jira] [Updated] (HUDI-5319) NPE in Bloom Filter Index

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5319: - Sprint: 2022/12/12, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 2022/12/12,

[jira] [Updated] (HUDI-5075) Add support to rollback residual clustering after disabling clustering

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5075: - Sprint: 2022/10/18, 2022/11/01, 2022/11/15, 2022/11/29, 2022/12/12, 0.13.0 Final Sprint, 0.13.0 Final

[jira] [Updated] (HUDI-5352) Jackson fails to serialize LocalDate when updating Delta Commit metadata

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5352: - Sprint: 2022/12/12, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 2022/12/12,

[jira] [Updated] (HUDI-5433) Fix the way we deduce the pending instants for MDT writes

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5433: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 0.13.0 Final Sprint,

[jira] [Updated] (HUDI-3601) Support multi-arch builds in docker setup

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3601: - Sprint: 2022/09/05, 2022/09/19, 2022/10/04, 2022/10/18, 2022/11/01, 2022/11/15, 2022/11/29, 2022/12/12,

[jira] [Updated] (HUDI-3967) Automatic savepoint in Hudi

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3967: - Sprint: 2022/08/22, 2022/09/05, 2022/09/19, 2022/10/04, 2022/10/18, 2022/11/01, 2022/11/15, 2022/11/29,

[jira] [Updated] (HUDI-4937) Fix HoodieTable injecting HoodieBackedTableMetadata not reusing underlying MT readers

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-4937: - Sprint: 2022/10/04, 2022/10/18, 2022/11/01, 2022/11/15, 2022/11/29, 2022/12/12, 0.13.0 Final Sprint,

[jira] [Updated] (HUDI-5408) Partially failed commits in MDT have to be rolled back in all cases

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5408: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 0.13.0 Final Sprint,

[jira] [Updated] (HUDI-5464) Fix instantiation of a new partition in MDT re-using the same instant time as a regular commit

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5464: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3 (was: 0.13.0 Final Sprint,

[jira] [Updated] (HUDI-5463) Apply rollback commits from data table as rollbacks in MDT instead of Delta commit

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5463: -- Sprint: 0.13.0 Final Sprint (was: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2) > Apply

[jira] [Updated] (HUDI-5463) Apply rollback commits from data table as rollbacks in MDT instead of Delta commit

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5463: -- Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 3 (was: 0.13.0 Final Sprint) > Apply

[GitHub] [hudi] hudi-bot commented on pull request #7660: [MINOR] unify naming for record merger

2023-01-17 Thread GitBox
hudi-bot commented on PR #7660: URL: https://github.com/apache/hudi/pull/7660#issuecomment-1386356546 ## CI report: * 08642ac9be198fdf55f02260253f81a0b457bcad Azure:

[jira] [Updated] (HUDI-5485) Improve performance of savepoint with MDT

2023-01-17 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5485: Status: Patch Available (was: In Progress) > Improve performance of savepoint with MDT >

[jira] [Updated] (HUDI-5323) Decouple virtual key with writing bloom filters to parquet files

2023-01-17 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5323: Status: Patch Available (was: In Progress) > Decouple virtual key with writing bloom filters to parquet

[jira] [Updated] (HUDI-5319) NPE in Bloom Filter Index

2023-01-17 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5319: Status: Patch Available (was: In Progress) > NPE in Bloom Filter Index > - > >

[jira] [Updated] (HUDI-5485) Improve performance of savepoint with MDT

2023-01-17 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5485: - Labels: pull-request-available (was: ) > Improve performance of savepoint with MDT >

[GitHub] [hudi] yihua opened a new pull request, #7690: [HUDI-5485] Add File System View API for batch listing and improve savepoint performance with metadata table

2023-01-17 Thread GitBox
yihua opened a new pull request, #7690: URL: https://github.com/apache/hudi/pull/7690 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any performance

[GitHub] [hudi] With-winds opened a new issue, #7689: [SUPPORT] PriorityBasedFileSystemView: Got error running preferred function. Trying secondary

2023-01-17 Thread GitBox
With-winds opened a new issue, #7689: URL: https://github.com/apache/hudi/issues/7689 **Describe the problem you faced** When trying to write to existing COW table using HoodieDeltaStreamer, an error occurred in the Java Spark application. **To Reproduce** **Expected

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7582: [HUDI-5488] Make sure Disrupt queue start first, then insert records

2023-01-17 Thread GitBox
alexeykudinkin commented on code in PR #7582: URL: https://github.com/apache/hudi/pull/7582#discussion_r1072998596 ## hudi-common/src/main/java/org/apache/hudi/common/util/queue/DisruptorMessageQueue.java: ## @@ -60,6 +61,10 @@ public long size() { @Override public void
