[GitHub] [hudi] Junyewu commented on issue #7417: [SUPPORT] With HoodieROTablePathFilter is too slow load normal parquets in hudi release

2022-12-08 Thread GitBox
Junyewu commented on issue #7417: URL: https://github.com/apache/hudi/issues/7417#issuecomment-1343970991 @zhangyue19921010

[GitHub] [hudi] Junyewu opened a new issue, #7417: [SUPPORT] With HoodieROTablePathFilter is too slow load normal parquets in hudi release

2022-12-08 Thread GitBox
Junyewu opened a new issue, #7417: URL: https://github.com/apache/hudi/issues/7417 **Describe the problem you faced** Loading normal Parquet files with HoodieROTablePathFilter becomes too slow once the table reaches a certain scale. For example: 500 partitions and 50

[GitHub] [hudi] fuyun2024 closed pull request #7301: [HUDI-5309] Support for Spark to automatically enable schema evolution when reading the Hoodie table

2022-12-08 Thread GitBox
fuyun2024 closed pull request #7301: [HUDI-5309] Support for Spark to automatically enable schema evolution when reading the Hoodie table URL: https://github.com/apache/hudi/pull/7301

[GitHub] [hudi] fuyun2024 commented on pull request #7301: [HUDI-5309] Support for Spark to automatically enable schema evolution when reading the Hoodie table

2022-12-08 Thread GitBox
fuyun2024 commented on PR #7301: URL: https://github.com/apache/hudi/pull/7301#issuecomment-1343963814 Thanks for your time, I will close this PR.

[jira] [Commented] (HUDI-5350) oom cause compaction event lost

2022-12-08 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17645128#comment-17645128 ] Danny Chen commented on HUDI-5350: -- Fixed via master branch: 115584c46e30998e0369b0e5550c

[GitHub] [hudi] hudi-bot commented on pull request #7415: [HUDI-5355] support spark 3.2+ hudi table query with tvf

2022-12-08 Thread GitBox
hudi-bot commented on PR #7415: URL: https://github.com/apache/hudi/pull/7415#issuecomment-1343949288 ## CI report: * 72888c00a744f22f2072418b72960db508bcce8a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1356

[jira] [Resolved] (HUDI-5350) oom cause compaction event lost

2022-12-08 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen resolved HUDI-5350. -- > oom cause compaction event lost > --- > > Key: HUDI-5350 >

[jira] [Updated] (HUDI-5350) oom cause compaction event lost

2022-12-08 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-5350: - Fix Version/s: 0.12.2 0.13.0 > oom cause compaction event lost > --

[hudi] branch master updated (8de53571e0 -> 115584c46e)

2022-12-08 Thread danny0405
danny0405 pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from 8de53571e0 [HUDI-5346][HUDI-5320] Fixing Create Table as Select (CTAS) performance gaps (#7370) add 115584c46e

[GitHub] [hudi] danny0405 merged pull request #7408: [HUDI-5350] Fix oom cause compaction event lost problem

2022-12-08 Thread GitBox
danny0405 merged PR #7408: URL: https://github.com/apache/hudi/pull/7408

[GitHub] [hudi] leesf commented on pull request #5913: [HUDI-4287] Optimize Flink checkpoint meta mechanism to fix mistaken pending instants

2022-12-08 Thread GitBox
leesf commented on PR #5913: URL: https://github.com/apache/hudi/pull/5913#issuecomment-1343914467 @chenshzh would you please rebase onto the latest master first?

[GitHub] [hudi] boneanxs commented on pull request #6725: [HUDI-4881] Push down filters if possible when syncing partitions to Hive

2022-12-08 Thread GitBox
boneanxs commented on PR #6725: URL: https://github.com/apache/hudi/pull/6725#issuecomment-1343905647 @xushiyan gentle ping...

[GitHub] [hudi] scxwhite commented on pull request #7182: [HUDI-5196]spark sql 3.2+ query support

2022-12-08 Thread GitBox
scxwhite commented on PR #7182: URL: https://github.com/apache/hudi/pull/7182#issuecomment-1343899979 @YannByron @XuQianJin-Stars Querying with a TVF has been implemented in https://github.com/apache/hudi/pull/7415. I plan to close this PR.

[GitHub] [hudi] hudi-bot commented on pull request #7416: [HUDI-5352] Fix `LocalDate` serialization in colstats

2022-12-08 Thread GitBox
hudi-bot commented on PR #7416: URL: https://github.com/apache/hudi/pull/7416#issuecomment-1343898250 ## CI report: * 447a28c3d7697ca84ed96db596e32260a68ebf77 UNKNOWN * a2ac967f1c3b53fcc60e8ba195bb13cd5b5dcbf6 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-

[GitHub] [hudi] hudi-bot commented on pull request #7416: [HUDI-5352] Fix `LocalDate` serialization in colstats

2022-12-08 Thread GitBox
hudi-bot commented on PR #7416: URL: https://github.com/apache/hudi/pull/7416#issuecomment-1343892843 ## CI report: * 447a28c3d7697ca84ed96db596e32260a68ebf77 UNKNOWN * a2ac967f1c3b53fcc60e8ba195bb13cd5b5dcbf6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[jira] [Updated] (HUDI-4411) Bump Spark version to 3.2.3

2022-12-08 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-4411: - Summary: Bump Spark version to 3.2.3 (was: Bump Spark version to 3.2.2) > Bump Spark version to 3.2.3 > -

[GitHub] [hudi] hudi-bot commented on pull request #7398: [HUDI-4961] Support optional table synchronization to hive.

2022-12-08 Thread GitBox
hudi-bot commented on PR #7398: URL: https://github.com/apache/hudi/pull/7398#issuecomment-1343887621 ## CI report: * 6e8e015b4afa628ec067dba725769e952915830c Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1356

[GitHub] [hudi] xushiyan commented on a diff in pull request #7037: [HUDI-5078] Fixing determination of table service for metadata calls

2022-12-08 Thread GitBox
xushiyan commented on code in PR #7037: URL: https://github.com/apache/hudi/pull/7037#discussion_r1044099173 ## hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieInstant.java: ## @@ -67,7 +67,7 @@ public enum State { // Committed instant COMPLETED,

[GitHub] [hudi] hudi-bot commented on pull request #7327: [HUDI-3088] Use Spark 3.2 as default Spark version

2022-12-08 Thread GitBox
hudi-bot commented on PR #7327: URL: https://github.com/apache/hudi/pull/7327#issuecomment-1343847543 ## CI report: * 152deec6f09ef9c531a9e5b2772785aaa476397a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1356

[GitHub] [hudi] hudi-bot commented on pull request #7416: [HUDI-5352] Fix `LocalDate` serialization in colstats

2022-12-08 Thread GitBox
hudi-bot commented on PR #7416: URL: https://github.com/apache/hudi/pull/7416#issuecomment-1343847708 ## CI report: * 447a28c3d7697ca84ed96db596e32260a68ebf77 UNKNOWN * a2ac967f1c3b53fcc60e8ba195bb13cd5b5dcbf6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] hudi-bot commented on pull request #7416: [HUDI-5352] Fix `LocalDate` serialization in colstats

2022-12-08 Thread GitBox
hudi-bot commented on PR #7416: URL: https://github.com/apache/hudi/pull/7416#issuecomment-1343844921 ## CI report: * 447a28c3d7697ca84ed96db596e32260a68ebf77 UNKNOWN * a2ac967f1c3b53fcc60e8ba195bb13cd5b5dcbf6 UNKNOWN

[GitHub] [hudi] hudi-bot commented on pull request #7415: [HUDI-5355] support spark 3.2+ hudi table query with tvf

2022-12-08 Thread GitBox
hudi-bot commented on PR #7415: URL: https://github.com/apache/hudi/pull/7415#issuecomment-1343844907 ## CI report: * 72888c00a744f22f2072418b72960db508bcce8a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1356

[GitHub] [hudi] hudi-bot commented on pull request #7327: [HUDI-3088] Use Spark 3.2 as default Spark version

2022-12-08 Thread GitBox
hudi-bot commented on PR #7327: URL: https://github.com/apache/hudi/pull/7327#issuecomment-1343844763 ## CI report: * 152deec6f09ef9c531a9e5b2772785aaa476397a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1356

[GitHub] [hudi] codope commented on a diff in pull request #7412: [HUDI-5353] Close file readers

2022-12-08 Thread GitBox
codope commented on code in PR #7412: URL: https://github.com/apache/hudi/pull/7412#discussion_r1044085391 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/BaseMergeHelper.java: ## @@ -54,12 +55,15 @@ public abstract class BaseMergeHelper {

[GitHub] [hudi] hudi-bot commented on pull request #7415: [HUDI-5355] support spark 3.2+ hudi table query with tvf

2022-12-08 Thread GitBox
hudi-bot commented on PR #7415: URL: https://github.com/apache/hudi/pull/7415#issuecomment-1343841965 ## CI report: * 72888c00a744f22f2072418b72960db508bcce8a UNKNOWN

[GitHub] [hudi] hudi-bot commented on pull request #7416: [HUDI-5352] Fix `LocalDate` serialization in colstats

2022-12-08 Thread GitBox
hudi-bot commented on PR #7416: URL: https://github.com/apache/hudi/pull/7416#issuecomment-1343841990 ## CI report: * 447a28c3d7697ca84ed96db596e32260a68ebf77 UNKNOWN

[GitHub] [hudi] hudi-bot commented on pull request #7327: [HUDI-3088] Use Spark 3.2 as default Spark version

2022-12-08 Thread GitBox
hudi-bot commented on PR #7327: URL: https://github.com/apache/hudi/pull/7327#issuecomment-1343838887 ## CI report: * 152deec6f09ef9c531a9e5b2772785aaa476397a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1356

[GitHub] [hudi] hudi-bot commented on pull request #7183: [HUDI-5194] Fix schema files cleaning by FileBasedInternalSchemaStorageManager

2022-12-08 Thread GitBox
hudi-bot commented on PR #7183: URL: https://github.com/apache/hudi/pull/7183#issuecomment-1343838758 ## CI report: * 3cb5b943af00e00bea80e372c294b8842d67f0e3 UNKNOWN * 9045b7187198165f0eafdbb5cd0a70f85b5e6311 UNKNOWN * 0265ffa366d029603aae7780e7ecf64402bcc7ec UNKNOWN * e4

[GitHub] [hudi] xushiyan commented on a diff in pull request #7416: [HUDI-5352] Fix `LocalDate` serialization in colstats

2022-12-08 Thread GitBox
xushiyan commented on code in PR #7416: URL: https://github.com/apache/hudi/pull/7416#discussion_r1044070058 ## hudi-common/src/main/java/org/apache/hudi/common/util/JsonUtils.java: ## @@ -35,6 +36,7 @@ public class JsonUtils { private static final ObjectMapper MAPPER = new O

[GitHub] [hudi] chenshzh commented on a diff in pull request #5950: [HUDI-4311] Fix Flink lose data on some rollback scene

2022-12-08 Thread GitBox
chenshzh commented on code in PR #5950: URL: https://github.com/apache/hudi/pull/5950#discussion_r1044074034 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/meta/CkpMetadata.java: ## @@ -97,8 +97,6 @@ public void close() { public void bootstrap(HoodieTa

[jira] [Updated] (HUDI-5352) Jackson fails to serialize LocalDate when updating Delta Commit metadata

2022-12-08 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5352: - Labels: pull-request-available (was: ) > Jackson fails to serialize LocalDate when updating Delta

[GitHub] [hudi] xushiyan opened a new pull request, #7416: [HUDI-5352] Fix `LocalDate` serialization in colstats

2022-12-08 Thread GitBox
xushiyan opened a new pull request, #7416: URL: https://github.com/apache/hudi/pull/7416 ### Change Logs Under the spark3.3 profile, when column stats (colstats) are configured for a Date-type column, serialization breaks with a fasterxml (Jackson) error complaining about missing Java time module support. Als

[jira] [Updated] (HUDI-5355) support spark 3.2+ hudi table query with tvf

2022-12-08 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5355: - Labels: pull-request-available (was: ) > support spark 3.2+ hudi table query with tvf > -

[GitHub] [hudi] scxwhite opened a new pull request, #7415: [HUDI-5355] support spark 3.2+ hudi table query with tvf

2022-12-08 Thread GitBox
scxwhite opened a new pull request, #7415: URL: https://github.com/apache/hudi/pull/7415 ### Change Logs Add a new table-valued function to support querying a Hudi table in kv mode ### Impact New Spark query method. ### Risk level (write none, low medium or high bel

[jira] [Created] (HUDI-5355) support spark 3.2+ hudi table query with tvf

2022-12-08 Thread scx (Jira)
scx created HUDI-5355: - Summary: support spark 3.2+ hudi table query with tvf Key: HUDI-5355 URL: https://issues.apache.org/jira/browse/HUDI-5355 Project: Apache Hudi Issue Type: New Feature

[GitHub] [hudi] weimingdiit commented on pull request #5716: [HUDI-4167] Remove the timeline refresh with initializing hoodie table

2022-12-08 Thread GitBox
weimingdiit commented on PR #5716: URL: https://github.com/apache/hudi/pull/5716#issuecomment-1343802172 【But, let's see the construction, the meta client is instantiated freshly so the timeline is already the latest, the table is also constructed freshly】 Can you explain this in deta

[GitHub] [hudi] hudi-bot commented on pull request #7245: [HUDI-5238] Fixing `HoodieMergeHandle` shutdown sequence

2022-12-08 Thread GitBox
hudi-bot commented on PR #7245: URL: https://github.com/apache/hudi/pull/7245#issuecomment-1343802063 ## CI report: * ea43b43f80edfe24a0c3f9507018b2c7b8e2ea09 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1356

[GitHub] [hudi] hudi-bot commented on pull request #7408: [HUDI-5350]fix oom cause compaction event lost problem.

2022-12-08 Thread GitBox
hudi-bot commented on PR #7408: URL: https://github.com/apache/hudi/pull/7408#issuecomment-1343799419 ## CI report: * 53bde66487d957954bb83a2db2d7820e47dc0c25 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1355

[GitHub] [hudi] hudi-bot commented on pull request #7398: [HUDI-4961] Support optional table synchronization to hive.

2022-12-08 Thread GitBox
hudi-bot commented on PR #7398: URL: https://github.com/apache/hudi/pull/7398#issuecomment-1343799395 ## CI report: * 39936496b812314ba000c1e319a09b753446f746 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1354

[GitHub] [hudi] hudi-bot commented on pull request #7408: [HUDI-5350]fix oom cause compaction event lost problem.

2022-12-08 Thread GitBox
hudi-bot commented on PR #7408: URL: https://github.com/apache/hudi/pull/7408#issuecomment-1343794440 ## CI report: * 53bde66487d957954bb83a2db2d7820e47dc0c25 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1355

[GitHub] [hudi] hudi-bot commented on pull request #7398: [HUDI-4961] Support optional table synchronization to hive.

2022-12-08 Thread GitBox
hudi-bot commented on PR #7398: URL: https://github.com/apache/hudi/pull/7398#issuecomment-1343794288 ## CI report: * 39936496b812314ba000c1e319a09b753446f746 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1354

[GitHub] [hudi] scxwhite commented on issue #7392: [SUPPORT] Unable to read data from MOR table using spark. ERROR: org.apache.spark.sql.execution.datasources.PartitionedFile

2022-12-08 Thread GitBox
scxwhite commented on issue #7392: URL: https://github.com/apache/hudi/issues/7392#issuecomment-1343786075 You can use the maven shade class file. ``` org.apache.spark.sql.execution.datasources.PartitionedFile

[GitHub] [hudi] leesf commented on a diff in pull request #5950: [HUDI-4311] Fix Flink lose data on some rollback scene

2022-12-08 Thread GitBox
leesf commented on code in PR #5950: URL: https://github.com/apache/hudi/pull/5950#discussion_r1044049689 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/meta/CkpMetadata.java: ## @@ -97,8 +97,6 @@ public void close() { public void bootstrap(HoodieTable

[hudi] branch master updated: [HUDI-5346][HUDI-5320] Fixing Create Table as Select (CTAS) performance gaps (#7370)

2022-12-08 Thread akudinkin
akudinkin pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 8de53571e0 [HUDI-5346][HUDI-5320] Fixing Create

[GitHub] [hudi] alexeykudinkin merged pull request #7370: [HUDI-5346][HUDI-5320] Fixing Create Table as Select (CTAS) performance gaps

2022-12-08 Thread GitBox
alexeykudinkin merged PR #7370: URL: https://github.com/apache/hudi/pull/7370

[GitHub] [hudi] alexeykudinkin commented on pull request #7370: [HUDI-5346][HUDI-5320] Fixing Create Table as Select (CTAS) performance gaps

2022-12-08 Thread GitBox
alexeykudinkin commented on PR #7370: URL: https://github.com/apache/hudi/pull/7370#issuecomment-1343774135 CI is green: https://user-images.githubusercontent.com/428277/206613474-e65cf132-978b-4ebc-96b3-2268792ea163.png https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_b

[GitHub] [hudi] waywtdcc opened a new issue, #7414: [SUPPORT] Support lsm tree writing

2022-12-08 Thread GitBox
waywtdcc opened a new issue, #7414: URL: https://github.com/apache/hudi/issues/7414 Support LSM-tree writing, so that Hudi can be configured to write in LSM-tree mode. 1. Writing in LSM mode may improve write efficiency in some scenarios. 2. The LSM tree is automatically sorted a

[GitHub] [hudi] hbgstc123 commented on a diff in pull request #7408: [HUDI-5350]fix oom cause compaction event lost problem.

2022-12-08 Thread GitBox
hbgstc123 commented on code in PR #7408: URL: https://github.com/apache/hudi/pull/7408#discussion_r1044035426 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/utils/NonThrownExecutor.java: ## @@ -136,15 +136,15 @@ private Runnable wrapAction( } pri

[GitHub] [hudi] hudi-bot commented on pull request #7327: [HUDI-3088] Use Spark 3.2 as default Spark version

2022-12-08 Thread GitBox
hudi-bot commented on PR #7327: URL: https://github.com/apache/hudi/pull/7327#issuecomment-1343736671 ## CI report: * e5be418a0979e12026ff9b55b52a4d524e96e37a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1355

[GitHub] [hudi] hudi-bot commented on pull request #7245: [HUDI-5238] Fixing `HoodieMergeHandle` shutdown sequence

2022-12-08 Thread GitBox
hudi-bot commented on PR #7245: URL: https://github.com/apache/hudi/pull/7245#issuecomment-1343736567 ## CI report: * 2c234afd5510423a0f3858eb05c78299e4acb3a9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1353

[GitHub] [hudi] hudi-bot commented on pull request #7327: [HUDI-3088] Use Spark 3.2 as default Spark version

2022-12-08 Thread GitBox
hudi-bot commented on PR #7327: URL: https://github.com/apache/hudi/pull/7327#issuecomment-1343732766 ## CI report: * e5be418a0979e12026ff9b55b52a4d524e96e37a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1355

[GitHub] [hudi] hudi-bot commented on pull request #7245: [HUDI-5238] Fixing `HoodieMergeHandle` shutdown sequence

2022-12-08 Thread GitBox
hudi-bot commented on PR #7245: URL: https://github.com/apache/hudi/pull/7245#issuecomment-1343732690 ## CI report: * 2c234afd5510423a0f3858eb05c78299e4acb3a9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1353

[GitHub] [hudi] hudi-bot commented on pull request #7413: [HUDI-5321] correctly implement arePartitionRecordsSorted for bulk insert ColumnSortPartitioners

2022-12-08 Thread GitBox
hudi-bot commented on PR #7413: URL: https://github.com/apache/hudi/pull/7413#issuecomment-1343728274 ## CI report: * e78709ef1f7cc35893b09c8e9ade5533d59a77cf Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1356

[GitHub] [hudi] hudi-bot commented on pull request #7183: [HUDI-5194] Fix schema files cleaning by FileBasedInternalSchemaStorageManager

2022-12-08 Thread GitBox
hudi-bot commented on PR #7183: URL: https://github.com/apache/hudi/pull/7183#issuecomment-1343727914 ## CI report: * 3cb5b943af00e00bea80e372c294b8842d67f0e3 UNKNOWN * 9045b7187198165f0eafdbb5cd0a70f85b5e6311 UNKNOWN * 0265ffa366d029603aae7780e7ecf64402bcc7ec UNKNOWN * e4

[GitHub] [hudi] xiarixiaoyao commented on pull request #7183: [HUDI-5194] Fix schema files cleaning by FileBasedInternalSchemaStorageManager

2022-12-08 Thread GitBox
xiarixiaoyao commented on PR #7183: URL: https://github.com/apache/hudi/pull/7183#issuecomment-1343725211 @hudi-bot run azure

[GitHub] [hudi] weimingdiit commented on a diff in pull request #7362: [HUDI-5315] The record size is dynamically estimated when the table i…

2022-12-08 Thread GitBox
weimingdiit commented on code in PR #7362: URL: https://github.com/apache/hudi/pull/7362#discussion_r1043981502 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java: ## @@ -179,6 +179,20 @@ public class HoodieCompactionConfig extends

[GitHub] [hudi] danny0405 commented on a diff in pull request #7408: [HUDI-5350]fix oom cause compaction event lost problem.

2022-12-08 Thread GitBox
danny0405 commented on code in PR #7408: URL: https://github.com/apache/hudi/pull/7408#discussion_r1043978608 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/utils/NonThrownExecutor.java: ## @@ -136,15 +136,15 @@ private Runnable wrapAction( } pri

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7413: [HUDI-5321] correctly implement arePartitionRecordsSorted for bulk insert ColumnSortPartitioners

2022-12-08 Thread GitBox
alexeykudinkin commented on code in PR #7413: URL: https://github.com/apache/hudi/pull/7413#discussion_r1043972511 ## hudi-client/hudi-java-client/src/main/java/org/apache/hudi/execution/bulkinsert/JavaCustomColumnsSortPartitioner.java: ## @@ -40,8 +42,11 @@ private final Str

[GitHub] [hudi] danny0405 commented on a diff in pull request #7412: [HUDI-5353] Close file readers

2022-12-08 Thread GitBox
danny0405 commented on code in PR #7412: URL: https://github.com/apache/hudi/pull/7412#discussion_r1043976402 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/clustering/ClusteringOperator.java: ## @@ -312,8 +321,8 @@ private Iterator readRecordsForGroupWi

[GitHub] [hudi] danny0405 commented on a diff in pull request #7412: [HUDI-5353] Close file readers

2022-12-08 Thread GitBox
danny0405 commented on code in PR #7412: URL: https://github.com/apache/hudi/pull/7412#discussion_r1043975952 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/BaseMergeHelper.java: ## @@ -54,12 +55,15 @@ public abstract class BaseMergeHelper {

[GitHub] [hudi] danny0405 commented on a diff in pull request #7362: [HUDI-5315] The record size is dynamically estimated when the table i…

2022-12-08 Thread GitBox
danny0405 commented on code in PR #7362: URL: https://github.com/apache/hudi/pull/7362#discussion_r1043974212 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java: ## @@ -179,6 +179,20 @@ public class HoodieCompactionConfig extends H

[GitHub] [hudi] danny0405 commented on a diff in pull request #5950: [HUDI-4311] Fix Flink lose data on some rollback scene

2022-12-08 Thread GitBox
danny0405 commented on code in PR #5950: URL: https://github.com/apache/hudi/pull/5950#discussion_r1043972772 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/meta/CkpMetadata.java: ## @@ -97,8 +97,6 @@ public void close() { public void bootstrap(HoodieT

[GitHub] [hudi] leesf commented on pull request #7365: [HUDI-5317] Fix insert overwrite table for partitioned table

2022-12-08 Thread GitBox
leesf commented on PR #7365: URL: https://github.com/apache/hudi/pull/7365#issuecomment-1343707001 > we have two operations relating to insert_overwrite. 1: insert_overwrite_table 2: insert_overwrite. > > spark-ds writes support both operations. insert_overwrite_table will override

[jira] [Updated] (HUDI-5261) Use proper parallelism for engine context APIs

2022-12-08 Thread Jonathan Vexler (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Vexler updated HUDI-5261: -- Status: In Progress (was: Open) > Use proper parallelism for engine context APIs >

[jira] [Commented] (HUDI-5261) Use proper parallelism for engine context APIs

2022-12-08 Thread Jonathan Vexler (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17645041#comment-17645041 ] Jonathan Vexler commented on HUDI-5261: --- TimelineServerPerf has numExecuters with a

[jira] [Commented] (HUDI-5261) Use proper parallelism for engine context APIs

2022-12-08 Thread Jonathan Vexler (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17645037#comment-17645037 ] Jonathan Vexler commented on HUDI-5261: --- FileSystemBackedTableMetadata has config  {

[jira] [Commented] (HUDI-5261) Use proper parallelism for engine context APIs

2022-12-08 Thread Jonathan Vexler (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17645032#comment-17645032 ] Jonathan Vexler commented on HUDI-5261: --- I see in this guide [https://spark.apache.

[GitHub] [hudi] hudi-bot commented on pull request #7413: [HUDI-5321] correctly implement arePartitionRecordsSorted for bulk insert ColumnSortPartitioners

2022-12-08 Thread GitBox
hudi-bot commented on PR #7413: URL: https://github.com/apache/hudi/pull/7413#issuecomment-1343497587 ## CI report: * e78709ef1f7cc35893b09c8e9ade5533d59a77cf Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1356

[GitHub] [hudi] hudi-bot commented on pull request #7370: [HUDI-5346][HUDI-5320] Fixing Create Table as Select (CTAS) performance gaps

2022-12-08 Thread GitBox
hudi-bot commented on PR #7370: URL: https://github.com/apache/hudi/pull/7370#issuecomment-1343497363 ## CI report: * ac0714640c13e6538e2b553e2da97665217ea31f UNKNOWN * 5216f66efded3dc7e44a54951460fd18a112cd0e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] hudi-bot commented on pull request #7413: [HUDI-5321] correctly implement arePartitionRecordsSorted for bulk insert ColumnSortPartitioners

2022-12-08 Thread GitBox
hudi-bot commented on PR #7413: URL: https://github.com/apache/hudi/pull/7413#issuecomment-1343492157 ## CI report: * e78709ef1f7cc35893b09c8e9ade5533d59a77cf UNKNOWN

[GitHub] [hudi] hudi-bot commented on pull request #7370: [HUDI-5346][HUDI-5320] Fixing Create Table as Select (CTAS) performance gaps

2022-12-08 Thread GitBox
hudi-bot commented on PR #7370: URL: https://github.com/apache/hudi/pull/7370#issuecomment-1343491930 ## CI report: * ac0714640c13e6538e2b553e2da97665217ea31f UNKNOWN * 5216f66efded3dc7e44a54951460fd18a112cd0e Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] hudi-bot commented on pull request #7327: [HUDI-3088] Use Spark 3.2 as default Spark version

2022-12-08 Thread GitBox
hudi-bot commented on PR #7327: URL: https://github.com/apache/hudi/pull/7327#issuecomment-1343484228 ## CI report: * e5be418a0979e12026ff9b55b52a4d524e96e37a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1355

[jira] [Commented] (HUDI-5174) Clustering w/ two multi-writers could lead to issues

2022-12-08 Thread Hao Xie (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17645024#comment-17645024 ] Hao Xie commented on HUDI-5174: --- Hi, do we have any updates on this issue? > Clustering w/ t

[GitHub] [hudi] haoxie-aws commented on pull request #6793: [HUDI-4917] Optimized the way to get HoodieBaseFile of loadColumnRangesFromFiles of Bloom Index

2022-12-08 Thread GitBox
haoxie-aws commented on PR #6793: URL: https://github.com/apache/hudi/pull/6793#issuecomment-1343479028 Hi, can we prioritize this change and make sure it is included in the 0.12.2 release? Our production environment is suffering from this issue and has very high CPU usage. This change makes

[jira] [Updated] (HUDI-5321) Fix Bulk Insert ColumnSortPartitioners

2022-12-08 Thread Jonathan Vexler (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Vexler updated HUDI-5321: -- Status: In Progress (was: Open) > Fix Bulk Insert ColumnSortPartitioners >

[GitHub] [hudi] jonvex commented on pull request #7413: [HUDI-5321] correctly implement arePartitionRecordsSorted for bulk insert ColumnSortPartitioners

2022-12-08 Thread GitBox
jonvex commented on PR #7413: URL: https://github.com/apache/hudi/pull/7413#issuecomment-1343463777 Not sure what to do with [this test](https://github.com/apache/hudi/blob/926794aa74b71c0748acadb3fe6465dfd77446d6/hudi-client/hudi-java-client/src/test/java/org/apache/hudi/execution/bulkinser

[jira] [Updated] (HUDI-5321) Fix Bulk Insert ColumnSortPartitioners

2022-12-08 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5321: - Labels: pull-request-available (was: ) > Fix Bulk Insert ColumnSortPartitioners > ---

[GitHub] [hudi] jonvex opened a new pull request, #7413: [HUDI-5321] correctly implement arePartitionRecordsSorted for bulk insert ColumnSortPartitioners

2022-12-08 Thread GitBox
jonvex opened a new pull request, #7413: URL: https://github.com/apache/hudi/pull/7413 ### Change Logs Currently, all of the Custom Bulk Insert ColumnSortPartitioner impls incorrectly return "true" from the "arePartitionRecordsSorted" method, even though records might not necessarily

[hudi] branch master updated (da9fef6046 -> 2da69d35d6)

2022-12-08 Thread akudinkin
This is an automated email from the ASF dual-hosted git repository. akudinkin pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from da9fef6046 [HUDI-5345] Avoid fs.exists calls for metadata table in HFileBootstrapIndex (#7404) add 2da69d35d6 [

[GitHub] [hudi] alexeykudinkin merged pull request #7349: [HUDI-5291] Fixing NPE in MOR column stats accounting

2022-12-08 Thread GitBox
alexeykudinkin merged PR #7349: URL: https://github.com/apache/hudi/pull/7349

[jira] [Updated] (HUDI-5354) Troubleshoot `testMetadataColumnStatsIndexPartialProjection` flakiness

2022-12-08 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5354: -- Priority: Minor (was: Major) > Troubleshoot `testMetadataColumnStatsIndexPartialProjection` fla

[GitHub] [hudi] alexeykudinkin commented on pull request #7349: [HUDI-5291] Fixing NPE in MOR column stats accounting

2022-12-08 Thread GitBox
alexeykudinkin commented on PR #7349: URL: https://github.com/apache/hudi/pull/7349#issuecomment-1343438154 @codope i'm going to merge this one and follow-up on flakiness separately. My hunch is that it might be related to test isolation not being watertight. https://issues.apache.or

[jira] [Created] (HUDI-5354) Troubleshoot `testMetadataColumnStatsIndexPartialProjection` flakiness

2022-12-08 Thread Alexey Kudinkin (Jira)
Alexey Kudinkin created HUDI-5354: - Summary: Troubleshoot `testMetadataColumnStatsIndexPartialProjection` flakiness Key: HUDI-5354 URL: https://issues.apache.org/jira/browse/HUDI-5354 Project: Apache

[GitHub] [hudi] hudi-bot commented on pull request #7412: [HUDI-5353] Close file readers

2022-12-08 Thread GitBox
hudi-bot commented on PR #7412: URL: https://github.com/apache/hudi/pull/7412#issuecomment-1343425599 ## CI report: * 48a4cfb270260b38470a4a81198fd788268d1fff Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1355

[jira] [Updated] (HUDI-2754) Performance improvement for IncrementalRelation

2022-12-08 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-2754: -- Fix Version/s: 0.13.0 (was: 0.12.2) > Performance improvement for Increme

[jira] [Updated] (HUDI-4613) Avoid the use of regex expressions when call hoodieFileGroup#addLogFile function

2022-12-08 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-4613: -- Priority: Blocker (was: Critical) > Avoid the use of regex expressions when call hoodieFileGrou

[jira] [Updated] (HUDI-3636) Clustering fails due to marker creation failure

2022-12-08 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-3636: -- Priority: Blocker (was: Critical) > Clustering fails due to marker creation failure > -

[jira] [Assigned] (HUDI-5177) Revisit HiveIncrPullSource and JdbcSource for interleaved inflight commits

2022-12-08 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin reassigned HUDI-5177: - Assignee: Jonathan Vexler > Revisit HiveIncrPullSource and JdbcSource for interleaved inf

[jira] [Commented] (HUDI-5169) Re-attempt failed rollback (regular commits, clustering) and get it to completion

2022-12-08 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644973#comment-17644973 ] Alexey Kudinkin commented on HUDI-5169: --- [~shivnarayan] can you elaborate on the cas

[jira] [Updated] (HUDI-5169) Re-attempt failed rollback (regular commits, clustering) and get it to completion

2022-12-08 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5169: -- Fix Version/s: 0.13.0 (was: 0.12.2) > Re-attempt failed rollback (regular

[jira] [Closed] (HUDI-4526) improve spillableMapBasePath disk directory is full

2022-12-08 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin closed HUDI-4526. - Resolution: Fixed > improve spillableMapBasePath disk directory is full >

[jira] [Commented] (HUDI-4954) Shade avro in all bundles where it is included

2022-12-08 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644969#comment-17644969 ] Alexey Kudinkin commented on HUDI-4954: --- This seems to be a risky change to go in a

[jira] [Updated] (HUDI-4954) Shade avro in all bundles where it is included

2022-12-08 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-4954: -- Fix Version/s: 0.13.0 (was: 0.12.2) > Shade avro in all bundles where it

[jira] [Assigned] (HUDI-4954) Shade avro in all bundles where it is included

2022-12-08 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin reassigned HUDI-4954: - Assignee: Alexey Kudinkin (was: Sagar Sumit) > Shade avro in all bundles where it is inc

[jira] [Assigned] (HUDI-4954) Shade avro in all bundles where it is included

2022-12-08 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin reassigned HUDI-4954: - Assignee: Sagar Sumit (was: Alexey Kudinkin) > Shade avro in all bundles where it is inc

[jira] [Assigned] (HUDI-5261) Use proper parallelism for engine context APIs

2022-12-08 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin reassigned HUDI-5261: - Assignee: Jonathan Vexler > Use proper parallelism for engine context APIs >

[jira] [Updated] (HUDI-5080) UnpersistRdds unpersist all rdds in the spark context

2022-12-08 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5080: -- Fix Version/s: 0.13.0 (was: 0.12.2) > UnpersistRdds unpersist all rdds in

[jira] [Updated] (HUDI-5079) Optimize rdd.isEmpty within DeltaSync

2022-12-08 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5079: -- Fix Version/s: 0.13.0 (was: 0.12.2) > Optimize rdd.isEmpty within DeltaSy

[jira] [Updated] (HUDI-5097) Read 0 records from partitioned table without partition fields in table configs

2022-12-08 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5097: -- Priority: Blocker (was: Critical) > Read 0 records from partitioned table without partition fie