[GitHub] [hudi] yihua commented on a diff in pull request #8125: [HUDI-5900] Clean up unused metadata configs

2023-03-30 Thread via GitHub
yihua commented on code in PR #8125: URL: https://github.com/apache/hudi/pull/8125#discussion_r1136065113 ## hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/client/functional/TestHoodieBackedMetadata.java: ## @@ -556,23 +554,10 @@ public void testUpdationOfPopulateM

[GitHub] [hudi] yihua commented on pull request #8238: [HUDI-5954] Infer cleaning policy based on clean configs

2023-03-30 Thread via GitHub
yihua commented on PR #8238: URL: https://github.com/apache/hudi/pull/8238#issuecomment-1491404273 > I'm so confused by these options, does the option hoodie.cleaner.policy make any sense here? If all the specific cleaning param: hoodie.cleaner.commits.retained, hoodie.cleaner.hours.retaine

[GitHub] [hudi] weimingdiit commented on a diff in pull request #8301: [HUDI-5988] Add a param, Implement a full partition sync operation wh…

2023-03-30 Thread via GitHub
weimingdiit commented on code in PR #8301: URL: https://github.com/apache/hudi/pull/8301#discussion_r1154083727 ## hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/HoodieSyncConfig.java: ## @@ -163,6 +163,11 @@ public class HoodieSyncConfig extends HoodieConf

[GitHub] [hudi] hudi-bot commented on pull request #8336: Remove unnecessary scala-maven-plugin

2023-03-30 Thread via GitHub
hudi-bot commented on PR #8336: URL: https://github.com/apache/hudi/pull/8336#issuecomment-1491398306 ## CI report: * 53eb3b2baec36a5d61e9b48f40e1e76aa6e6ea82 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1601

[GitHub] [hudi] hudi-bot commented on pull request #8308: [HUDI-5994] Bucket index supports bulk insert mode.

2023-03-30 Thread via GitHub
hudi-bot commented on PR #8308: URL: https://github.com/apache/hudi/pull/8308#issuecomment-1491398178 ## CI report: * 77916c48361ac95d6fb4fafe01b91ff8eea87b07 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1599

[GitHub] [hudi] hudi-bot commented on pull request #8308: [HUDI-5994] Bucket index supports bulk insert mode.

2023-03-30 Thread via GitHub
hudi-bot commented on PR #8308: URL: https://github.com/apache/hudi/pull/8308#issuecomment-1491392068 ## CI report: * 77916c48361ac95d6fb4fafe01b91ff8eea87b07 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1599

[GitHub] [hudi] hudi-bot commented on pull request #8336: Remove unnecessary scala-maven-plugin

2023-03-30 Thread via GitHub
hudi-bot commented on PR #8336: URL: https://github.com/apache/hudi/pull/8336#issuecomment-1491392187 ## CI report: * 53eb3b2baec36a5d61e9b48f40e1e76aa6e6ea82 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] hudi-bot commented on pull request #8308: [HUDI-5994] Bucket index supports bulk insert mode.

2023-03-30 Thread via GitHub
hudi-bot commented on PR #8308: URL: https://github.com/apache/hudi/pull/8308#issuecomment-1491386213 ## CI report: * 77916c48361ac95d6fb4fafe01b91ff8eea87b07 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1599

[GitHub] [hudi] hudi-bot commented on pull request #8198: [HUDI-5943] Support bootstrap produce to synchronize to multiple metastores

2023-03-30 Thread via GitHub
hudi-bot commented on PR #8198: URL: https://github.com/apache/hudi/pull/8198#issuecomment-1491385980 ## CI report: * 2f3468c00a766cfb9a5fdb641fb98114aa572e99 UNKNOWN * f7baf9850d51823429701dc0c198730c108b8c6c Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] bvaradar commented on pull request #7279: Test defaults

2023-03-30 Thread via GitHub
bvaradar commented on PR #7279: URL: https://github.com/apache/hudi/pull/7279#issuecomment-1491385274 @the-other-tim-brown : Can you add a jira id and the description of the purpose of PR. Also, can you create a new avsc file for your test -- This is an automated message from the Apache

[GitHub] [hudi] watermelon12138 commented on pull request #8308: [HUDI-5994] Bucket index supports bulk insert mode.

2023-03-30 Thread via GitHub
watermelon12138 commented on PR #8308: URL: https://github.com/apache/hudi/pull/8308#issuecomment-1491359307 @hudi-bot run azure

[GitHub] [hudi] CTTY opened a new pull request, #8336: Remove unnecessary scala-maven-plugin

2023-03-30 Thread via GitHub
CTTY opened a new pull request, #8336: URL: https://github.com/apache/hudi/pull/8336 `hudi-client-common` and `hudi-flink-client` don't have any scala files yet have `scala-maven-plugin` specified in their pom files. It would fail when building `hudi-trino-bundle` with JDK 17 due the the fa

[GitHub] [hudi] lqbFFF commented on issue #8330: [SUPPORT]RT tables records not same when using hive query and sparksql query

2023-03-30 Thread via GitHub
lqbFFF commented on issue #8330: URL: https://github.com/apache/hudi/issues/8330#issuecomment-1491348469 > This is a known issue, take this doc for reference: https://www.yuque.com/yuzhao-my9fz/kb/kgv2rb?#%20%E3%80%8AHive%20On%20Hudi%E3%80%8B Thanks. I will check doc steps and try agai

[GitHub] [hudi] DavidZ1 commented on issue #8267: [SUPPORT] Why some delta commit logs files are not converted to parquet ?

2023-03-30 Thread via GitHub
DavidZ1 commented on issue #8267: URL: https://github.com/apache/hudi/issues/8267#issuecomment-1491339024 No, we have stopped the job. The logs file cleanup logic is enabled by default, as seen from the DAG diagram.

[GitHub] [hudi] hudi-bot commented on pull request #8335: [HUDI-6009] Let the jetty server in TimelineService create daemon threads

2023-03-30 Thread via GitHub
hudi-bot commented on PR #8335: URL: https://github.com/apache/hudi/pull/8335#issuecomment-1491327267 ## CI report: * 325fc7349dfdf20b4633f462156255e1276bf495 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1600

[GitHub] [hudi] hudi-bot commented on pull request #8335: [HUDI-6009] Let the jetty server in TimelineService create daemon threads

2023-03-30 Thread via GitHub
hudi-bot commented on PR #8335: URL: https://github.com/apache/hudi/pull/8335#issuecomment-1491322630 ## CI report: * 325fc7349dfdf20b4633f462156255e1276bf495 UNKNOWN

[jira] [Updated] (HUDI-6009) Let the jetty server in TimelineService create daemon threads

2023-03-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6009: - Labels: pull-request-available (was: ) > Let the jetty server in TimelineService create daemon th

[GitHub] [hudi] cxzl25 opened a new pull request, #8335: [HUDI-6009] Let the jetty server in TimelineService create daemon threads

2023-03-30 Thread via GitHub
cxzl25 opened a new pull request, #8335: URL: https://github.com/apache/hudi/pull/8335 ### Change Logs Let the jetty server in TimelineService create daemon threads ### Impact When hudi is integrated with spark, sometimes the spark driver cannot exit normally, because th

[jira] [Created] (HUDI-6009) Let the jetty server in TimelineService create daemon threads

2023-03-30 Thread dzcxzl (Jira)
dzcxzl created HUDI-6009: Summary: Let the jetty server in TimelineService create daemon threads Key: HUDI-6009 URL: https://issues.apache.org/jira/browse/HUDI-6009 Project: Apache Hudi Issue Type:

[GitHub] [hudi] lvyanquan opened a new pull request, #8334: [MINOR][DOCS] Remove preCombineField which is not in table

2023-03-30 Thread via GitHub
lvyanquan opened a new pull request, #8334: URL: https://github.com/apache/hudi/pull/8334 ### Change Logs `ts` is not a column of `hudi_table_p0`, and don't need to set preCombineField for cow table. error message: ``` java.lang.IllegalArgumentException: Can't find preCom

[GitHub] [hudi] hudi-bot commented on pull request #8333: [MINOR] Remove unnecessary KryoSerializable interface in HoodieSparkRecord class signature

2023-03-30 Thread via GitHub
hudi-bot commented on PR #8333: URL: https://github.com/apache/hudi/pull/8333#issuecomment-1491280565 ## CI report: * 3819052d07b2ffdc052b0583b08a5c8754dd67ad Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1600

[GitHub] [hudi] hudi-bot commented on pull request #8333: [MINOR] Remove unnecessary KryoSerializable interface in HoodieSparkRecord class signature

2023-03-30 Thread via GitHub
hudi-bot commented on PR #8333: URL: https://github.com/apache/hudi/pull/8333#issuecomment-1491276093 ## CI report: * 3819052d07b2ffdc052b0583b08a5c8754dd67ad UNKNOWN

[GitHub] [hudi] hudi-bot commented on pull request #8179: [HUDI-5932] Make the combine step in Call run_bootstrap Procedure optional

2023-03-30 Thread via GitHub
hudi-bot commented on PR #8179: URL: https://github.com/apache/hudi/pull/8179#issuecomment-1491275667 ## CI report: * 0e5ea037ffd066a607af54b7af673db562d39b7b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1579

[GitHub] [hudi] HuangFru commented on issue #8332: [SUPPORT] Spark insert overwrite in partition table causes executors OOM.

2023-03-30 Thread via GitHub
HuangFru commented on issue #8332: URL: https://github.com/apache/hudi/issues/8332#issuecomment-1491273624 > Yes, it seems we still use the `UPSERT` code path for the `INSERT OVERWRITE TABLE` operation, should be optimized to `INSERT` if possible. Any suggestion for me to carry on thi

[GitHub] [hudi] HuangFru commented on issue #8332: [SUPPORT] Spark insert overwrite in partition table causes executors OOM.

2023-03-30 Thread via GitHub
HuangFru commented on issue #8332: URL: https://github.com/apache/hudi/issues/8332#issuecomment-1491270176 > @HuangFru Did you tried by increasing executor memory? executor--cores you can still keep 5. I'm doing a performance test so I must control variables, I've tried to decrease the n

[GitHub] [hudi] danny0405 commented on a diff in pull request #7955: [HUDI-5649] Unify all the loggers to slf4j

2023-03-30 Thread via GitHub
danny0405 commented on code in PR #7955: URL: https://github.com/apache/hudi/pull/7955#discussion_r1153994131 ## packaging/hudi-cli-bundle/pom.xml: ## @@ -298,5 +299,15 @@ log4j-core compile + + org.slf4j + slf4j-api + compile + + +

[GitHub] [hudi] ad1happy2go commented on issue #8331: [SUPPORT] When using the HoodieDeltaStreamer, is there a corresponding parameter that can control the number of cycles? For example, if I cycle 5

2023-03-30 Thread via GitHub
ad1happy2go commented on issue #8331: URL: https://github.com/apache/hudi/issues/8331#issuecomment-1491263575 @LiJie20190102 Can you let us know the complete spark-submit command you are using?

[GitHub] [hudi] danny0405 commented on issue #8332: [SUPPORT] Spark insert overwrite in partition table causes executors OOM.

2023-03-30 Thread via GitHub
danny0405 commented on issue #8332: URL: https://github.com/apache/hudi/issues/8332#issuecomment-1491262859 Yes, it seems we still use the `UPSERT` code path for the `INSERT OVERWRITE TABLE` operation, should be optimized to `INSERT` if possible.

[GitHub] [hudi] ad1happy2go commented on issue #8332: [SUPPORT] Spark insert overwrite in partition table causes executors OOM.

2023-03-30 Thread via GitHub
ad1happy2go commented on issue #8332: URL: https://github.com/apache/hudi/issues/8332#issuecomment-1491262656 @HuangFru Did you tried by increasing executor memory? executor--cores you can still keep 5.

[GitHub] [hudi] danny0405 commented on pull request #8245: [HUDI-5944] Added the ability to fix partitiion missing in hudi synctool

2023-03-30 Thread via GitHub
danny0405 commented on PR #8245: URL: https://github.com/apache/hudi/pull/8245#issuecomment-1491257506 > > > Hi, > > > We got the same issue before when syncing Hive partition if there are more then one writer. And we fix this issue by using @boneanxs 's solution in this PR: #7627 > >

[GitHub] [hudi] danny0405 commented on issue #8330: [SUPPORT]RT tables records not same when using hive query and sparksql query

2023-03-30 Thread via GitHub
danny0405 commented on issue #8330: URL: https://github.com/apache/hudi/issues/8330#issuecomment-1491249704 This is a known issue, take this doc for reference: https://www.yuque.com/yuzhao-my9fz/kb/kgv2rb?#%20%E3%80%8AHive%20On%20Hudi%E3%80%8B

[GitHub] [hudi] wecharyu opened a new pull request, #8333: [MINOR] Remove unnecessary KryoSerializable interface in HoodieSparkRecord class signature

2023-03-30 Thread via GitHub
wecharyu opened a new pull request, #8333: URL: https://github.com/apache/hudi/pull/8333 ### Change Logs `HoodieRecord` implement `KryoSerializable` interface, so we do not need to implement it again in `HoodieSparkRecord` class. https://github.com/apache/hudi/blob/4b995a8c5d36c08

[GitHub] [hudi] hudi-bot commented on pull request #8308: [HUDI-5994] Bucket index supports bulk insert mode.

2023-03-30 Thread via GitHub
hudi-bot commented on PR #8308: URL: https://github.com/apache/hudi/pull/8308#issuecomment-1491246786 ## CI report: * 77916c48361ac95d6fb4fafe01b91ff8eea87b07 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1599

[GitHub] [hudi] hudi-bot commented on pull request #8198: [HUDI-5943] Support bootstrap produce to synchronize to multiple metastores

2023-03-30 Thread via GitHub
hudi-bot commented on PR #8198: URL: https://github.com/apache/hudi/pull/8198#issuecomment-1491246654 ## CI report: * 2f3468c00a766cfb9a5fdb641fb98114aa572e99 UNKNOWN * b39823121a761223344edbb9ad8999e35917b3cb Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] danny0405 commented on a diff in pull request #8107: [HUDI-5514] Adding auto generation of record keys support to Hudi/Spark

2023-03-30 Thread via GitHub
danny0405 commented on code in PR #8107: URL: https://github.com/apache/hudi/pull/8107#discussion_r1153980994 ## hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java: ## @@ -260,6 +260,18 @@ public class HoodieTableConfig extends HoodieConfig { .s

[GitHub] [hudi] boneanxs commented on a diff in pull request #7627: [HUDI-5517] HoodieTimeline support filter instants by state transition time

2023-03-30 Thread via GitHub
boneanxs commented on code in PR #7627: URL: https://github.com/apache/hudi/pull/7627#discussion_r1153979409 ## hudi-common/src/main/avro/HoodieArchivedMetaEntry.avsc: ## @@ -128,6 +128,11 @@ "HoodieIndexCommitMetadata" ], "default": null +

[GitHub] [hudi] danny0405 commented on a diff in pull request #8107: [HUDI-5514] Adding auto generation of record keys support to Hudi/Spark

2023-03-30 Thread via GitHub
danny0405 commented on code in PR #8107: URL: https://github.com/apache/hudi/pull/8107#discussion_r1153979756 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java: ## @@ -1145,6 +1145,10 @@ public String getKeyGeneratorClass() { retu

[GitHub] [hudi] danny0405 commented on issue #8267: [SUPPORT] Why some delta commit logs files are not converted to parquet ?

2023-03-30 Thread via GitHub
danny0405 commented on issue #8267: URL: https://github.com/apache/hudi/issues/8267#issuecomment-1491241122 Thanks, is the job still running ? Did you enable the incremental cleaning yet?

[GitHub] [hudi] hudi-bot commented on pull request #8176: [HUDI-5929] Automatically infer key generator type

2023-03-30 Thread via GitHub
hudi-bot commented on PR #8176: URL: https://github.com/apache/hudi/pull/8176#issuecomment-1491240123 ## CI report: * 8ac30b3eed9452b6c7ed2715748dd9866770961f Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1600

[GitHub] [hudi] danny0405 commented on a diff in pull request #7627: [HUDI-5517] HoodieTimeline support filter instants by state transition time

2023-03-30 Thread via GitHub
danny0405 commented on code in PR #7627: URL: https://github.com/apache/hudi/pull/7627#discussion_r1153976344 ## hudi-common/src/main/avro/HoodieArchivedMetaEntry.avsc: ## @@ -128,6 +128,11 @@ "HoodieIndexCommitMetadata" ], "default": null +

[GitHub] [hudi] weimingdiit commented on pull request #8245: [HUDI-5944] Added the ability to fix partitiion missing in hudi synctool

2023-03-30 Thread via GitHub
weimingdiit commented on PR #8245: URL: https://github.com/apache/hudi/pull/8245#issuecomment-1491239195 > > Hi, > > We got the same issue before when syncing Hive partition if there are more then one writer. And we fix this issue by using @boneanxs 's solution in this PR: #7627 > > F

[jira] [Closed] (HUDI-6005) Auto generate client id for Flink multi writer

2023-03-30 Thread Danny Chen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen closed HUDI-6005. Resolution: Fixed Fixed via master branch: 4b995a8c5d36c08744f08f218ddab84b1c6317bd > Auto generate client

[hudi] branch master updated: [HUDI-6005] Auto generate client id for Flink multi writer (#8323)

2023-03-30 Thread danny0405
This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 4b995a8c5d3 [HUDI-6005] Auto generate client id

[GitHub] [hudi] danny0405 merged pull request #8323: [HUDI-6005] Auto generate client id for Flink multi writer

2023-03-30 Thread via GitHub
danny0405 merged PR #8323: URL: https://github.com/apache/hudi/pull/8323

[GitHub] [hudi] danny0405 commented on a diff in pull request #8238: [HUDI-5954] Infer cleaning policy based on clean configs

2023-03-30 Thread via GitHub
danny0405 commented on code in PR #8238: URL: https://github.com/apache/hudi/pull/8238#discussion_r1153970726 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCleanConfig.java: ## @@ -59,25 +63,67 @@ public class HoodieCleanConfig extends HoodieConfig

[jira] [Updated] (HUDI-6007) When using the MOR table with flink, hudi savepoint may be invalid which lead to a consistency issues

2023-03-30 Thread zouxxyy (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zouxxyy updated HUDI-6007: -- Summary: When using the MOR table with flink, hudi savepoint may be invalid which lead to a consistency issues

[jira] [Updated] (HUDI-6007) When using the MOR table with flink, hudi savepoint may be invalid which lead to consistency issue

2023-03-30 Thread zouxxyy (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zouxxyy updated HUDI-6007: -- Description: Currently hudi's savepoint only saves the base file, and filter the files in it when clean. But w

[jira] [Updated] (HUDI-6007) When using the MOR table with flink, hudi savepoint may be invalid which lead to consistency issue

2023-03-30 Thread zouxxyy (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zouxxyy updated HUDI-6007: -- Summary: When using the MOR table with flink, hudi savepoint may be invalid which lead to consistency issue (wa

[jira] [Updated] (HUDI-6007) When using the MOR table with flink, hudi savepoint may be invalid

2023-03-30 Thread zouxxyy (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zouxxyy updated HUDI-6007: -- Summary: When using the MOR table with flink, hudi savepoint may be invalid (was: When using the MOR table with

[jira] [Updated] (HUDI-6007) When using the MOR table with flink, hudi savepoint is invalid

2023-03-30 Thread zouxxyy (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zouxxyy updated HUDI-6007: -- Description: Currently hudi's savepoint only saves the base file, and filter the files in it when clean. But w

[jira] [Commented] (HUDI-6007) When using the MOR table with flink, hudi savepoint is invalid

2023-03-30 Thread zhihao song (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17707090#comment-17707090 ] zhihao song commented on HUDI-6007: --- I think savepoint is not completely invalid when us

[GitHub] [hudi] HuangFru opened a new issue, #8332: [SUPPORT] Spark insert overwrite causes executors OOM.

2023-03-30 Thread via GitHub
HuangFru opened a new issue, #8332: URL: https://github.com/apache/hudi/issues/8332 **Describe the problem you faced** I'm doing a simple write performance test for Hudi in Spark on Yarn, but my executors will be dead for OOM. And the 'insert overwrite' SQL could be very slow. I'v

[GitHub] [hudi] hudi-bot commented on pull request #8198: [HUDI-5943] Support bootstrap produce to synchronize to multiple metastores

2023-03-30 Thread via GitHub
hudi-bot commented on PR #8198: URL: https://github.com/apache/hudi/pull/8198#issuecomment-1491217313 ## CI report: * 2f3468c00a766cfb9a5fdb641fb98114aa572e99 UNKNOWN * b39823121a761223344edbb9ad8999e35917b3cb Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] hudi-bot commented on pull request #8179: [HUDI-5932] Make the combine step in Call run_bootstrap Procedure optional

2023-03-30 Thread via GitHub
hudi-bot commented on PR #8179: URL: https://github.com/apache/hudi/pull/8179#issuecomment-1491217277 ## CI report: * 0e5ea037ffd066a607af54b7af673db562d39b7b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1579

[GitHub] [hudi] boundarymate commented on issue #8314: [SUPPORT] Why not save log files with timestamp less than savepoint's instant time?

2023-03-30 Thread via GitHub
boundarymate commented on issue #8314: URL: https://github.com/apache/hudi/issues/8314#issuecomment-1491216309 > Yes, it is a bug, especially when using the MOR table with Flink, hudi savepoint is actually invalid. I created a [JIRA](https://issues.apache.org/jira/browse/HUDI-6007) and will

[hudi] branch master updated (cb1395a820f -> 09f5d4f583b)

2023-03-30 Thread codope
This is an automated email from the ASF dual-hosted git repository. codope pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from cb1395a820f [HUDI-5893] Mark advanced configs (#8295) add 09f5d4f583b [HUDI-5907] Allow skip saving checkpoint in d

[GitHub] [hudi] codope merged pull request #8137: [HUDI-5907] Allow skip saving checkpoint in deltastreamer

2023-03-30 Thread via GitHub
codope merged PR #8137: URL: https://github.com/apache/hudi/pull/8137

[GitHub] [hudi] nsivabalan commented on a diff in pull request #8128: [HUDI-5782] Tweak defaults and remove unnecessary configs after config review

2023-03-30 Thread via GitHub
nsivabalan commented on code in PR #8128: URL: https://github.com/apache/hudi/pull/8128#discussion_r1153954231 ## hudi-aws/src/main/java/org/apache/hudi/config/DynamoDbBasedLockConfig.java: ## @@ -46,7 +46,7 @@ public class DynamoDbBasedLockConfig extends HoodieConfig { pub

[GitHub] [hudi] jiangxinqi1995 commented on issue #8276: [SUPPORT] Flink Exceeded checkpoint tolerable failure threshold.

2023-03-30 Thread via GitHub
jiangxinqi1995 commented on issue #8276: URL: https://github.com/apache/hudi/issues/8276#issuecomment-1491196499 I did a test, and there was a 10 minute interval between savepoint and checkpoint. After triggering savepoint, the checkpoint still failed because the resources were sufficient a

[GitHub] [hudi] boneanxs commented on pull request #7627: [HUDI-5517] HoodieTimeline support filter instants by state transition time

2023-03-30 Thread via GitHub
boneanxs commented on PR #7627: URL: https://github.com/apache/hudi/pull/7627#issuecomment-1491187084 > +1 on this in general. but would this be a format change? This will have backwards compatibility issues. > there is not storage change (barring archived entry which needs to be dis

[GitHub] [hudi] LegendOfGod commented on issue #8330: [SUPPORT]RT tables records not same when using hive query and sparksql query

2023-03-30 Thread via GitHub
LegendOfGod commented on issue #8330: URL: https://github.com/apache/hudi/issues/8330#issuecomment-1491185921 rt max( _hoodie_commit_time): sparksql: 20230330160404 hive:20230330160403 ![企业微信截图_16802282508009](https://user-images.githubusercontent.com/69707897/229004317-7153734

[GitHub] [hudi] LegendOfGod commented on issue #8330: [SUPPORT]RT tables records not same when using hive query and sparksql query

2023-03-30 Thread via GitHub
LegendOfGod commented on issue #8330: URL: https://github.com/apache/hudi/issues/8330#issuecomment-1491181177 sparksql-ro:10800 records ![sparksql-ro](https://user-images.githubusercontent.com/69707897/229003299-ba62f638-46fb-45e9-9df2-07e3be129eff.png) sparksql-rt:10900 records ![

[GitHub] [hudi] LiJie20190102 opened a new issue, #8331: [SUPPORT] When using the HoodieDeltaStreamer, is there a corresponding parameter that can control the number of cycles? For example, if I cycle

2023-03-30 Thread via GitHub
LiJie20190102 opened a new issue, #8331: URL: https://github.com/apache/hudi/issues/8331 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? - Join the mailing list to engage in conversations and get faster support at dev-su

[GitHub] [hudi] LegendOfGod opened a new issue, #8330: [SUPPORT]RT tables records not same when using hive query and sparksql query

2023-03-30 Thread via GitHub
LegendOfGod opened a new issue, #8330: URL: https://github.com/apache/hudi/issues/8330 **Describe the problem you faced** I am using flinkcdc to sync data and HMS sync to hive as the same time; When query RT table using sparksql,the result returns as expected; When query RT tab

[GitHub] [hudi] hudi-bot commented on pull request #8308: [HUDI-5994] Bucket index supports bulk insert mode.

2023-03-30 Thread via GitHub
hudi-bot commented on PR #8308: URL: https://github.com/apache/hudi/pull/8308#issuecomment-1491166397 ## CI report: * 77916c48361ac95d6fb4fafe01b91ff8eea87b07 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1599

[GitHub] [hudi] watermelon12138 commented on pull request #8308: [HUDI-5994] Bucket index supports bulk insert mode.

2023-03-30 Thread via GitHub
watermelon12138 commented on PR #8308: URL: https://github.com/apache/hudi/pull/8308#issuecomment-1491165337 @hudi-bot run azure

[GitHub] [hudi] hudi-bot commented on pull request #8176: [HUDI-5929] Automatically infer key generator type

2023-03-30 Thread via GitHub
hudi-bot commented on PR #8176: URL: https://github.com/apache/hudi/pull/8176#issuecomment-1491140798 ## CI report: * 280559f7b8dc7d1738a7e251b1ad4c42658a48b8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1575

[GitHub] [hudi] hudi-bot commented on pull request #8176: [HUDI-5929] Automatically infer key generator type

2023-03-30 Thread via GitHub
hudi-bot commented on PR #8176: URL: https://github.com/apache/hudi/pull/8176#issuecomment-1491137422 ## CI report: * 280559f7b8dc7d1738a7e251b1ad4c42658a48b8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1575

[GitHub] [hudi] hudi-bot commented on pull request #8329: [HUDI-5893] Mark additional advanced configs

2023-03-30 Thread via GitHub
hudi-bot commented on PR #8329: URL: https://github.com/apache/hudi/pull/8329#issuecomment-1491133999 ## CI report: * c16bb4766766484aa23824b7fa2cd363fd8a9f69 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1600

[jira] [Updated] (HUDI-6008) Update docs on key generator

2023-03-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6008: Description: Docs update for HUDI-5929. We should fix our quick start to start using key gen type instead o

[GitHub] [hudi] yihua commented on pull request #8176: [HUDI-5929] Automatically infer key generator type

2023-03-30 Thread via GitHub
yihua commented on PR #8176: URL: https://github.com/apache/hudi/pull/8176#issuecomment-1491133351 > Also, do you think we should fix our quick start to start using key gen type instead of class name. Also, we might also need to add docs around this auto inference and clarify that users don

[jira] [Updated] (HUDI-6008) Update docs on key generator

2023-03-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6008: Fix Version/s: 0.14.0 > Update docs on key generator > > > Key:

[jira] [Updated] (HUDI-6008) Update docs on key generator

2023-03-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-6008: Summary: Update docs on key generator (was: Update docs for key generator) > Update docs on key generator >

[jira] [Assigned] (HUDI-6008) Update docs on key generator

2023-03-30 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo reassigned HUDI-6008: --- Assignee: Ethan Guo > Update docs on key generator > > >

[jira] [Created] (HUDI-6008) Update docs for key generator

2023-03-30 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-6008: --- Summary: Update docs for key generator Key: HUDI-6008 URL: https://issues.apache.org/jira/browse/HUDI-6008 Project: Apache Hudi Issue Type: New Feature Rep

[GitHub] [hudi] yihua commented on a diff in pull request #8176: [HUDI-5929] Automatically infer key generator type

2023-03-30 Thread via GitHub
yihua commented on code in PR #8176: URL: https://github.com/apache/hudi/pull/8176#discussion_r1153905604 ## hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/factory/HoodieSparkKeyGeneratorFactory.java: ## @@ -75,40 +79,60 @@ public static KeyGenerator createK

[GitHub] [hudi] voonhous commented on pull request #8298: [HUDI-5989] Fix date conversion issue when performing partition pruning on Spark

2023-03-30 Thread via GitHub
voonhous commented on PR #8298: URL: https://github.com/apache/hudi/pull/8298#issuecomment-1491130942 @codope CI is green, can you please help to review this, thank you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [hudi] yihua closed pull request #8250: [HUDI-5780]

2023-03-30 Thread via GitHub
yihua closed pull request #8250: [HUDI-5780] URL: https://github.com/apache/hudi/pull/8250 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsub

[GitHub] [hudi] yihua commented on pull request #8250: [HUDI-5780]

2023-03-30 Thread via GitHub
yihua commented on PR #8250: URL: https://github.com/apache/hudi/pull/8250#issuecomment-1491106926 Closing this in favor of #8184 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific com

[GitHub] [hudi] yihua commented on pull request #8324: [Typo][Hoodie Metadata]Fix a typo for parameter in HoodieMetadata.

2023-03-30 Thread via GitHub
yihua commented on PR #8324: URL: https://github.com/apache/hudi/pull/8324#issuecomment-1491105231 Closing this as the spelling is correct. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spe

[GitHub] [hudi] yihua closed pull request #8324: [Typo][Hoodie Metadata]Fix a typo for parameter in HoodieMetadata.

2023-03-30 Thread via GitHub
yihua closed pull request #8324: [Typo][Hoodie Metadata]Fix a typo for parameter in HoodieMetadata. URL: https://github.com/apache/hudi/pull/8324 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [hudi] hudi-bot commented on pull request #8328: [HUDI-6002] Add JsonSchemaKafkaSource to handle json schema payload

2023-03-30 Thread via GitHub
hudi-bot commented on PR #8328: URL: https://github.com/apache/hudi/pull/8328#issuecomment-1491091901 ## CI report: * 86d7b7fdf3cac5da38f7429a8a569aa3cfc61529 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1600

[GitHub] [hudi] soumilshah1995 commented on issue #8309: [SUPPORT] Need Assistance with Hudi Delta Streamer for Community Video

2023-03-30 Thread via GitHub
soumilshah1995 commented on issue #8309: URL: https://github.com/apache/hudi/issues/8309#issuecomment-1491069489 You are right I was missing that ) On Thu, Mar 30, 2023 at 6:53 PM Y Ethan Guo ***@***.***> wrote: > @soumilshah1995 Just got t

[GitHub] [hudi] hudi-bot commented on pull request #8329: [HUDI-5893] Mark additional advanced configs

2023-03-30 Thread via GitHub
hudi-bot commented on PR #8329: URL: https://github.com/apache/hudi/pull/8329#issuecomment-1491058765 ## CI report: * c16bb4766766484aa23824b7fa2cd363fd8a9f69 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1600

[GitHub] [hudi] hudi-bot commented on pull request #8329: [HUDI-5893] Mark additional advanced configs

2023-03-30 Thread via GitHub
hudi-bot commented on PR #8329: URL: https://github.com/apache/hudi/pull/8329#issuecomment-1491053852 ## CI report: * c16bb4766766484aa23824b7fa2cd363fd8a9f69 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[jira] [Updated] (HUDI-3088) Make Spark 3 the default profile for build and test

2023-03-30 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3088: - Reviewers: (was: Raymond Xu, Yann Byron) > Make Spark 3 the default profile for build and test > ---

[jira] [Assigned] (HUDI-3088) Make Spark 3 the default profile for build and test

2023-03-30 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu reassigned HUDI-3088: Assignee: Raymond Xu (was: Rahil Chertara) > Make Spark 3 the default profile for build and test >

[GitHub] [hudi] soumilshah1995 commented on issue #8309: [SUPPORT] Need Assistance with Hudi Delta Streamer for Community Video

2023-03-30 Thread via GitHub
soumilshah1995 commented on issue #8309: URL: https://github.com/apache/hudi/issues/8309#issuecomment-1491028265 These are config that worked for me ``` spark-submit \ --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \ --conf

[GitHub] [hudi] soumilshah1995 commented on issue #8309: [SUPPORT] Need Assistance with Hudi Delta Streamer for Community Video

2023-03-30 Thread via GitHub
soumilshah1995 commented on issue #8309: URL: https://github.com/apache/hudi/issues/8309#issuecomment-1491027553 > Do you mind sharing what was the issue? @soumilshah1995 umm sure i think i was missing some config -- This is an automated message from the Apache Git Service. To resp

[GitHub] [hudi] yihua opened a new pull request, #8329: [HUDI-5893] Mark additional advanced configs

2023-03-30 Thread via GitHub
yihua opened a new pull request, #8329: URL: https://github.com/apache/hudi/pull/8329 ### Change Logs This PR marks additional advanced configs. ### Impact Advanced configs are not shown in the "Basic Configuration" page in our docs for simplicity. ### Risk level

[GitHub] [hudi] xushiyan commented on pull request #6117: Use Spark 3.2 as default Spark version, (older rebase)

2023-03-30 Thread via GitHub
xushiyan commented on PR #6117: URL: https://github.com/apache/hudi/pull/6117#issuecomment-1491005811 the last status of this work is done in https://github.com/apache/hudi/pull/7327 i'll close this one in favor of that -- This is an automated message from the Apache Git Service. T

[GitHub] [hudi] xushiyan closed pull request #6117: Use Spark 3.2 as default Spark version, (older rebase)

2023-03-30 Thread via GitHub
xushiyan closed pull request #6117: Use Spark 3.2 as default Spark version, (older rebase) URL: https://github.com/apache/hudi/pull/6117 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [hudi] xushiyan closed pull request #6151: [HUDI-4429] Make Spark 3.1 the default profile

2023-03-30 Thread via GitHub
xushiyan closed pull request #6151: [HUDI-4429] Make Spark 3.1 the default profile URL: https://github.com/apache/hudi/pull/6151 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [hudi] xushiyan commented on pull request #6151: [HUDI-4429] Make Spark 3.1 the default profile

2023-03-30 Thread via GitHub
xushiyan commented on PR #6151: URL: https://github.com/apache/hudi/pull/6151#issuecomment-1491005519 the last status of this work is done in https://github.com/apache/hudi/pull/7327 i'll close this one in favor of that -- This is an automated message from the Apache Git Service. T

[GitHub] [hudi] hudi-bot commented on pull request #8328: [HUDI-6002] Add JsonSchemaKafkaSource to handle json schema payload

2023-03-30 Thread via GitHub
hudi-bot commented on PR #8328: URL: https://github.com/apache/hudi/pull/8328#issuecomment-1490931537 ## CI report: * 86d7b7fdf3cac5da38f7429a8a569aa3cfc61529 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1600

[GitHub] [hudi] vinothchandar commented on pull request #7955: [HUDI-5649] Unify all the loggers to slf4j

2023-03-30 Thread via GitHub
vinothchandar commented on PR #7955: URL: https://github.com/apache/hudi/pull/7955#issuecomment-1490926297 Thanks @kkrugler ! Looks like we are in shape w.r.t log4j property files. @danny0405 you can take it from here based on my last comment above -- This is an automated message from th

[jira] [Updated] (HUDI-6002) Handle JSON schema payload in JsonKafkaSource

2023-03-30 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-6002: - Labels: pull-request-available (was: ) > Handle JSON schema payload in JsonKafkaSource >

[GitHub] [hudi] hudi-bot commented on pull request #8328: [HUDI-6002] Add JsonSchemaKafkaSource to handle json schema payload

2023-03-30 Thread via GitHub
hudi-bot commented on PR #8328: URL: https://github.com/apache/hudi/pull/8328#issuecomment-1490923040 ## CI report: * 86d7b7fdf3cac5da38f7429a8a569aa3cfc61529 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the
