[GitHub] [hudi] xushiyan commented on a change in pull request #2426: [HUDI-304] Configure spotless and java style

2021-01-14 Thread GitBox
xushiyan commented on a change in pull request #2426: URL: https://github.com/apache/hudi/pull/2426#discussion_r554531429 ## File path: pom.xml ## @@ -198,34 +200,36 @@ -

[GitHub] [hudi] xushiyan removed a comment on pull request #2426: [HUDI-304] Configure spotless and java style

2021-01-14 Thread GitBox
xushiyan removed a comment on pull request #2426: URL: https://github.com/apache/hudi/pull/2426#issuecomment-757438915 ## TODO - [ ] Manually verify some diffs after spotless apply won't conflict with IDE formatter and checkstyle

[GitHub] [hudi] xushiyan commented on pull request #2426: [HUDI-304] Configure spotless and java style

2021-01-14 Thread GitBox
xushiyan commented on pull request #2426: URL: https://github.com/apache/hudi/pull/2426#issuecomment-760076561 @vinothchandar The style can be sync'ed by - using google-java-format in spotless config and `spotless:apply` enforces the style that is also compatible with existing checkstyl

[GitHub] [hudi] quitozang opened a new issue #2446: [SUPPORT] The parameter "hoodie.bloom.index.filter.type" does not take effect in deltaStreamer

2021-01-14 Thread GitBox
quitozang opened a new issue #2446: URL: https://github.com/apache/hudi/issues/2446 Why does the parameter "hoodie.bloom.index.filter.type" not take effect in deltaStreamer? The bloom filter type is always SIMPLE. This i

[GitHub] [hudi] wangxianghu commented on a change in pull request #2434: [HUDI-1511] InstantGenerateOperator support multiple parallelism

2021-01-14 Thread GitBox
wangxianghu commented on a change in pull request #2434: URL: https://github.com/apache/hudi/pull/2434#discussion_r557021821 ## File path: hudi-flink/src/main/java/org/apache/hudi/operator/InstantGenerateOperator.java ## @@ -71,16 +72,18 @@ private String latestInstant = ""

[GitHub] [hudi] codecov-io edited a comment on pull request #2334: [HUDI-1453] Throw Exception when input data schema is not equal to th…

2021-01-14 Thread GitBox
codecov-io edited a comment on pull request #2334: URL: https://github.com/apache/hudi/pull/2334#issuecomment-745334158 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2334?src=pr&el=h1) Report > Merging [#2334](https://codecov.io/gh/apache/hudi/pull/2334?src=pr&el=desc) (bbb604a) in

[GitHub] [hudi] yanghua commented on a change in pull request #2430: [HUDI-1522] Add a new pipeline for Flink writer

2021-01-14 Thread GitBox
yanghua commented on a change in pull request #2430: URL: https://github.com/apache/hudi/pull/2430#discussion_r557306549 ## File path: hudi-flink/src/main/java/org/apache/hudi/util/StreamerUtil.java ## @@ -81,16 +103,50 @@ public static DFSPropertiesConfiguration readConfig(Fi

[jira] [Created] (HUDI-1527) Automatically infer the data directory, users only need to specify the table directory

2021-01-14 Thread teeyog (Jira)
teeyog created HUDI-1527: Summary: Automatically infer the data directory, users only need to specify the table directory Key: HUDI-1527 URL: https://issues.apache.org/jira/browse/HUDI-1527 Project: Apache Hu

[jira] [Updated] (HUDI-1527) Automatically infer the data directory, users only need to specify the table directory

2021-01-14 Thread teeyog (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] teeyog updated HUDI-1527: - Description: To read the hudi table, you need to specify the path, but the path is not only the tablePath corresp

[GitHub] [hudi] teeyog opened a new pull request #2447: [HUDI-1527] automatically infer the data directory, users only need t…

2021-01-14 Thread GitBox
teeyog opened a new pull request #2447: URL: https://github.com/apache/hudi/pull/2447 …o specify the table directory ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.*

[jira] [Updated] (HUDI-1527) Automatically infer the data directory, users only need to specify the table directory

2021-01-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1527: - Labels: pull-request-available (was: ) > Automatically infer the data directory, users only need

[GitHub] [hudi] codecov-io edited a comment on pull request #2334: [HUDI-1453] Throw Exception when input data schema is not equal to th…

2021-01-14 Thread GitBox
codecov-io edited a comment on pull request #2334: URL: https://github.com/apache/hudi/pull/2334#issuecomment-745334158 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitH

[GitHub] [hudi] yanghua commented on a change in pull request #2430: [HUDI-1522] Add a new pipeline for Flink writer

2021-01-14 Thread GitBox
yanghua commented on a change in pull request #2430: URL: https://github.com/apache/hudi/pull/2430#discussion_r557329799 ## File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithLatestAvroPayload.java ## @@ -79,6 +79,11 @@ public OverwriteWithLatestAvr

[GitHub] [hudi] yanghua commented on a change in pull request #2430: [HUDI-1522] Add a new pipeline for Flink writer

2021-01-14 Thread GitBox
yanghua commented on a change in pull request #2430: URL: https://github.com/apache/hudi/pull/2430#discussion_r557331716 ## File path: hudi-flink/pom.xml ## @@ -124,28 +124,77 @@ kafka-clients ${kafka.version} + + org.apache.flink + flink-hado

[GitHub] [hudi] yanghua commented on a change in pull request #2430: [HUDI-1522] Add a new pipeline for Flink writer

2021-01-14 Thread GitBox
yanghua commented on a change in pull request #2430: URL: https://github.com/apache/hudi/pull/2430#discussion_r557337056 ## File path: hudi-flink/src/main/java/org/apache/hudi/operator/HoodieOptions.java ## @@ -0,0 +1,248 @@ +/* + * Licensed to the Apache Software Foundation (A

[GitHub] [hudi] codecov-io commented on pull request #2443: [HUDI-1269] Make whether the failure of connect hive affects hudi ingest process configurable

2021-01-14 Thread GitBox
codecov-io commented on pull request #2443: URL: https://github.com/apache/hudi/pull/2443#issuecomment-760147630 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2443?src=pr&el=h1) Report > Merging [#2443](https://codecov.io/gh/apache/hudi/pull/2443?src=pr&el=desc) (0d98db7) into [ma

[GitHub] [hudi] yanghua commented on a change in pull request #2434: [HUDI-1511] InstantGenerateOperator support multiple parallelism

2021-01-14 Thread GitBox
yanghua commented on a change in pull request #2434: URL: https://github.com/apache/hudi/pull/2434#discussion_r557347825 ## File path: hudi-flink/src/main/java/org/apache/hudi/operator/InstantGenerateOperator.java ## @@ -71,16 +72,18 @@ private String latestInstant = "";

[jira] [Created] (HUDI-1528) hudi-sync-tool error

2021-01-14 Thread Trevorzhang (Jira)
Trevorzhang created HUDI-1528: - Summary: hudi-sync-tool error Key: HUDI-1528 URL: https://issues.apache.org/jira/browse/HUDI-1528 Project: Apache Hudi Issue Type: Bug Components: Hive I

[jira] [Comment Edited] (HUDI-1528) hudi-sync-tool error

2021-01-14 Thread Trevorzhang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17264859#comment-17264859 ] Trevorzhang edited comment on HUDI-1528 at 1/14/21, 12:25 PM: --

[jira] [Commented] (HUDI-1528) hudi-sync-tool error

2021-01-14 Thread Trevorzhang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17264859#comment-17264859 ] Trevorzhang commented on HUDI-1528: --- {code:java} //code placeholder[lingqu@xx-dev-cq-ecs-dtpbu-

[jira] [Issue Comment Deleted] (HUDI-1528) hudi-sync-tool error

2021-01-14 Thread Trevorzhang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trevorzhang updated HUDI-1528: -- Comment: was deleted (was:     {code:java} //[lingqu@xx-dev-cq-ecs-dtpbu-datalake-cdh-work-01 jars]$ sh

[jira] [Comment Edited] (HUDI-1528) hudi-sync-tool error

2021-01-14 Thread Trevorzhang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17264862#comment-17264862 ] Trevorzhang edited comment on HUDI-1528 at 1/14/21, 12:28 PM: --

[jira] [Commented] (HUDI-1528) hudi-sync-tool error

2021-01-14 Thread Trevorzhang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17264862#comment-17264862 ] Trevorzhang commented on HUDI-1528: --- {panel:title=My Title} [lingqu@xx-dev-cq-ecs-dtpbu-data

[jira] [Issue Comment Deleted] (HUDI-1528) hudi-sync-tool error

2021-01-14 Thread Trevorzhang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trevorzhang updated HUDI-1528: -- Comment: was deleted (was: {panel:title=log} [lingqu@xx-dev-cq-ecs-dtpbu-datalake-cdh-work-01 jars]$ sh

[jira] [Commented] (HUDI-1528) hudi-sync-tool error

2021-01-14 Thread Trevorzhang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17264868#comment-17264868 ] Trevorzhang commented on HUDI-1528: --- 21/01/14 20:21:00 INFO hive.HoodieHiveClient: Creat

[GitHub] [hudi] peng-xin opened a new issue #2448: [SUPPORT] deltacommit for client 172.16.116.102 already exists

2021-01-14 Thread GitBox
peng-xin opened a new issue #2448: URL: https://github.com/apache/hudi/issues/2448 **Environment Description** * Hudi version : 0.6.0 * Spark version : spark-2.4.4-bin-hadoop2.7 * Hive version : hive-2.3.4 * Hadoop version : hadoop2.7.3 * Storage (HDFS/S3/GCS..) :

[GitHub] [hudi] codecov-io edited a comment on pull request #2430: [HUDI-1522] Add a new pipeline for Flink writer

2021-01-14 Thread GitBox
codecov-io edited a comment on pull request #2430: URL: https://github.com/apache/hudi/pull/2430#issuecomment-757736411 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2430?src=pr&el=h1) Report > Merging [#2430](https://codecov.io/gh/apache/hudi/pull/2430?src=pr&el=desc) (c4a04f9) in

[GitHub] [hudi] codecov-io edited a comment on pull request #2260: [HUDI-1381] Schedule compaction based on time elapsed

2021-01-14 Thread GitBox
codecov-io edited a comment on pull request #2260: URL: https://github.com/apache/hudi/pull/2260#issuecomment-729530724 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2260?src=pr&el=h1) Report > Merging [#2260](https://codecov.io/gh/apache/hudi/pull/2260?src=pr&el=desc) (7d0453e) in

[GitHub] [hudi] Trevor-zhang opened a new pull request #2449: [HUDI-1528] hudi-sync-tools supports synchronization to remote hive

2021-01-14 Thread GitBox
Trevor-zhang opened a new pull request #2449: URL: https://github.com/apache/hudi/pull/2449 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of th

[jira] [Updated] (HUDI-1528) hudi-sync-tool error

2021-01-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1528: - Labels: pull-request-available (was: ) > hudi-sync-tool error > > >

[jira] [Updated] (HUDI-1528) hudi-sync-tools error

2021-01-14 Thread Trevorzhang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trevorzhang updated HUDI-1528: -- Summary: hudi-sync-tools error (was: hudi-sync-tool error) > hudi-sync-tools error > --

[jira] [Updated] (HUDI-1528) hudi-sync-tools error

2021-01-14 Thread Trevorzhang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trevorzhang updated HUDI-1528: -- Description: When using hudi-sync-tools to synchronize to a remote hive, hivemetastore throw exceptions

[GitHub] [hudi] codecov-io edited a comment on pull request #2430: [HUDI-1522] Add a new pipeline for Flink writer

2021-01-14 Thread GitBox
codecov-io edited a comment on pull request #2430: URL: https://github.com/apache/hudi/pull/2430#issuecomment-757736411

[GitHub] [hudi] codecov-io edited a comment on pull request #2260: [HUDI-1381] Schedule compaction based on time elapsed

2021-01-14 Thread GitBox
codecov-io edited a comment on pull request #2260: URL: https://github.com/apache/hudi/pull/2260#issuecomment-729530724

[GitHub] [hudi] yanghua commented on a change in pull request #2434: [HUDI-1511] InstantGenerateOperator support multiple parallelism

2021-01-14 Thread GitBox
yanghua commented on a change in pull request #2434: URL: https://github.com/apache/hudi/pull/2434#discussion_r557462779 ## File path: hudi-flink/src/main/java/org/apache/hudi/operator/InstantGenerateOperator.java ## @@ -71,16 +73,18 @@ private String latestInstant = "";

[GitHub] [hudi] yanghua commented on pull request #2449: [HUDI-1528] hudi-sync-tools supports synchronization to remote hive

2021-01-14 Thread GitBox
yanghua commented on pull request #2449: URL: https://github.com/apache/hudi/pull/2449#issuecomment-760264278 @wangxianghu please help to review thanks.

[GitHub] [hudi] yanghua commented on pull request #2443: [HUDI-1269] Make whether the failure of connect hive affects hudi ingest process configurable

2021-01-14 Thread GitBox
yanghua commented on pull request #2443: URL: https://github.com/apache/hudi/pull/2443#issuecomment-760270387 @liujinhui1994 Travis is red. @wangxianghu help to review firstly.

[GitHub] [hudi] codecov-io commented on pull request #2444: [MINOR] Fix dataSource cannot use hoodie.datasource.hive_sync.auto_create_database

2021-01-14 Thread GitBox
codecov-io commented on pull request #2444: URL: https://github.com/apache/hudi/pull/2444#issuecomment-760286269 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2444?src=pr&el=h1) Report > Merging [#2444](https://codecov.io/gh/apache/hudi/pull/2444?src=pr&el=desc) (0b4eb5c) into [ma

[GitHub] [hudi] vburenin commented on a change in pull request #2440: Fixed suboptimal implementation of a magic sequence search

2021-01-14 Thread GitBox
vburenin commented on a change in pull request #2440: URL: https://github.com/apache/hudi/pull/2440#discussion_r557508471 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java ## @@ -274,19 +275,27 @@ private boolean isBlockCorrupt(i

[GitHub] [hudi] vburenin commented on pull request #2440: Fixed suboptimal implementation of a magic sequence search

2021-01-14 Thread GitBox
vburenin commented on pull request #2440: URL: https://github.com/apache/hudi/pull/2440#issuecomment-760299104 > @vburenin Left a comment to restructure the code to support buffering, are you going to look into improving the O(m*n) search ? At this point of time I think it is not necessa

[GitHub] [hudi] rakeshramakrishnan commented on pull request #2449: [HUDI-1528] hudi-sync-tools supports synchronization to remote hive

2021-01-14 Thread GitBox
rakeshramakrishnan commented on pull request #2449: URL: https://github.com/apache/hudi/pull/2449#issuecomment-760299334 Much thanks! This would address #2439 ?

[GitHub] [hudi] rakeshramakrishnan edited a comment on pull request #2449: [HUDI-1528] hudi-sync-tools supports synchronization to remote hive

2021-01-14 Thread GitBox
rakeshramakrishnan edited a comment on pull request #2449: URL: https://github.com/apache/hudi/pull/2449#issuecomment-760299334 Awesome! This would address #2439 ?

[GitHub] [hudi] codecov-io edited a comment on pull request #2444: [MINOR] Fix dataSource cannot use hoodie.datasource.hive_sync.auto_create_database

2021-01-14 Thread GitBox
codecov-io edited a comment on pull request #2444: URL: https://github.com/apache/hudi/pull/2444#issuecomment-760286269 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2444?src=pr&el=h1) Report > Merging [#2444](https://codecov.io/gh/apache/hudi/pull/2444?src=pr&el=desc) (0b4eb5c) in

[GitHub] [hudi] codecov-io edited a comment on pull request #2444: [MINOR] Fix dataSource cannot use hoodie.datasource.hive_sync.auto_create_database

2021-01-14 Thread GitBox
codecov-io edited a comment on pull request #2444: URL: https://github.com/apache/hudi/pull/2444#issuecomment-760286269

[GitHub] [hudi] vburenin opened a new pull request #2450: Try to init class trying different signatures instead of checking its name.

2021-01-14 Thread GitBox
vburenin opened a new pull request #2450: URL: https://github.com/apache/hudi/pull/2450 ## What is the purpose of the pull request UtilHelpers.createSource had a hardcoded way of checking which constructor signature needs to be used to instantiate a class which makes it impossible t

[GitHub] [hudi] n3nash merged pull request #2424: [HUDI-1509]: Reverting LinkedHashSet changes to fix performance degradation for large schemas

2021-01-14 Thread GitBox
n3nash merged pull request #2424: URL: https://github.com/apache/hudi/pull/2424

[hudi] branch master updated: [HUDI-1509]: Reverting LinkedHashSet changes to combine fields from oldSchema and newSchema in favor of using only new schema for record rewriting (#2424)

2021-01-14 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository. nagarwal pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 749f657 [HUDI-1509]: Reverting LinkedHashSet ch

[GitHub] [hudi] codecov-io edited a comment on pull request #2431: [HUDI-1526]translate the api partitionBy to hoodie.datasource.write.partitionpath.field

2021-01-14 Thread GitBox
codecov-io edited a comment on pull request #2431: URL: https://github.com/apache/hudi/pull/2431#issuecomment-757929313 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2431?src=pr&el=h1) Report > Merging [#2431](https://codecov.io/gh/apache/hudi/pull/2431?src=pr&el=desc) (e63414d) in

[jira] [Updated] (HUDI-1526) Translate the spark api partitionBy to hoodie.datasource.write.partitionpath.field

2021-01-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1526: - Labels: pull-request-available (was: ) > Translate the spark api partitionBy to > hoodie.datasou

[GitHub] [hudi] codecov-io edited a comment on pull request #2434: [HUDI-1511] InstantGenerateOperator support multiple parallelism

2021-01-14 Thread GitBox
codecov-io edited a comment on pull request #2434: URL: https://github.com/apache/hudi/pull/2434#issuecomment-758857465 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2434?src=pr&el=h1) Report > Merging [#2434](https://codecov.io/gh/apache/hudi/pull/2434?src=pr&el=desc) (53d9942) in

[jira] [Assigned] (HUDI-1529) Spark-SQL driver runs out of memory when metadata table is enabled

2021-01-14 Thread Udit Mehrotra (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udit Mehrotra reassigned HUDI-1529: --- Assignee: Udit Mehrotra > Spark-SQL driver runs out of memory when metadata table is enabled

[jira] [Created] (HUDI-1529) Spark-SQL driver runs out of memory when metadata table is enabled

2021-01-14 Thread Udit Mehrotra (Jira)
Udit Mehrotra created HUDI-1529: --- Summary: Spark-SQL driver runs out of memory when metadata table is enabled Key: HUDI-1529 URL: https://issues.apache.org/jira/browse/HUDI-1529 Project: Apache Hudi

[jira] [Updated] (HUDI-1529) Spark-SQL driver runs out of memory when metadata table is enabled

2021-01-14 Thread Udit Mehrotra (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udit Mehrotra updated HUDI-1529: Description: When testing a large dataset around 1.2TB data and around 20k files, we notice an issu

[GitHub] [hudi] vinothchandar commented on pull request #2442: Adding new configurations in 0.7.0

2021-01-14 Thread GitBox
vinothchandar commented on pull request #2442: URL: https://github.com/apache/hudi/pull/2442#issuecomment-760571146 @nsivabalan can we just fix the configs first on the current version of the site? It's possible we will make more changes until we release. We can make the 0.7.0 specific page

[GitHub] [hudi] vburenin edited a comment on pull request #2440: Fixed suboptimal implementation of a magic sequence search

2021-01-14 Thread GitBox
vburenin edited a comment on pull request #2440: URL: https://github.com/apache/hudi/pull/2440#issuecomment-760299104 > @vburenin Left a comment to restructure the code to support buffering, are you going to look into improving the O(m*n) search ? At this point of time I think it is

[GitHub] [hudi] vinothchandar commented on pull request #2440: Fixed suboptimal implementation of a magic sequence search

2021-01-14 Thread GitBox
vinothchandar commented on pull request #2440: URL: https://github.com/apache/hudi/pull/2440#issuecomment-760571612 @n3nash can we take a call on this and get it into the current release. marking as blocker for now.

[GitHub] [hudi] vinothchandar commented on pull request #2440: Fixed suboptimal implementation of a magic sequence search

2021-01-14 Thread GitBox
vinothchandar commented on pull request #2440: URL: https://github.com/apache/hudi/pull/2440#issuecomment-760571874 @vburenin do you mind creating a JIRA for this issue? We can give you perms if you can ping us your id from issue.apache.org/jira

[GitHub] [hudi] umehrot2 opened a new pull request #2451: [HUDI-1529] Add block size to the FileStatus objects returned from metadata table to avoid too many file splits

2021-01-14 Thread GitBox
umehrot2 opened a new pull request #2451: URL: https://github.com/apache/hudi/pull/2451 ## What is the purpose of the pull request This PR fixes an issue we identified when enabling **metadata table** for SparkSQL queries, which causes a huge number of file splits to be generated, cau

[jira] [Updated] (HUDI-1529) Spark-SQL driver runs out of memory when metadata table is enabled

2021-01-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1529: - Labels: pull-request-available (was: ) > Spark-SQL driver runs out of memory when metadata table

[GitHub] [hudi] wangxianghu commented on a change in pull request #2449: [HUDI-1528] hudi-sync-tools supports synchronization to remote hive

2021-01-14 Thread GitBox
wangxianghu commented on a change in pull request #2449: URL: https://github.com/apache/hudi/pull/2449#discussion_r557806942 ## File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java ## @@ -49,6 +49,9 @@ @Parameter(names = {"--jdbc-url"},

[GitHub] [hudi] wangxianghu commented on a change in pull request #2449: [HUDI-1528] hudi-sync-tools supports synchronization to remote hive

2021-01-14 Thread GitBox
wangxianghu commented on a change in pull request #2449: URL: https://github.com/apache/hudi/pull/2449#discussion_r557807516 ## File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java ## @@ -49,6 +49,9 @@ @Parameter(names = {"--jdbc-url"},

[GitHub] [hudi] Trevor-zhang commented on pull request #2449: [HUDI-1528] hudi-sync-tools supports synchronization to remote hive

2021-01-14 Thread GitBox
Trevor-zhang commented on pull request #2449: URL: https://github.com/apache/hudi/pull/2449#issuecomment-760586002 > Awesome! This would address #2439 ? I'm not sure if it can solve your problem, wait for me to test it. ---

[GitHub] [hudi] Trevor-zhang commented on a change in pull request #2449: [HUDI-1528] hudi-sync-tools supports synchronization to remote hive

2021-01-14 Thread GitBox
Trevor-zhang commented on a change in pull request #2449: URL: https://github.com/apache/hudi/pull/2449#discussion_r557807620 ## File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java ## @@ -49,6 +49,9 @@ @Parameter(names = {"--jdbc-url"}

[jira] [Created] (HUDI-1530) make HoodieDeltaStreamer and SparkDataSource support HiveMetaStore

2021-01-14 Thread Trevorzhang (Jira)
Trevorzhang created HUDI-1530: - Summary: make HoodieDeltaStreamer and SparkDataSource support HiveMetaStore Key: HUDI-1530 URL: https://issues.apache.org/jira/browse/HUDI-1530 Project: Apache Hudi

[GitHub] [hudi] loukey-lj commented on a change in pull request #2434: [HUDI-1511] InstantGenerateOperator support multiple parallelism

2021-01-14 Thread GitBox
loukey-lj commented on a change in pull request #2434: URL: https://github.com/apache/hudi/pull/2434#discussion_r557818230 ## File path: hudi-flink/src/main/java/org/apache/hudi/operator/InstantGenerateOperator.java ## @@ -222,4 +234,59 @@ public void close() throws Exception

[GitHub] [hudi] bvaradar commented on issue #2439: [SUPPORT] Unable to sync with external hive metastore via metastore uris in the thrift protocol

2021-01-14 Thread GitBox
bvaradar commented on issue #2439: URL: https://github.com/apache/hudi/issues/2439#issuecomment-760616648 @satishkotha : Can you help with this ?

[GitHub] [hudi] bvaradar commented on issue #2446: [SUPPORT] The parameter "hoodie.bloom.index.filter.type" does not take effect in deltaStreamer

2021-01-14 Thread GitBox
bvaradar commented on issue #2446: URL: https://github.com/apache/hudi/issues/2446#issuecomment-760617405 @quitozang : Which version of Hoodie are you using ? Are you passing the configuration like "--hoodie-conf hoodie.bloom.index.filter.type=ABC" ? @nsivabalan : Can you follow-up

[GitHub] [hudi] loukey-lj commented on a change in pull request #2434: [HUDI-1511] InstantGenerateOperator support multiple parallelism

2021-01-14 Thread GitBox
loukey-lj commented on a change in pull request #2434: URL: https://github.com/apache/hudi/pull/2434#discussion_r557820490 ## File path: hudi-flink/src/main/java/org/apache/hudi/operator/InstantGenerateOperator.java ## @@ -102,65 +105,76 @@ public void open() throws Exception

[GitHub] [hudi] loukey-lj commented on a change in pull request #2434: [HUDI-1511] InstantGenerateOperator support multiple parallelism

2021-01-14 Thread GitBox
loukey-lj commented on a change in pull request #2434: URL: https://github.com/apache/hudi/pull/2434#discussion_r557821706 ## File path: hudi-flink/src/main/java/org/apache/hudi/operator/InstantGenerateOperator.java ## @@ -71,16 +73,18 @@ private String latestInstant = "";

[GitHub] [hudi] vinothchandar commented on a change in pull request #2451: [HUDI-1529] Add block size to the FileStatus objects returned from metadata table to avoid too many file splits

2021-01-14 Thread GitBox
vinothchandar commented on a change in pull request #2451: URL: https://github.com/apache/hudi/pull/2451#discussion_r557874678 ## File path: hudi-common/src/main/java/org/apache/hudi/metadata/BaseTableMetadata.java ## @@ -202,7 +202,7 @@ protected BaseTableMetadata(HoodieEngin

[GitHub] [hudi] so-lazy commented on issue #2338: [SUPPORT] MOR table found duplicate and process so slowly

2021-01-14 Thread GitBox
so-lazy commented on issue #2338: URL: https://github.com/apache/hudi/issues/2338#issuecomment-760708042 @bvaradar sir, I am now using the global simple index, but some stages (**Getting small files from partitions**, **Compacting file slices**) take many minutes, and I attach m