[GitHub] [hudi] so-lazy commented on issue #2338: [SUPPORT] MOR table found duplicate and process so slowly

2021-01-14 Thread GitBox
so-lazy commented on issue #2338: URL: https://github.com/apache/hudi/issues/2338#issuecomment-760708042 @bvaradar sir, now i used global simple index, but for some stages **Getting small files from partitions** **Compacting file slices** they cost so long minutes, and i attach

[GitHub] [hudi] vinothchandar commented on a change in pull request #2451: [HUDI-1529] Add block size to the FileStatus objects returned from metadata table to avoid too many file splits

2021-01-14 Thread GitBox
vinothchandar commented on a change in pull request #2451: URL: https://github.com/apache/hudi/pull/2451#discussion_r557874678 ## File path: hudi-common/src/main/java/org/apache/hudi/metadata/BaseTableMetadata.java ## @@ -202,7 +202,7 @@ protected

[GitHub] [hudi] loukey-lj commented on a change in pull request #2434: [HUDI-1511] InstantGenerateOperator support multiple parallelism

2021-01-14 Thread GitBox
loukey-lj commented on a change in pull request #2434: URL: https://github.com/apache/hudi/pull/2434#discussion_r557821706 ## File path: hudi-flink/src/main/java/org/apache/hudi/operator/InstantGenerateOperator.java ## @@ -71,16 +73,18 @@ private String latestInstant = "";

[GitHub] [hudi] loukey-lj commented on a change in pull request #2434: [HUDI-1511] InstantGenerateOperator support multiple parallelism

2021-01-14 Thread GitBox
loukey-lj commented on a change in pull request #2434: URL: https://github.com/apache/hudi/pull/2434#discussion_r557820490 ## File path: hudi-flink/src/main/java/org/apache/hudi/operator/InstantGenerateOperator.java ## @@ -102,65 +105,76 @@ public void open() throws Exception

[GitHub] [hudi] bvaradar commented on issue #2446: [SUPPORT] The parameter "hoodie.bloom.index.filter.type" does not take effect in deltaStreamer

2021-01-14 Thread GitBox
bvaradar commented on issue #2446: URL: https://github.com/apache/hudi/issues/2446#issuecomment-760617405 @quitozang : Which version of Hoodie are you using ? Are you passing the configuration like "--hoodie-conf hoodie.bloom.index.filter.type=ABC" ? @nsivabalan : Can you follow-up

[GitHub] [hudi] bvaradar commented on issue #2439: [SUPPORT] Unable to sync with external hive metastore via metastore uris in the thrift protocol

2021-01-14 Thread GitBox
bvaradar commented on issue #2439: URL: https://github.com/apache/hudi/issues/2439#issuecomment-760616648 @satishkotha : Can you help with this ? This is an automated message from the Apache Git Service. To respond to the

[GitHub] [hudi] loukey-lj commented on a change in pull request #2434: [HUDI-1511] InstantGenerateOperator support multiple parallelism

2021-01-14 Thread GitBox
loukey-lj commented on a change in pull request #2434: URL: https://github.com/apache/hudi/pull/2434#discussion_r557818230 ## File path: hudi-flink/src/main/java/org/apache/hudi/operator/InstantGenerateOperator.java ## @@ -222,4 +234,59 @@ public void close() throws Exception

[jira] [Created] (HUDI-1530) make HoodieDeltaStreamer and SparkDataSource support HiveMetaStore

2021-01-14 Thread Trevorzhang (Jira)
Trevorzhang created HUDI-1530: - Summary: make HoodieDeltaStreamer and SparkDataSource support HiveMetaStore Key: HUDI-1530 URL: https://issues.apache.org/jira/browse/HUDI-1530 Project: Apache Hudi

[GitHub] [hudi] Trevor-zhang commented on a change in pull request #2449: [HUDI-1528] hudi-sync-tools supports synchronization to remote hive

2021-01-14 Thread GitBox
Trevor-zhang commented on a change in pull request #2449: URL: https://github.com/apache/hudi/pull/2449#discussion_r557807620 ## File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java ## @@ -49,6 +49,9 @@ @Parameter(names =

[GitHub] [hudi] Trevor-zhang commented on pull request #2449: [HUDI-1528] hudi-sync-tools supports synchronization to remote hive

2021-01-14 Thread GitBox
Trevor-zhang commented on pull request #2449: URL: https://github.com/apache/hudi/pull/2449#issuecomment-760586002 > Awesome! This would address #2439 ? I'm not sure if it can solve your problem, wait for me to test it.

[GitHub] [hudi] wangxianghu commented on a change in pull request #2449: [HUDI-1528] hudi-sync-tools supports synchronization to remote hive

2021-01-14 Thread GitBox
wangxianghu commented on a change in pull request #2449: URL: https://github.com/apache/hudi/pull/2449#discussion_r557807516 ## File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java ## @@ -49,6 +49,9 @@ @Parameter(names =

[GitHub] [hudi] wangxianghu commented on a change in pull request #2449: [HUDI-1528] hudi-sync-tools supports synchronization to remote hive

2021-01-14 Thread GitBox
wangxianghu commented on a change in pull request #2449: URL: https://github.com/apache/hudi/pull/2449#discussion_r557806942 ## File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java ## @@ -49,6 +49,9 @@ @Parameter(names =

[jira] [Updated] (HUDI-1529) Spark-SQL drvier runs out of memory when metadata table is enabled

2021-01-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1529: - Labels: pull-request-available (was: ) > Spark-SQL drvier runs out of memory when metadata table

[GitHub] [hudi] umehrot2 opened a new pull request #2451: [HUDI-1529] Add block size to the FileStatus objects returned from metadata table to avoid too many file splits

2021-01-14 Thread GitBox
umehrot2 opened a new pull request #2451: URL: https://github.com/apache/hudi/pull/2451 ## What is the purpose of the pull request This PR fixes an issue we identified when enabling **metadata table** for SparkSQL queries, which causes a huge number of file splits to be generated,

[GitHub] [hudi] vinothchandar commented on pull request #2440: Fixed suboptimal implementation of a magic sequence search

2021-01-14 Thread GitBox
vinothchandar commented on pull request #2440: URL: https://github.com/apache/hudi/pull/2440#issuecomment-760571874 @vburenin do you mind creating a JIRA for this issue? We can give you perms if you can ping us your id from issue.apache.org/jira

[GitHub] [hudi] vinothchandar commented on pull request #2440: Fixed suboptimal implementation of a magic sequence search

2021-01-14 Thread GitBox
vinothchandar commented on pull request #2440: URL: https://github.com/apache/hudi/pull/2440#issuecomment-760571612 @n3nash can we take a call on this and get it into the current release. marking as blocker for now.

[GitHub] [hudi] vburenin edited a comment on pull request #2440: Fixed suboptimal implementation of a magic sequence search

2021-01-14 Thread GitBox
vburenin edited a comment on pull request #2440: URL: https://github.com/apache/hudi/pull/2440#issuecomment-760299104 > @vburenin Left a comment to restructure the code to support buffering, are you going to look into improving the O(m*n) search ? At this point of time I think it is

[GitHub] [hudi] vinothchandar commented on pull request #2442: Adding new configurations in 0.7.0

2021-01-14 Thread GitBox
vinothchandar commented on pull request #2442: URL: https://github.com/apache/hudi/pull/2442#issuecomment-760571146 @nsivabalan can we just fix the configs first on the current version of the site? It's possible we will make more changes until we release. We can make the 0.7.0 specific

[jira] [Updated] (HUDI-1529) Spark-SQL drvier runs out of memory when metadata table is enabled

2021-01-14 Thread Udit Mehrotra (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udit Mehrotra updated HUDI-1529: Description: When testing a large dataset around 1.2TB data and around 20k files, we notice an

[jira] [Assigned] (HUDI-1529) Spark-SQL drvier runs out of memory when metadata table is enabled

2021-01-14 Thread Udit Mehrotra (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udit Mehrotra reassigned HUDI-1529: --- Assignee: Udit Mehrotra > Spark-SQL drvier runs out of memory when metadata table is enabled

[jira] [Created] (HUDI-1529) Spark-SQL drvier runs out of memory when metadata table is enabled

2021-01-14 Thread Udit Mehrotra (Jira)
Udit Mehrotra created HUDI-1529: --- Summary: Spark-SQL drvier runs out of memory when metadata table is enabled Key: HUDI-1529 URL: https://issues.apache.org/jira/browse/HUDI-1529 Project: Apache Hudi

[GitHub] [hudi] codecov-io edited a comment on pull request #2431: [HUDI-1526]translate the api partitionBy to hoodie.datasource.write.partitionpath.field

2021-01-14 Thread GitBox
codecov-io edited a comment on pull request #2431: URL: https://github.com/apache/hudi/pull/2431#issuecomment-757929313 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2431?src=pr=h1) Report > Merging [#2431](https://codecov.io/gh/apache/hudi/pull/2431?src=pr=desc) (e63414d) into

[GitHub] [hudi] codecov-io edited a comment on pull request #2434: [HUDI-1511] InstantGenerateOperator support multiple parallelism

2021-01-14 Thread GitBox
codecov-io edited a comment on pull request #2434: URL: https://github.com/apache/hudi/pull/2434#issuecomment-758857465 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2434?src=pr=h1) Report > Merging [#2434](https://codecov.io/gh/apache/hudi/pull/2434?src=pr=desc) (53d9942) into

[jira] [Updated] (HUDI-1526) Translate the spark api partitionBy to hoodie.datasource.write.partitionpath.field

2021-01-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1526: - Labels: pull-request-available (was: ) > Translate the spark api partitionBy to >

[hudi] branch master updated: [HUDI-1509]: Reverting LinkedHashSet changes to combine fields from oldSchema and newSchema in favor of using only new schema for record rewriting (#2424)

2021-01-14 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository. nagarwal pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 749f657 [HUDI-1509]: Reverting LinkedHashSet

[GitHub] [hudi] n3nash merged pull request #2424: [HUDI-1509]: Reverting LinkedHashSet changes to fix performance degradation for large schemas

2021-01-14 Thread GitBox
n3nash merged pull request #2424: URL: https://github.com/apache/hudi/pull/2424

[GitHub] [hudi] vburenin opened a new pull request #2450: Try to init class trying different signatures instead of checking its name.

2021-01-14 Thread GitBox
vburenin opened a new pull request #2450: URL: https://github.com/apache/hudi/pull/2450 ## What is the purpose of the pull request UtilHelpers.createSource had a hardcoded way of checking which constructor signature needs to be used to instantiate a class which makes it impossible

[GitHub] [hudi] codecov-io edited a comment on pull request #2444: [MINOR] Fix dataSource cannot use hoodie.datasource.hive_sync.auto_create_database

2021-01-14 Thread GitBox
codecov-io edited a comment on pull request #2444: URL: https://github.com/apache/hudi/pull/2444#issuecomment-760286269

[GitHub] [hudi] codecov-io edited a comment on pull request #2444: [MINOR] Fix dataSource cannot use hoodie.datasource.hive_sync.auto_create_database

2021-01-14 Thread GitBox
codecov-io edited a comment on pull request #2444: URL: https://github.com/apache/hudi/pull/2444#issuecomment-760286269 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2444?src=pr=h1) Report > Merging [#2444](https://codecov.io/gh/apache/hudi/pull/2444?src=pr=desc) (0b4eb5c) into

[GitHub] [hudi] rakeshramakrishnan edited a comment on pull request #2449: [HUDI-1528] hudi-sync-tools supports synchronization to remote hive

2021-01-14 Thread GitBox
rakeshramakrishnan edited a comment on pull request #2449: URL: https://github.com/apache/hudi/pull/2449#issuecomment-760299334 Awesome! This would address #2439 ?

[GitHub] [hudi] rakeshramakrishnan commented on pull request #2449: [HUDI-1528] hudi-sync-tools supports synchronization to remote hive

2021-01-14 Thread GitBox
rakeshramakrishnan commented on pull request #2449: URL: https://github.com/apache/hudi/pull/2449#issuecomment-760299334 Much thanks! This would address #2439 ?

[GitHub] [hudi] vburenin commented on pull request #2440: Fixed suboptimal implementation of a magic sequence search

2021-01-14 Thread GitBox
vburenin commented on pull request #2440: URL: https://github.com/apache/hudi/pull/2440#issuecomment-760299104 > @vburenin Left a comment to restructure the code to support buffering, are you going to look into improving the O(m*n) search ? At this point of time I think it is not

[GitHub] [hudi] vburenin commented on a change in pull request #2440: Fixed suboptimal implementation of a magic sequence search

2021-01-14 Thread GitBox
vburenin commented on a change in pull request #2440: URL: https://github.com/apache/hudi/pull/2440#discussion_r557508471 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFileReader.java ## @@ -274,19 +275,27 @@ private boolean

[GitHub] [hudi] codecov-io commented on pull request #2444: [MINOR] Fix dataSource cannot use hoodie.datasource.hive_sync.auto_create_database

2021-01-14 Thread GitBox
codecov-io commented on pull request #2444: URL: https://github.com/apache/hudi/pull/2444#issuecomment-760286269 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2444?src=pr=h1) Report > Merging [#2444](https://codecov.io/gh/apache/hudi/pull/2444?src=pr=desc) (0b4eb5c) into

[GitHub] [hudi] yanghua commented on pull request #2443: [HUDI-1269] Make whether the failure of connect hive affects hudi ingest process configurable

2021-01-14 Thread GitBox
yanghua commented on pull request #2443: URL: https://github.com/apache/hudi/pull/2443#issuecomment-760270387 @liujinhui1994 Travis is red. @wangxianghu help to review firstly.

[GitHub] [hudi] yanghua commented on pull request #2449: [HUDI-1528] hudi-sync-tools supports synchronization to remote hive

2021-01-14 Thread GitBox
yanghua commented on pull request #2449: URL: https://github.com/apache/hudi/pull/2449#issuecomment-760264278 @wangxianghu please help to review thanks.

[GitHub] [hudi] yanghua commented on a change in pull request #2434: [HUDI-1511] InstantGenerateOperator support multiple parallelism

2021-01-14 Thread GitBox
yanghua commented on a change in pull request #2434: URL: https://github.com/apache/hudi/pull/2434#discussion_r557462779 ## File path: hudi-flink/src/main/java/org/apache/hudi/operator/InstantGenerateOperator.java ## @@ -71,16 +73,18 @@ private String latestInstant = "";

[GitHub] [hudi] codecov-io edited a comment on pull request #2260: [HUDI-1381] Schedule compaction based on time elapsed

2021-01-14 Thread GitBox
codecov-io edited a comment on pull request #2260: URL: https://github.com/apache/hudi/pull/2260#issuecomment-729530724

[GitHub] [hudi] codecov-io edited a comment on pull request #2430: [HUDI-1522] Add a new pipeline for Flink writer

2021-01-14 Thread GitBox
codecov-io edited a comment on pull request #2430: URL: https://github.com/apache/hudi/pull/2430#issuecomment-757736411

[jira] [Updated] (HUDI-1528) hudi-sync-tools error

2021-01-14 Thread Trevorzhang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trevorzhang updated HUDI-1528: -- Summary: hudi-sync-tools error (was: hudi-sync-tool error) > hudi-sync-tools error >

[jira] [Updated] (HUDI-1528) hudi-sync-tools error

2021-01-14 Thread Trevorzhang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trevorzhang updated HUDI-1528: -- Description: When using hudi-sync-tools to synchronize to a remote hive, hivemetastore throw

[jira] [Updated] (HUDI-1528) hudi-sync-tool error

2021-01-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1528: - Labels: pull-request-available (was: ) > hudi-sync-tool error > > >

[GitHub] [hudi] Trevor-zhang opened a new pull request #2449: [HUDI-1528] hudi-sync-tools supports synchronization to remote hive

2021-01-14 Thread GitBox
Trevor-zhang opened a new pull request #2449: URL: https://github.com/apache/hudi/pull/2449 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of

[GitHub] [hudi] codecov-io edited a comment on pull request #2260: [HUDI-1381] Schedule compaction based on time elapsed

2021-01-14 Thread GitBox
codecov-io edited a comment on pull request #2260: URL: https://github.com/apache/hudi/pull/2260#issuecomment-729530724 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2260?src=pr=h1) Report > Merging [#2260](https://codecov.io/gh/apache/hudi/pull/2260?src=pr=desc) (7d0453e) into

[GitHub] [hudi] codecov-io edited a comment on pull request #2430: [HUDI-1522] Add a new pipeline for Flink writer

2021-01-14 Thread GitBox
codecov-io edited a comment on pull request #2430: URL: https://github.com/apache/hudi/pull/2430#issuecomment-757736411 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2430?src=pr=h1) Report > Merging [#2430](https://codecov.io/gh/apache/hudi/pull/2430?src=pr=desc) (c4a04f9) into

[GitHub] [hudi] peng-xin opened a new issue #2448: [SUPPORT] deltacommit for client 172.16.116.102 already exists

2021-01-14 Thread GitBox
peng-xin opened a new issue #2448: URL: https://github.com/apache/hudi/issues/2448 **Environment Description** * Hudi version : 0.6.0 * Spark version : spark-2.4.4-bin-hadoop2.7 * Hive version : hive-2.3.4 * Hadoop version : hadoop2.7.3 * Storage (HDFS/S3/GCS..) :

[jira] [Commented] (HUDI-1528) hudi-sync-tool error

2021-01-14 Thread Trevorzhang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264868#comment-17264868 ] Trevorzhang commented on HUDI-1528: --- 21/01/14 20:21:00 INFO hive.HoodieHiveClient: Creating table with

[jira] [Issue Comment Deleted] (HUDI-1528) hudi-sync-tool error

2021-01-14 Thread Trevorzhang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trevorzhang updated HUDI-1528: -- Comment: was deleted (was: {panel:title=log} [lingqu@xx-dev-cq-ecs-dtpbu-datalake-cdh-work-01 jars]$ sh

[jira] [Comment Edited] (HUDI-1528) hudi-sync-tool error

2021-01-14 Thread Trevorzhang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264862#comment-17264862 ] Trevorzhang edited comment on HUDI-1528 at 1/14/21, 12:28 PM: --

[jira] [Commented] (HUDI-1528) hudi-sync-tool error

2021-01-14 Thread Trevorzhang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264862#comment-17264862 ] Trevorzhang commented on HUDI-1528: --- {panel:title=My title} [lingqu@xx-dev-cq-ecs-dtpbu-datalake-cdh-work-01

[jira] [Issue Comment Deleted] (HUDI-1528) hudi-sync-tool error

2021-01-14 Thread Trevorzhang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trevorzhang updated HUDI-1528: -- Comment: was deleted (was:     {code:java} //[lingqu@xx-dev-cq-ecs-dtpbu-datalake-cdh-work-01 jars]$

[jira] [Commented] (HUDI-1528) hudi-sync-tool error

2021-01-14 Thread Trevorzhang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264859#comment-17264859 ] Trevorzhang commented on HUDI-1528: ---     {code:java}

[jira] [Comment Edited] (HUDI-1528) hudi-sync-tool error

2021-01-14 Thread Trevorzhang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17264859#comment-17264859 ] Trevorzhang edited comment on HUDI-1528 at 1/14/21, 12:25 PM: --    

[jira] [Created] (HUDI-1528) hudi-sync-tool error

2021-01-14 Thread Trevorzhang (Jira)
Trevorzhang created HUDI-1528: - Summary: hudi-sync-tool error Key: HUDI-1528 URL: https://issues.apache.org/jira/browse/HUDI-1528 Project: Apache Hudi Issue Type: Bug Components: Hive

[GitHub] [hudi] yanghua commented on a change in pull request #2434: [HUDI-1511] InstantGenerateOperator support multiple parallelism

2021-01-14 Thread GitBox
yanghua commented on a change in pull request #2434: URL: https://github.com/apache/hudi/pull/2434#discussion_r557347825 ## File path: hudi-flink/src/main/java/org/apache/hudi/operator/InstantGenerateOperator.java ## @@ -71,16 +72,18 @@ private String latestInstant = "";

[GitHub] [hudi] codecov-io commented on pull request #2443: [HUDI-1269] Make whether the failure of connect hive affects hudi ingest process configurable

2021-01-14 Thread GitBox
codecov-io commented on pull request #2443: URL: https://github.com/apache/hudi/pull/2443#issuecomment-760147630 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2443?src=pr=h1) Report > Merging [#2443](https://codecov.io/gh/apache/hudi/pull/2443?src=pr=desc) (0d98db7) into

[GitHub] [hudi] yanghua commented on a change in pull request #2430: [HUDI-1522] Add a new pipeline for Flink writer

2021-01-14 Thread GitBox
yanghua commented on a change in pull request #2430: URL: https://github.com/apache/hudi/pull/2430#discussion_r557337056 ## File path: hudi-flink/src/main/java/org/apache/hudi/operator/HoodieOptions.java ## @@ -0,0 +1,248 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [hudi] yanghua commented on a change in pull request #2430: [HUDI-1522] Add a new pipeline for Flink writer

2021-01-14 Thread GitBox
yanghua commented on a change in pull request #2430: URL: https://github.com/apache/hudi/pull/2430#discussion_r557331716 ## File path: hudi-flink/pom.xml ## @@ -124,28 +124,77 @@ kafka-clients ${kafka.version} + + org.apache.flink +

[GitHub] [hudi] yanghua commented on a change in pull request #2430: [HUDI-1522] Add a new pipeline for Flink writer

2021-01-14 Thread GitBox
yanghua commented on a change in pull request #2430: URL: https://github.com/apache/hudi/pull/2430#discussion_r557329799 ## File path: hudi-common/src/main/java/org/apache/hudi/common/model/OverwriteWithLatestAvroPayload.java ## @@ -79,6 +79,11 @@ public

[GitHub] [hudi] codecov-io edited a comment on pull request #2334: [HUDI-1453] Throw Exception when input data schema is not equal to th…

2021-01-14 Thread GitBox
codecov-io edited a comment on pull request #2334: URL: https://github.com/apache/hudi/pull/2334#issuecomment-745334158

[jira] [Updated] (HUDI-1527) Automatically infer the data directory, users only need to specify the table directory

2021-01-14 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1527: - Labels: pull-request-available (was: ) > Automatically infer the data directory, users only need

[GitHub] [hudi] teeyog opened a new pull request #2447: [HUDI-1527] automatically infer the data directory, users only need t…

2021-01-14 Thread GitBox
teeyog opened a new pull request #2447: URL: https://github.com/apache/hudi/pull/2447 …o specify the table directory ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.*

[jira] [Updated] (HUDI-1527) Automatically infer the data directory, users only need to specify the table directory

2021-01-14 Thread teeyog (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] teeyog updated HUDI-1527: - Description: To read the hudi table, you need to specify the path, but the path is not only the tablePath

[jira] [Created] (HUDI-1527) Automatically infer the data directory, users only need to specify the table directory

2021-01-14 Thread teeyog (Jira)
teeyog created HUDI-1527: Summary: Automatically infer the data directory, users only need to specify the table directory Key: HUDI-1527 URL: https://issues.apache.org/jira/browse/HUDI-1527 Project: Apache

[GitHub] [hudi] yanghua commented on a change in pull request #2430: [HUDI-1522] Add a new pipeline for Flink writer

2021-01-14 Thread GitBox
yanghua commented on a change in pull request #2430: URL: https://github.com/apache/hudi/pull/2430#discussion_r557306549 ## File path: hudi-flink/src/main/java/org/apache/hudi/util/StreamerUtil.java ## @@ -81,16 +103,50 @@ public static DFSPropertiesConfiguration

[GitHub] [hudi] wangxianghu commented on a change in pull request #2434: [HUDI-1511] InstantGenerateOperator support multiple parallelism

2021-01-14 Thread GitBox
wangxianghu commented on a change in pull request #2434: URL: https://github.com/apache/hudi/pull/2434#discussion_r557021821 ## File path: hudi-flink/src/main/java/org/apache/hudi/operator/InstantGenerateOperator.java ## @@ -71,16 +72,18 @@ private String latestInstant =

[GitHub] [hudi] codecov-io edited a comment on pull request #2334: [HUDI-1453] Throw Exception when input data schema is not equal to th…

2021-01-14 Thread GitBox
codecov-io edited a comment on pull request #2334: URL: https://github.com/apache/hudi/pull/2334#issuecomment-745334158 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2334?src=pr=h1) Report > Merging [#2334](https://codecov.io/gh/apache/hudi/pull/2334?src=pr=desc) (bbb604a) into

[GitHub] [hudi] quitozang opened a new issue #2446: [SUPPORT] The parameter "hoodie.bloom.index.filter.type" does not take effect in deltaStreamer

2021-01-14 Thread GitBox
quitozang opened a new issue #2446: URL: https://github.com/apache/hudi/issues/2446 Why does this parameter "hoodie.bloom.index.filter.type" not take effect in deltaStreamer, the bloom filter type is always in SIMPLE.

[GitHub] [hudi] xushiyan commented on pull request #2426: [HUDI-304] Configure spotless and java style

2021-01-14 Thread GitBox
xushiyan commented on pull request #2426: URL: https://github.com/apache/hudi/pull/2426#issuecomment-760076561 @vinothchandar The style can be sync'ed by - using google-java-format in spotless config and `spotless:apply` enforces the style that is also compatible with existing

[GitHub] [hudi] xushiyan removed a comment on pull request #2426: [HUDI-304] Configure spotless and java style

2021-01-14 Thread GitBox
xushiyan removed a comment on pull request #2426: URL: https://github.com/apache/hudi/pull/2426#issuecomment-757438915 ## TODO - [ ] Manually verify some diffs after spotless apply won't conflict with IDE formatter and checkstyle

[GitHub] [hudi] xushiyan commented on a change in pull request #2426: [HUDI-304] Configure spotless and java style

2021-01-14 Thread GitBox
xushiyan commented on a change in pull request #2426: URL: https://github.com/apache/hudi/pull/2426#discussion_r554531429 ## File path: pom.xml ## @@ -198,34 +200,36 @@