[GitHub] [incubator-hudi] yanghua commented on issue #1541: [MINOR] Add ability to specify time unit for TimestampBasedKeyGenerator

2020-04-22 Thread GitBox
yanghua commented on issue #1541: URL: https://github.com/apache/incubator-hudi/pull/1541#issuecomment-618193932 @afilipchik Firstly, you may need to figure out why Travis is red. This is an automated message from the

[jira] [Assigned] (HUDI-787) Implement HoodieGlobalBloomIndexV2

2020-04-22 Thread lamber-ken (Jira)
[ https://issues.apache.org/jira/browse/HUDI-787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lamber-ken reassigned HUDI-787: --- Assignee: lamber-ken > Implement HoodieGlobalBloomIndexV2 > -- > >

[jira] [Updated] (HUDI-821) Fix the wrong annotation of JCommander IStringConverter

2020-04-22 Thread lamber-ken (Jira)
[ https://issues.apache.org/jira/browse/HUDI-821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lamber-ken updated HUDI-821: Status: Open (was: New) > Fix the wrong annotation of JCommander IStringConverter >

[jira] [Closed] (HUDI-821) Fix the wrong annotation of JCommander IStringConverter

2020-04-22 Thread lamber-ken (Jira)
[ https://issues.apache.org/jira/browse/HUDI-821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lamber-ken closed HUDI-821. --- Resolution: Fixed > Fix the wrong annotation of JCommander IStringConverter >

[jira] [Resolved] (HUDI-827) Translation error

2020-04-22 Thread lamber-ken (Jira)
[ https://issues.apache.org/jira/browse/HUDI-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lamber-ken resolved HUDI-827. - Resolution: Fixed > Translation error > - > > Key: HUDI-827 >

[GitHub] [incubator-hudi] vinothchandar commented on issue #1552: Time taken for upserting hudi table is increasing with increase in number of partitions

2020-04-22 Thread GitBox
vinothchandar commented on issue #1552: URL: https://github.com/apache/incubator-hudi/issues/1552#issuecomment-618180277 @harshi2506 I am suspecting this may be due to a recent bug we fixed on master (still not 100%). Are you open to building hudi off master branch and giving that a shot?

[GitHub] [incubator-hudi] vinothchandar commented on issue #1549: Potential issue when using Deltastreamer with DMS

2020-04-22 Thread GitBox
vinothchandar commented on issue #1549: URL: https://github.com/apache/incubator-hudi/issues/1549#issuecomment-618179797 > When I look into the specific S3 folder, I see that the insert and delete into the partition actually create a new .parquet file with no log file. So

[GitHub] [incubator-hudi] vinothchandar commented on issue #1540: [HUDI-819] Fix a bug with MergeOnReadLazyInsertIterable.

2020-04-22 Thread GitBox
vinothchandar commented on issue #1540: URL: https://github.com/apache/incubator-hudi/pull/1540#issuecomment-618175675 @satishkotha let's then break that up into a separate JIRA (tagged with Code Cleanup component). We can limit scope to these insert related handles and move on.. wdyt

[GitHub] [incubator-hudi] vingov commented on issue #1526: [HUDI-1526] Add pyspark example in quickstart

2020-04-22 Thread GitBox
vingov commented on issue #1526: URL: https://github.com/apache/incubator-hudi/pull/1526#issuecomment-618173541 @EdwinGuo - Thanks for trying multiple options. I thought it would be easy to create those language-specific tabs; if it is too much work, we can create a separate page for pyspark

[GitHub] [incubator-hudi] allenzhg opened a new issue #1555: [SUPPORT]

2020-04-22 Thread GitBox
allenzhg opened a new issue #1555: URL: https://github.com/apache/incubator-hudi/issues/1555 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://cwiki.apache.org/confluence/display/HUDI/FAQ)? - Join the mailing list to engage in conversations and get

Build failed in Jenkins: hudi-snapshot-deployment-0.5 #256

2020-04-22 Thread Apache Jenkins Server
See Changes: -- [...truncated 2.35 KB...] /home/jenkins/tools/maven/apache-maven-3.5.4/conf: logging settings.xml toolchains.xml

[jira] [Created] (HUDI-834) Concrete signature of HoodieRecordPayload#combineAndGetUpdateValue & HoodieRecordPayload#getInsertValue

2020-04-22 Thread Zili Chen (Jira)
Zili Chen created HUDI-834: -- Summary: Concrete signature of HoodieRecordPayload#combineAndGetUpdateValue & HoodieRecordPayload#getInsertValue Key: HUDI-834 URL: https://issues.apache.org/jira/browse/HUDI-834

[jira] [Commented] (HUDI-821) Fix the wrong annotation of JCommander IStringConverter

2020-04-22 Thread dengziming (Jira)
[ https://issues.apache.org/jira/browse/HUDI-821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090198#comment-17090198 ] dengziming commented on HUDI-821: - please close it since it has been resolved by another commit > Fix the

[GitHub] [incubator-hudi] wangxianghu commented on issue #1224: [HUDI-397] Normalize log print statement

2020-04-22 Thread GitBox
wangxianghu commented on issue #1224: URL: https://github.com/apache/incubator-hudi/pull/1224#issuecomment-618142920 hi @n3nash, I have rebased this pr PTAL, thanks

[GitHub] [incubator-hudi] hddong opened a new pull request #1554: [HUDI-704]Add test for RepairsCommand

2020-04-22 Thread GitBox
hddong opened a new pull request #1554: URL: https://github.com/apache/incubator-hudi/pull/1554 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose

[jira] [Updated] (HUDI-704) Add unit test for RepairsCommand

2020-04-22 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-704: Labels: pull-request-available (was: ) > Add unit test for RepairsCommand >

[GitHub] [incubator-hudi] wanglisheng81 commented on issue #1551: [HUDI-827] fix translation error

2020-04-22 Thread GitBox
wanglisheng81 commented on issue #1551: URL: https://github.com/apache/incubator-hudi/pull/1551#issuecomment-618132796 > Thanks for your contribution @wanglisheng81 ! LGTM, merging. > PS: it would be a MINOR change without a jira issue. :) anyway, thanks for the contribution. Got

[GitHub] [incubator-hudi] c-f-cooper commented on issue #143: Tracking ticket for folks to be added to slack group

2020-04-22 Thread GitBox
c-f-cooper commented on issue #143: URL: https://github.com/apache/incubator-hudi/issues/143#issuecomment-618121217 > @c-f-cooper @superguhua @jenu9417 @dahirainbow welcome and done thanks

[GitHub] [incubator-hudi] lamber-ken commented on issue #1044: [HUDI-361] Implement CSV metrics reporter

2020-04-22 Thread GitBox
lamber-ken commented on issue #1044: URL: https://github.com/apache/incubator-hudi/pull/1044#issuecomment-618120915 hi @XuQianJin-Stars, thanks for your contribution, CSV metrics reporter seems not popular in product, so I suggest closing this, WDYT? @leesf also, flink doesn't use

[GitHub] [incubator-hudi] wangxianghu commented on issue #1100: [HUDI-289] Implement a test suite to support long running test for Hudi writing and querying end-end

2020-04-22 Thread GitBox
wangxianghu commented on issue #1100: URL: https://github.com/apache/incubator-hudi/pull/1100#issuecomment-618119493 @n3nash well done !

[GitHub] [incubator-hudi] n3nash edited a comment on issue #1100: [HUDI-289] Implement a test suite to support long running test for Hudi writing and querying end-end

2020-04-22 Thread GitBox
n3nash edited a comment on issue #1100: URL: https://github.com/apache/incubator-hudi/pull/1100#issuecomment-618113749 @yanghua @bvaradar @vinothchandar I've fixed this PR since it was failing builds due to multiple pom issues, I've rebased the code from the last time we did this (lots of

[jira] [Updated] (HUDI-289) Implement a test suite to support long running test for Hudi writing and querying end-end

2020-04-22 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-289: Labels: pull-request-available (was: ) > Implement a test suite to support long running test for

[GitHub] [incubator-hudi] n3nash commented on issue #1100: [HUDI-289] Implement a test suite to support long running test for Hudi writing and querying end-end

2020-04-22 Thread GitBox
n3nash commented on issue #1100: URL: https://github.com/apache/incubator-hudi/pull/1100#issuecomment-618113749 @yanghua @bvaradar @vinothchandar I've fixed this PR, cleaned up a bunch of the code. This test suite now has test cases that one can use to run end to end tests in junit.

[jira] [Created] (HUDI-833) Allow test suite to change hudi write configs for any dag node

2020-04-22 Thread Nishith Agarwal (Jira)
Nishith Agarwal created HUDI-833: Summary: Allow test suite to change hudi write configs for any dag node Key: HUDI-833 URL: https://issues.apache.org/jira/browse/HUDI-833 Project: Apache Hudi

[jira] [Created] (HUDI-832) Add ability to induce failures to catch issues

2020-04-22 Thread Nishith Agarwal (Jira)
Nishith Agarwal created HUDI-832: Summary: Add ability to induce failures to catch issues Key: HUDI-832 URL: https://issues.apache.org/jira/browse/HUDI-832 Project: Apache Hudi (incubating)

[jira] [Created] (HUDI-831) Augment the existing DAG with more complex use-cases seen at Uber

2020-04-22 Thread Nishith Agarwal (Jira)
Nishith Agarwal created HUDI-831: Summary: Augment the existing DAG with more complex use-cases seen at Uber Key: HUDI-831 URL: https://issues.apache.org/jira/browse/HUDI-831 Project: Apache Hudi

[jira] [Assigned] (HUDI-830) Fix issues related to running the test suite in docker due to Hive 2.x

2020-04-22 Thread Nishith Agarwal (Jira)
[ https://issues.apache.org/jira/browse/HUDI-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishith Agarwal reassigned HUDI-830: Assignee: Abhishek Modi > Fix issues related to running the test suite in docker due to

[jira] [Created] (HUDI-830) Fix issues related to running the test suite in docker due to Hive 2.x

2020-04-22 Thread Nishith Agarwal (Jira)
Nishith Agarwal created HUDI-830: Summary: Fix issues related to running the test suite in docker due to Hive 2.x Key: HUDI-830 URL: https://issues.apache.org/jira/browse/HUDI-830 Project: Apache

[jira] [Comment Edited] (HUDI-829) Efficiently reading hudi tables through spark-shell

2020-04-22 Thread Udit Mehrotra (Jira)
[ https://issues.apache.org/jira/browse/HUDI-829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090098#comment-17090098 ] Udit Mehrotra edited comment on HUDI-829 at 4/23/20, 12:07 AM: --- You may also

[jira] [Comment Edited] (HUDI-829) Efficiently reading hudi tables through spark-shell

2020-04-22 Thread Udit Mehrotra (Jira)
[ https://issues.apache.org/jira/browse/HUDI-829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090098#comment-17090098 ] Udit Mehrotra edited comment on HUDI-829 at 4/22/20, 11:59 PM: --- You may also

[jira] [Commented] (HUDI-829) Efficiently reading hudi tables through spark-shell

2020-04-22 Thread Udit Mehrotra (Jira)
[ https://issues.apache.org/jira/browse/HUDI-829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090098#comment-17090098 ] Udit Mehrotra commented on HUDI-829: You may also want to look at my implementation of custom relation

[jira] [Commented] (HUDI-829) Efficiently reading hudi tables through spark-shell

2020-04-22 Thread Udit Mehrotra (Jira)
[ https://issues.apache.org/jira/browse/HUDI-829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090095#comment-17090095 ] Udit Mehrotra commented on HUDI-829: [~nishith29] Thanks for creating the ticket. So the issue I was

[incubator-hudi] branch hudi_test_suite_refactor updated (388842c -> da3232e)

2020-04-22 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository. nagarwal pushed a change to branch hudi_test_suite_refactor in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git. discard 388842c Testing running 3 builds to limit total build time add da3232e Testing

[GitHub] [incubator-hudi] nsivabalan commented on issue #1402: [WIP][HUDI-407] Adding Simple Index

2020-04-22 Thread GitBox
nsivabalan commented on issue #1402: URL: https://github.com/apache/incubator-hudi/pull/1402#issuecomment-618094863 @vinothchandar : I am done for the most part. Just that unit tests are failing in Travis but succeed locally. I need to fix that. But the patch in general is good to be

[GitHub] [incubator-hudi] harshi2506 edited a comment on issue #1552: Time taken for upserting hudi table is increasing with increase in number of partitions

2020-04-22 Thread GitBox
harshi2506 edited a comment on issue #1552: URL: https://github.com/apache/incubator-hudi/issues/1552#issuecomment-618068101 @vinothchandar yes, it is happening every time, I tried it 3 times and it's always the same. I see a hoodie commit being added in at most 5-6 minutes and then it

[GitHub] [incubator-hudi] satishkotha commented on issue #1540: [HUDI-819] Fix a bug with MergeOnReadLazyInsertIterable.

2020-04-22 Thread GitBox
satishkotha commented on issue #1540: URL: https://github.com/apache/incubator-hudi/pull/1540#issuecomment-618092361 > +1 on the change itself.. For completeness let's also bring MergeHandle under the same factory implementation? I tried to fit in MergeHandle. But it doesn't seem to

[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1469: [HUDI-686] Implement BloomIndexV2 that does not depend on memory caching

2020-04-22 Thread GitBox
lamber-ken commented on a change in pull request #1469: URL: https://github.com/apache/incubator-hudi/pull/1469#discussion_r413398227 ## File path: hudi-common/src/main/java/org/apache/hudi/common/util/HoodieTimer.java ## @@ -69,4 +76,13 @@ public long endTimer() { }

[jira] [Updated] (HUDI-829) Efficiently reading hudi tables through spark-shell

2020-04-22 Thread Nishith Agarwal (Jira)
[ https://issues.apache.org/jira/browse/HUDI-829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishith Agarwal updated HUDI-829: - Summary: Efficiently reading hudi tables through spark-shell (was: Reading Hudi tables through

[jira] [Updated] (HUDI-829) Efficiently reading hudi tables through spark-shell

2020-04-22 Thread Nishith Agarwal (Jira)
[ https://issues.apache.org/jira/browse/HUDI-829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishith Agarwal updated HUDI-829: - Description: [~uditme] Created this ticket to track some discussion on read/query path of spark

[jira] [Created] (HUDI-829) Reading Hudi tables through spark-shell slow (even with spark.sql.hive.convertMetastoreParquet)

2020-04-22 Thread Nishith Agarwal (Jira)
Nishith Agarwal created HUDI-829: Summary: Reading Hudi tables through spark-shell slow (even with spark.sql.hive.convertMetastoreParquet) Key: HUDI-829 URL: https://issues.apache.org/jira/browse/HUDI-829

[jira] [Updated] (HUDI-829) Reading Hudi tables through spark-shell is slow (even with spark.sql.hive.convertMetastoreParquet)

2020-04-22 Thread Nishith Agarwal (Jira)
[ https://issues.apache.org/jira/browse/HUDI-829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishith Agarwal updated HUDI-829: - Summary: Reading Hudi tables through spark-shell is slow (even with

[GitHub] [incubator-hudi] codecov-io removed a comment on issue #1469: [HUDI-686] Implement BloomIndexV2 that does not depend on memory caching

2020-04-22 Thread GitBox
codecov-io removed a comment on issue #1469: URL: https://github.com/apache/incubator-hudi/pull/1469#issuecomment-618028443 # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1469?src=pr=h1) Report > Merging

[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1469: [HUDI-686] Implement BloomIndexV2 that does not depend on memory caching

2020-04-22 Thread GitBox
lamber-ken commented on a change in pull request #1469: URL: https://github.com/apache/incubator-hudi/pull/1469#discussion_r413395407 ## File path: hudi-client/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndexV2.java ## @@ -0,0 +1,321 @@ +/* + * Licensed to the

[GitHub] [incubator-hudi] lamber-ken commented on issue #1526: [HUDI-1526] Add pyspark example in quickstart

2020-04-22 Thread GitBox
lamber-ken commented on issue #1526: URL: https://github.com/apache/incubator-hudi/pull/1526#issuecomment-618086091 > @vingov I had copied the https://github.com/apache/spark/blob/15462e1a8fa8da54ac51f4d21f567f3c288e6701/docs/js/main.js and reference the js lib like:

[GitHub] [incubator-hudi] lamber-ken commented on issue #1546: Issue - Table Read fails in Spark Submit , Where as succeeds in spark-shell

2020-04-22 Thread GitBox
lamber-ken commented on issue #1546: URL: https://github.com/apache/incubator-hudi/issues/1546#issuecomment-618082468 > over to you @lamber-ken :) Loving to accept it.

[GitHub] [incubator-hudi] harshi2506 edited a comment on issue #1552: Time taken for upserting hudi table is increasing with increase in number of partitions

2020-04-22 Thread GitBox
harshi2506 edited a comment on issue #1552: URL: https://github.com/apache/incubator-hudi/issues/1552#issuecomment-618070975 hi @lamber-ken, I am already setting parallelism to 200 `hudiOptions = (HoodieWriteConfig.TABLE_NAME -> "table_name",

[GitHub] [incubator-hudi] harshi2506 edited a comment on issue #1552: Time taken for upserting hudi table is increasing with increase in number of partitions

2020-04-22 Thread GitBox
harshi2506 edited a comment on issue #1552: URL: https://github.com/apache/incubator-hudi/issues/1552#issuecomment-618070975 hi @lamber-ken, I am already setting parallelism to 200 hudiOptions = (HoodieWriteConfig.TABLE_NAME -> "table_name",

[GitHub] [incubator-hudi] harshi2506 edited a comment on issue #1552: Time taken for upserting hudi table is increasing with increase in number of partitions

2020-04-22 Thread GitBox
harshi2506 edited a comment on issue #1552: URL: https://github.com/apache/incubator-hudi/issues/1552#issuecomment-618070975 hi @lamber-ken, I am already setting parallelism to 200 hudiOptions += (HoodieWriteConfig.TABLE_NAME -> "table_name",

[GitHub] [incubator-hudi] harshi2506 commented on issue #1552: Time taken for upserting hudi table is increasing with increase in number of partitions

2020-04-22 Thread GitBox
harshi2506 commented on issue #1552: URL: https://github.com/apache/incubator-hudi/issues/1552#issuecomment-618070975 hi @lamber-ken, I am already setting parallelism to 200 hudiOptions += (HoodieWriteConfig.TABLE_NAME -> "table_name",

[GitHub] [incubator-hudi] harshi2506 commented on issue #1552: Time taken for upserting hudi table is increasing with increase in number of partitions

2020-04-22 Thread GitBox
harshi2506 commented on issue #1552: URL: https://github.com/apache/incubator-hudi/issues/1552#issuecomment-618068101 @vinothchandar yes, it is happening every time, I tried it 3 times and it's always the same. I see a hoodie commit being added in at most 5-6 minutes and then it takes

[GitHub] [incubator-hudi] vinothchandar commented on issue #1546: Issue - Table Read fails in Spark Submit , Where as succeeds in spark-shell

2020-04-22 Thread GitBox
vinothchandar commented on issue #1546: URL: https://github.com/apache/incubator-hudi/issues/1546#issuecomment-618035770 over to you @lamber-ken :)

[GitHub] [incubator-hudi] vinothchandar commented on issue #1546: Issue - Table Read fails in Spark Submit , Where as succeeds in spark-shell

2020-04-22 Thread GitBox
vinothchandar commented on issue #1546: URL: https://github.com/apache/incubator-hudi/issues/1546#issuecomment-618035688 Hmmm fairly certain this is a packaging issue and some kind of class mismatch between hudi's hive and spark's hive? Our docker setup uses spark 2.3 with Hive 2.x

[GitHub] [incubator-hudi] vinothchandar commented on issue #1433: [HUDI-728]: Implement custom key generator

2020-04-22 Thread GitBox
vinothchandar commented on issue #1433: URL: https://github.com/apache/incubator-hudi/pull/1433#issuecomment-618031058 @pratyakshsharma sorry .. fell off my radar since I was not an assignee.. I do have some concerns.. Review coming by your day time :)

[jira] [Comment Edited] (HUDI-828) Open Questions before merging Bootstrap

2020-04-22 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090026#comment-17090026 ] Balaji Varadarajan edited comment on HUDI-828 at 4/22/20, 8:44 PM: --- HBase

[jira] [Commented] (HUDI-828) Open Questions before merging Bootstrap

2020-04-22 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090026#comment-17090026 ] Balaji Varadarajan commented on HUDI-828: - HBase Shading : We are shading hbase in all hudi

[GitHub] [incubator-hudi] vinothchandar commented on issue #1550: Hudi 0.5.2 inability save complex type with nullable = true [SUPPORT]

2020-04-22 Thread GitBox
vinothchandar commented on issue #1550: URL: https://github.com/apache/incubator-hudi/issues/1550#issuecomment-618028676 @badion This does seem directly related to the complex types issue fixed recently.. 0.5.1-2 we moved out of databricks-avro and to spark-avro and this seems like a

[GitHub] [incubator-hudi] codecov-io commented on issue #1469: [HUDI-686] Implement BloomIndexV2 that does not depend on memory caching

2020-04-22 Thread GitBox
codecov-io commented on issue #1469: URL: https://github.com/apache/incubator-hudi/pull/1469#issuecomment-618028443 # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1469?src=pr=h1) Report > Merging [#1469](https://codecov.io/gh/apache/incubator-hudi/pull/1469?src=pr=desc)

[GitHub] [incubator-hudi] vinothchandar commented on issue #1552: Time taken for upserting hudi table is increasing with increase in number of partitions

2020-04-22 Thread GitBox
vinothchandar commented on issue #1552: URL: https://github.com/apache/incubator-hudi/issues/1552#issuecomment-618024238 @harshi2506 the pause is weird.. As you can see the wall clock time itself is small.. i.e if you subtract the time lost pausing.. Is this reproducible? i.e

[jira] [Created] (HUDI-828) Open Questions before merging Bootstrap

2020-04-22 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-828: --- Summary: Open Questions before merging Bootstrap Key: HUDI-828 URL: https://issues.apache.org/jira/browse/HUDI-828 Project: Apache Hudi (incubating)

[jira] [Updated] (HUDI-828) Open Questions before merging Bootstrap

2020-04-22 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-828: Status: Open (was: New) > Open Questions before merging Bootstrap >

[GitHub] [incubator-hudi] vinothchandar commented on issue #1469: [HUDI-686] Implement BloomIndexV2 that does not depend on memory caching

2020-04-22 Thread GitBox
vinothchandar commented on issue #1469: URL: https://github.com/apache/incubator-hudi/pull/1469#issuecomment-618017951 @lamber-ken we are hoping to get this into the next release as well . any etas on final review? :) This

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1469: [HUDI-686] Implement BloomIndexV2 that does not depend on memory caching

2020-04-22 Thread GitBox
vinothchandar commented on a change in pull request #1469: URL: https://github.com/apache/incubator-hudi/pull/1469#discussion_r413300954 ## File path: hudi-common/src/main/java/org/apache/hudi/common/util/HoodieTimer.java ## @@ -69,4 +76,13 @@ public long endTimer() { }

[GitHub] [incubator-hudi] vinothchandar commented on issue #1402: [WIP][HUDI-407] Adding Simple Index

2020-04-22 Thread GitBox
vinothchandar commented on issue #1402: URL: https://github.com/apache/incubator-hudi/pull/1402#issuecomment-618015872 @nsivabalan any ETAs on when this will be open for a full final review?

[GitHub] [incubator-hudi] vinothchandar commented on issue #1289: [HUDI-92] Provide reasonable names for Spark DAG stages in Hudi.

2020-04-22 Thread GitBox
vinothchandar commented on issue #1289: URL: https://github.com/apache/incubator-hudi/pull/1289#issuecomment-618015542 @prashantwason this will be a good candidate to fast track into the next release. are you still working on this?

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1540: [HUDI-819] Fix a bug with MergeOnReadLazyInsertIterable.

2020-04-22 Thread GitBox
vinothchandar commented on a change in pull request #1540: URL: https://github.com/apache/incubator-hudi/pull/1540#discussion_r413294812 ## File path: hudi-client/src/main/java/org/apache/hudi/io/HoodieHandleCreator.java ## @@ -0,0 +1,30 @@ +/* + * Licensed to the Apache

[GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1552: Time taken for upserting hudi table is increasing with increase in number of partitions

2020-04-22 Thread GitBox
lamber-ken edited a comment on issue #1552: URL: https://github.com/apache/incubator-hudi/issues/1552#issuecomment-617976874 hi @harshi2506, based on the above analysis, please try to increase the upsert parallelism (`hoodie.upsert.shuffle.parallelism`) and spark executor instances, for

[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1512: [HUDI-763] Add hoodie.table.base.file.format option to hoodie.properties file

2020-04-22 Thread GitBox
lamber-ken commented on a change in pull request #1512: URL: https://github.com/apache/incubator-hudi/pull/1512#discussion_r413255248 ## File path: hudi-common/src/main/java/org/apache/hudi/common/model/HoodieFileFormat.java ## @@ -22,7 +22,7 @@ * Hoodie file format. */

[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1512: [HUDI-763] Add hoodie.table.base.file.format option to hoodie.properties file

2020-04-22 Thread GitBox
lamber-ken commented on a change in pull request #1512: URL: https://github.com/apache/incubator-hudi/pull/1512#discussion_r413254988 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java ## @@ -190,6 +190,9 @@ public

[GitHub] [incubator-hudi] lamber-ken commented on issue #1552: Time taken for upserting hudi table is increasing with increase in number of partitions

2020-04-22 Thread GitBox
lamber-ken commented on issue #1552: URL: https://github.com/apache/incubator-hudi/issues/1552#issuecomment-617976874 hi @harshi2506, based on the above analysis, please try to increase the upsert parallelism and spark executor instances, for example ``` export

[GitHub] [incubator-hudi] lamber-ken commented on issue #1552: Time taken for upserting hudi table is increasing with increase in number of partitions

2020-04-22 Thread GitBox
lamber-ken commented on issue #1552: URL: https://github.com/apache/incubator-hudi/issues/1552#issuecomment-617965381 From the detailed commit metadata and above spark ui, we know 1. write about 700 million records at first commit. 2. upsert 2633 records and touched 255

[incubator-hudi] branch master updated: [MINOR]: Fix cli docs for DeltaStreamer (#1547)

2020-04-22 Thread vinoth
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git The following commit(s) were added to refs/heads/master by this push: new 19cc15c [MINOR]: Fix cli docs for

[incubator-hudi] branch hudi_test_suite_refactor updated (8a0380c -> 388842c)

2020-04-22 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository. nagarwal pushed a change to branch hudi_test_suite_refactor in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git. omit 8a0380c Testing running 3 builds to limit total build time add 388842c Testing

[GitHub] [incubator-hudi] harshi2506 commented on issue #1552: Time taken for upserting hudi table is increasing with increase in number of partitions

2020-04-22 Thread GitBox
harshi2506 commented on issue #1552: URL: https://github.com/apache/incubator-hudi/issues/1552#issuecomment-617906795 Hi @lamber-ken, thanks for the response, attaching the screenshots https://user-images.githubusercontent.com/64137937/80011182-24ed4f80-84e9-11ea-8a44-938a8d352b6f.png

[jira] [Commented] (HUDI-773) Hudi On Azure Data Lake Storage V2

2020-04-22 Thread Sasikumar Venkatesh (Jira)
[ https://issues.apache.org/jira/browse/HUDI-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17089852#comment-17089852 ] Sasikumar Venkatesh commented on HUDI-773: -- [~vinoth] and [~garyli1019] My Setup is As follows. 

[incubator-hudi] branch master updated (26684f5 -> aea7c16)

2020-04-22 Thread vbalaji
This is an automated email from the ASF dual-hosted git repository. vbalaji pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git. from 26684f5 [HUDI-816] Fixed MAX_MEMORY_FOR_MERGE_PROP and MAX_MEMORY_FOR_COMPACTION_PROP do not work due to

[GitHub] [incubator-hudi] lamber-ken commented on issue #1552: Time taken for upserting hudi table is increasing with increase in number of partitions

2020-04-22 Thread GitBox
lamber-ken commented on issue #1552: URL: https://github.com/apache/incubator-hudi/issues/1552#issuecomment-617895350 Hi @harshi2506, please share the Spark stage UI, thanks

[GitHub] [incubator-hudi] vinothchandar commented on issue #1548: [WIP] [HUDI-785] Refactor compaction/savepoint execution based on ActionExecutor abstraction

2020-04-22 Thread GitBox
vinothchandar commented on issue #1548: URL: https://github.com/apache/incubator-hudi/pull/1548#issuecomment-617887802 CI does seem to have passed..

[jira] [Updated] (HUDI-810) Migrate HoodieClientTestHarness to JUnit 5

2020-04-22 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-810: Labels: pull-request-available (was: ) > Migrate HoodieClientTestHarness to JUnit 5 >

[GitHub] [incubator-hudi] xushiyan opened a new pull request #1553: [HUDI-810] Migrate ClientTestHarness to JUnit 5

2020-04-22 Thread GitBox
xushiyan opened a new pull request #1553: URL: https://github.com/apache/incubator-hudi/pull/1553 WIP ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ]

[incubator-hudi] branch asf-site updated: Travis CI build asf-site

2020-04-22 Thread vinoth
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 4254e60 Travis CI build asf-site

[incubator-hudi] branch asf-site updated: [HUDI-827] fix translation error (#1551)

2020-04-22 Thread leesf
This is an automated email from the ASF dual-hosted git repository. leesf pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 55dde43 [HUDI-827] fix translation

[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1469: [HUDI-686] Implement BloomIndexV2 that does not depend on memory caching

2020-04-22 Thread GitBox
lamber-ken commented on a change in pull request #1469: URL: https://github.com/apache/incubator-hudi/pull/1469#discussion_r413014968 ## File path: hudi-client/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndexV2.java ## @@ -0,0 +1,321 @@ +/* + * Licensed to the

[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1469: [HUDI-686] Implement BloomIndexV2 that does not depend on memory caching

2020-04-22 Thread GitBox
lamber-ken commented on a change in pull request #1469: URL: https://github.com/apache/incubator-hudi/pull/1469#discussion_r412995625 ## File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java ## @@ -430,6 +430,14 @@ public boolean

[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1469: [HUDI-686] Implement BloomIndexV2 that does not depend on memory caching

2020-04-22 Thread GitBox
lamber-ken commented on a change in pull request #1469: URL: https://github.com/apache/incubator-hudi/pull/1469#discussion_r412992192 ## File path: hudi-client/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndexV2.java ## @@ -0,0 +1,321 @@ +/* + * Licensed to the

[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1469: [HUDI-686] Implement BloomIndexV2 that does not depend on memory caching

2020-04-22 Thread GitBox
lamber-ken commented on a change in pull request #1469: URL: https://github.com/apache/incubator-hudi/pull/1469#discussion_r412988167 ## File path: hudi-common/src/main/java/org/apache/hudi/common/util/HoodieTimer.java ## @@ -69,4 +76,13 @@ public long endTimer() { }

[GitHub] [incubator-hudi] nsivabalan commented on a change in pull request #1469: [HUDI-686] Implement BloomIndexV2 that does not depend on memory caching

2020-04-22 Thread GitBox
nsivabalan commented on a change in pull request #1469: URL: https://github.com/apache/incubator-hudi/pull/1469#discussion_r412955401 ## File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java ## @@ -430,6 +430,14 @@ public boolean

[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1469: [HUDI-686] Implement BloomIndexV2 that does not depend on memory caching

2020-04-22 Thread GitBox
lamber-ken commented on a change in pull request #1469: URL: https://github.com/apache/incubator-hudi/pull/1469#discussion_r412965435 ## File path: hudi-client/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndexV2.java ## @@ -0,0 +1,321 @@ +/* + * Licensed to the

[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1469: [HUDI-686] Implement BloomIndexV2 that does not depend on memory caching

2020-04-22 Thread GitBox
lamber-ken commented on a change in pull request #1469: URL: https://github.com/apache/incubator-hudi/pull/1469#discussion_r412965622 ## File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieIndexConfig.java ## @@ -68,6 +68,13 @@ public static final String

[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1469: [HUDI-686] Implement BloomIndexV2 that does not depend on memory caching

2020-04-22 Thread GitBox
lamber-ken commented on a change in pull request #1469: URL: https://github.com/apache/incubator-hudi/pull/1469#discussion_r412965164 ## File path: hudi-client/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndexV2.java ## @@ -0,0 +1,321 @@ +/* + * Licensed to the

[GitHub] [incubator-hudi] lamber-ken commented on a change in pull request #1469: [HUDI-686] Implement BloomIndexV2 that does not depend on memory caching

2020-04-22 Thread GitBox
lamber-ken commented on a change in pull request #1469: URL: https://github.com/apache/incubator-hudi/pull/1469#discussion_r412946904 ## File path: hudi-common/src/main/java/org/apache/hudi/common/util/HoodieTimer.java ## @@ -69,4 +76,13 @@ public long endTimer() { }

[jira] [Updated] (HUDI-827) Translation error

2020-04-22 Thread Lisheng Wang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lisheng Wang updated HUDI-827: -- Status: In Progress (was: Open) > Translation error > - > > Key:

[jira] [Updated] (HUDI-827) Translation error

2020-04-22 Thread Lisheng Wang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lisheng Wang updated HUDI-827: -- Status: Open (was: New) > Translation error > - > > Key: HUDI-827 >

[GitHub] [incubator-hudi] pratyakshsharma commented on issue #1547: [MINOR]: Fix annotations in HoodieDeltaStreamer

2020-04-22 Thread GitBox
pratyakshsharma commented on issue #1547: URL: https://github.com/apache/incubator-hudi/pull/1547#issuecomment-617745980 LGTM. :)

[GitHub] [incubator-hudi] dengziming commented on a change in pull request #1547: [MINOR]: Fix annotations in HoodieDeltaStreamer

2020-04-22 Thread GitBox
dengziming commented on a change in pull request #1547: URL: https://github.com/apache/incubator-hudi/pull/1547#discussion_r412932116 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java ## @@ -185,7 +185,7 @@ public

[GitHub] [incubator-hudi] lamber-ken commented on issue #1551: [HUDI-827] fix translation error

2020-04-22 Thread GitBox
lamber-ken commented on issue #1551: URL: https://github.com/apache/incubator-hudi/pull/1551#issuecomment-617744298 LGTM, let's wait @leesf make a final pass.

[GitHub] [incubator-hudi] pratyakshsharma commented on issue #1538: [HUDI-803]: added more test cases in TestHoodieAvroUtils.class

2020-04-22 Thread GitBox
pratyakshsharma commented on issue #1538: URL: https://github.com/apache/incubator-hudi/pull/1538#issuecomment-617743201 > I need to wrap my head around some of this myself.. So please give me sometime to review.. Sure. :)

[GitHub] [incubator-hudi] pratyakshsharma commented on a change in pull request #1538: [HUDI-803]: added more test cases in TestHoodieAvroUtils.class

2020-04-22 Thread GitBox
pratyakshsharma commented on a change in pull request #1538: URL: https://github.com/apache/incubator-hudi/pull/1538#discussion_r412927375 ## File path: hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java ## @@ -60,7 +61,7 @@ private static ThreadLocal

[GitHub] [incubator-hudi] pratyakshsharma commented on issue #1515: [HUDI-795] Ignoring missing aux folder

2020-04-22 Thread GitBox
pratyakshsharma commented on issue #1515: URL: https://github.com/apache/incubator-hudi/pull/1515#issuecomment-617739210 LGTM.

[GitHub] [incubator-hudi] pratyakshsharma commented on issue #1433: [HUDI-728]: Implement custom key generator

2020-04-22 Thread GitBox
pratyakshsharma commented on issue #1433: URL: https://github.com/apache/incubator-hudi/pull/1433#issuecomment-617737506 @vinothchandar Let us close this? :)

[GitHub] [incubator-hudi] pratyakshsharma commented on issue #765: [WIP] Fix KafkaAvroSource to use the latest schema

2020-04-22 Thread GitBox
pratyakshsharma commented on issue #765: URL: https://github.com/apache/incubator-hudi/pull/765#issuecomment-617735742 > @pratyakshsharma Nope.. all yours if you want to take a run at it Sure, will be working on it next then :)
