[jira] [Updated] (HUDI-1026) Remove slf4j dependency from HoodieClientTestHarness

2020-08-06 Thread Cheshta Sharma (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheshta Sharma updated HUDI-1026: - Status: In Progress (was: Open) > Remove slf4j dependency from HoodieClientTestHarness >

[jira] [Updated] (HUDI-1026) Remove slf4j dependency from HoodieClientTestHarness

2020-08-06 Thread Cheshta Sharma (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheshta Sharma updated HUDI-1026: - Status: Open (was: New) > Remove slf4j dependency from HoodieClientTestHarness >

[jira] [Assigned] (HUDI-1026) Remove slf4j dependency from HoodieClientTestHarness

2020-08-06 Thread Cheshta Sharma (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheshta Sharma reassigned HUDI-1026: Assignee: Cheshta Sharma (was: Nishith Agarwal) > Remove slf4j dependency from HoodieClien

[GitHub] [hudi] vinothchandar closed pull request #1917: [WIP] Copy of PR 1752 to debug CI failure. Not for merging

2020-08-06 Thread GitBox
vinothchandar closed pull request #1917: URL: https://github.com/apache/hudi/pull/1917 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [hudi] vinothchandar commented on pull request #1869: [HUDI-427] Implement CLI support for performing bootstrap

2020-08-06 Thread GitBox
vinothchandar commented on pull request #1869: URL: https://github.com/apache/hudi/pull/1869#issuecomment-670359122 @umehrot2 Once you are happy with this PR, we can merge this

[GitHub] [hudi] vinothchandar commented on pull request #1886: [HUDI-1122]Introduce a kafka implementation of hoodie write commit ca…

2020-08-06 Thread GitBox
vinothchandar commented on pull request #1886: URL: https://github.com/apache/hudi/pull/1886#issuecomment-670358009 > I was wondering can we move this implement to hudi-client module just like the way all the implementations of metrics does. I think we can move this down the line. `h

[GitHub] [hudi] vinothchandar commented on pull request #1886: [HUDI-1122]Introduce a kafka implementation of hoodie write commit ca…

2020-08-06 Thread GitBox
vinothchandar commented on pull request #1886: URL: https://github.com/apache/hudi/pull/1886#issuecomment-670357054 As the RM sent out the note, we are only landing small PRs and release blockers, so we can keep master stable for cutting the 0.6.0 RC. Apologies for the inconvenience.

[GitHub] [hudi] vinothchandar merged pull request #1924: [HUDI-999][Performance] Parallelize fetching of bootstrap source data files/partitions

2020-08-06 Thread GitBox
vinothchandar merged pull request #1924: URL: https://github.com/apache/hudi/pull/1924

[GitHub] [hudi] bvaradar commented on issue #1913: [SUPPORT][MOR]Too many open files on IOException and Crash

2020-08-06 Thread GitBox
bvaradar commented on issue #1913: URL: https://github.com/apache/hudi/issues/1913#issuecomment-670356269 Please check the version of parquet-hadoop

[hudi] branch master updated: [HUDI-999] [RFC-12] Parallelize fetching of source data files/partitions (#1924)

2020-08-06 Thread vinoth
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new ab453f2 [HUDI-999] [RFC-12] Parallelize fetching

[GitHub] [hudi] vinothchandar commented on a change in pull request #1924: [HUDI-999][Performance] Parallelize fetching of bootstrap source data files/partitions

2020-08-06 Thread GitBox
vinothchandar commented on a change in pull request #1924: URL: https://github.com/apache/hudi/pull/1924#discussion_r466850903 ## File path: hudi-client/src/main/java/org/apache/hudi/table/action/bootstrap/BootstrapUtils.java ## @@ -41,37 +48,87 @@ * Returns leaf folders w

[GitHub] [hudi] vinothchandar commented on pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-08-06 Thread GitBox
vinothchandar commented on pull request #1848: URL: https://github.com/apache/hudi/pull/1848#issuecomment-670354729 @garyli1019 I wrote a test for this. Seems like this is actually not a problem. So reverted the unset for now. Please check my last commit. now, if CI passes this time

[GitHub] [hudi] bvaradar commented on issue #1910: [SUPPORT] Upsert operation duplicating records in a partition

2020-08-06 Thread GitBox
bvaradar commented on issue #1910: URL: https://github.com/apache/hudi/issues/1910#issuecomment-670341009 @mingujotemp : I just noticed you are using hive 3.x. I have not seen similar issues with Hive 2.x. Can you enable debug logging to see if your spark sql query triggers HoodiParquetInp

[GitHub] [hudi] pratyakshsharma commented on a change in pull request #1900: [HUDI-531]Add java doc for hudi test suite general classes

2020-08-06 Thread GitBox
pratyakshsharma commented on a change in pull request #1900: URL: https://github.com/apache/hudi/pull/1900#discussion_r466834745 ## File path: hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/dag/scheduler/DagScheduler.java ## @@ -48,6 +51,11 @@ public DagSchedule

[GitHub] [hudi] pratyakshsharma commented on a change in pull request #1900: [HUDI-531]Add java doc for hudi test suite general classes

2020-08-06 Thread GitBox
pratyakshsharma commented on a change in pull request #1900: URL: https://github.com/apache/hudi/pull/1900#discussion_r466834370 ## File path: hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/dag/nodes/DagNode.java ## @@ -76,6 +76,12 @@ public void setParentNodes(

[GitHub] [hudi] pratyakshsharma commented on a change in pull request #1900: [HUDI-531]Add java doc for hudi test suite general classes

2020-08-06 Thread GitBox
pratyakshsharma commented on a change in pull request #1900: URL: https://github.com/apache/hudi/pull/1900#discussion_r466834278 ## File path: hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/dag/nodes/DagNode.java ## @@ -76,6 +76,12 @@ public void setParentNodes(

[GitHub] [hudi] pratyakshsharma commented on a change in pull request #1901: [HUDI-532]Add java doc for hudi test suite test classes

2020-08-06 Thread GitBox
pratyakshsharma commented on a change in pull request #1901: URL: https://github.com/apache/hudi/pull/1901#discussion_r466833643 ## File path: hudi-integ-test/src/test/java/org/apache/hudi/integ/testsuite/TestDFSHoodieTestSuiteWriterAdapter.java ## @@ -52,6 +52,9 @@ import or

[GitHub] [hudi] pratyakshsharma commented on a change in pull request #1901: [HUDI-532]Add java doc for hudi test suite test classes

2020-08-06 Thread GitBox
pratyakshsharma commented on a change in pull request #1901: URL: https://github.com/apache/hudi/pull/1901#discussion_r466833116 ## File path: hudi-integ-test/src/test/java/org/apache/hudi/integ/testsuite/utils/TestUtils.java ## @@ -45,6 +48,15 @@ return dataGenerator.gen

[GitHub] [hudi] pratyakshsharma commented on a change in pull request #1901: [HUDI-532]Add java doc for hudi test suite test classes

2020-08-06 Thread GitBox
pratyakshsharma commented on a change in pull request #1901: URL: https://github.com/apache/hudi/pull/1901#discussion_r466832837 ## File path: hudi-integ-test/src/test/java/org/apache/hudi/integ/testsuite/utils/TestUtils.java ## @@ -28,6 +28,9 @@ import org.apache.spark.api.j

[GitHub] [hudi] pratyakshsharma commented on a change in pull request #1901: [HUDI-532]Add java doc for hudi test suite test classes

2020-08-06 Thread GitBox
pratyakshsharma commented on a change in pull request #1901: URL: https://github.com/apache/hudi/pull/1901#discussion_r466832582 ## File path: hudi-integ-test/src/test/java/org/apache/hudi/integ/testsuite/dag/HiveSyncDagGenerator.java ## @@ -31,6 +31,9 @@ import org.apache.hu

[GitHub] [hudi] pratyakshsharma commented on a change in pull request #1901: [HUDI-532]Add java doc for hudi test suite test classes

2020-08-06 Thread GitBox
pratyakshsharma commented on a change in pull request #1901: URL: https://github.com/apache/hudi/pull/1901#discussion_r466832642 ## File path: hudi-integ-test/src/test/java/org/apache/hudi/integ/testsuite/dag/HiveSyncDagGeneratorMOR.java ## @@ -31,6 +31,9 @@ import org.apache

[GitHub] [hudi] pratyakshsharma commented on a change in pull request #1901: [HUDI-532]Add java doc for hudi test suite test classes

2020-08-06 Thread GitBox
pratyakshsharma commented on a change in pull request #1901: URL: https://github.com/apache/hudi/pull/1901#discussion_r466832530 ## File path: hudi-integ-test/src/test/java/org/apache/hudi/integ/testsuite/dag/ComplexDagGenerator.java ## @@ -33,6 +33,9 @@ import org.apache.hud

[GitHub] [hudi] cheshta2904 commented on a change in pull request #1901: [HUDI-532]Add java doc for hudi test suite test classes

2020-08-06 Thread GitBox
cheshta2904 commented on a change in pull request #1901: URL: https://github.com/apache/hudi/pull/1901#discussion_r466831488 ## File path: hudi-integ-test/src/test/java/org/apache/hudi/integ/ITTestBase.java ## @@ -48,6 +48,9 @@ import static org.junit.jupiter.api.Assertions.as

[GitHub] [hudi] cheshta2904 commented on pull request #1927: [HUDI-1156] Remove unused dependencies from HoodieDeltaStreamerWrapper Class

2020-08-06 Thread GitBox
cheshta2904 commented on pull request #1927: URL: https://github.com/apache/hudi/pull/1927#issuecomment-670335359 Please fix the build.

[GitHub] [hudi] garyli1019 commented on a change in pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-08-06 Thread GitBox
garyli1019 commented on a change in pull request #1848: URL: https://github.com/apache/hudi/pull/1848#discussion_r466829856 ## File path: hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala ## @@ -132,11 +132,15 @@ class DefaultSource extends RelationProvider l

[GitHub] [hudi] vinothchandar commented on a change in pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-08-06 Thread GitBox
vinothchandar commented on a change in pull request #1848: URL: https://github.com/apache/hudi/pull/1848#discussion_r466828072 ## File path: hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala ## @@ -132,11 +132,15 @@ class DefaultSource extends RelationProvider

[jira] [Assigned] (HUDI-1154) Hive Sync Partition Extractor not handling decimal types properly

2020-08-06 Thread linshan-ma (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] linshan-ma reassigned HUDI-1154: Assignee: linshan-ma > Hive Sync Partition Extractor not handling decimal types properly >

Build failed in Jenkins: hudi-snapshot-deployment-0.5 #362

2020-08-06 Thread Apache Jenkins Server
See Changes: -- [...truncated 2.60 KB...] cdi-api-1.0.jar cdi-api.license commons-cli-1.4.jar commons-cli.license commons-io-2.5.jar commons-io.license commons-lang3-3.5.jar

[GitHub] [hudi] nsivabalan commented on pull request #1912: [HUDI-1098] Adding TimedWaitOnAppearConsistencyGuard

2020-08-06 Thread GitBox
nsivabalan commented on pull request #1912: URL: https://github.com/apache/hudi/pull/1912#issuecomment-670295868 Synced up with @bvaradar on the diff. Here are some changes/conclusions we narrowed down after our discussion. - We felt exposing TimedWaitOnAppearCG to external users may n

[GitHub] [hudi] Mathieu1124 edited a comment on pull request #1886: [HUDI-1122]Introduce a kafka implementation of hoodie write commit ca…

2020-08-06 Thread GitBox
Mathieu1124 edited a comment on pull request #1886: URL: https://github.com/apache/hudi/pull/1886#issuecomment-670294044 @yanghua VC seems busy, do you have any other concern about this pr ? if it is ok, can we merge this first, and file a new pr if VC agree to move this to hudi-client

[GitHub] [hudi] Mathieu1124 commented on pull request #1886: [HUDI-1122]Introduce a kafka implementation of hoodie write commit ca…

2020-08-06 Thread GitBox
Mathieu1124 commented on pull request #1886: URL: https://github.com/apache/hudi/pull/1886#issuecomment-670294044 @yanghua VC seems busy, do you have any other concern about this pr ? if it is ok, can we merge this first, and file a new pr if VC agree to move this to hudi-client :) -

[GitHub] [hudi] linshan-ma opened a new pull request #1927: [HUDI-1156] Remove unused dependencies from HoodieDeltaStreamerWrapper Class

2020-08-06 Thread GitBox
linshan-ma opened a new pull request #1927: URL: https://github.com/apache/hudi/pull/1927 ## *Tips* - *Remove unused dependencies from HoodieDeltaStreamerWrapper Class* ## What is the purpose of the pull request *(For example: This pull request adds quick-start document.)*

[GitHub] [hudi] linshan-ma closed pull request #1926: [HUDI-1156] Remove unused dependencies from HoodieDeltaStreamerWrapper Class

2020-08-06 Thread GitBox
linshan-ma closed pull request #1926: URL: https://github.com/apache/hudi/pull/1926

[jira] [Updated] (HUDI-1156) Remove unused dependencies from HoodieDeltaStreamerWrapper Class

2020-08-06 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1156: - Labels: pull-request-available (was: ) > Remove unused dependencies from HoodieDeltaStreamerWrapp

[GitHub] [hudi] linshan-ma opened a new pull request #1926: [HUDI-1156] Remove unused dependencies from HoodieDeltaStreamerWrapper Class

2020-08-06 Thread GitBox
linshan-ma opened a new pull request #1926: URL: https://github.com/apache/hudi/pull/1926 ## *Tips* - *Remove unused dependencies from HoodieDeltaStreamerWrapper Class.* ## What is the purpose of the pull request *(For example: This pull request adds quick-start document

[jira] [Assigned] (HUDI-1156) Remove unused dependencies from HoodieDeltaStreamerWrapper Class

2020-08-06 Thread linshan-ma (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] linshan-ma reassigned HUDI-1156: Assignee: linshan-ma > Remove unused dependencies from HoodieDeltaStreamerWrapper Class > -

[GitHub] [hudi] umehrot2 commented on a change in pull request #1924: [HUDI-999][Performance] Parallelize fetching of bootstrap source data files/partitions

2020-08-06 Thread GitBox
umehrot2 commented on a change in pull request #1924: URL: https://github.com/apache/hudi/pull/1924#discussion_r466779443 ## File path: hudi-client/src/main/java/org/apache/hudi/table/action/bootstrap/BootstrapUtils.java ## @@ -41,37 +48,87 @@ * Returns leaf folders with f

[jira] [Assigned] (HUDI-1158) Optimizations in parallelized listing behaviour for markers and bootstrap source files

2020-08-06 Thread Udit Mehrotra (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udit Mehrotra reassigned HUDI-1158: --- Assignee: Udit Mehrotra > Optimizations in parallelized listing behaviour for markers and boo

[jira] [Assigned] (HUDI-1158) Optimizations in parallelized listing behaviour for markers and bootstrap source files

2020-08-06 Thread Udit Mehrotra (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udit Mehrotra reassigned HUDI-1158: --- Assignee: (was: Udit Mehrotra) > Optimizations in parallelized listing behaviour for mark

[jira] [Created] (HUDI-1158) Optimizations in parallelized listing behaviour for markers and bootstrap source files

2020-08-06 Thread Udit Mehrotra (Jira)
Udit Mehrotra created HUDI-1158: --- Summary: Optimizations in parallelized listing behaviour for markers and bootstrap source files Key: HUDI-1158 URL: https://issues.apache.org/jira/browse/HUDI-1158 Proj

[GitHub] [hudi] mingujotemp commented on issue #1910: [SUPPORT] Upsert operation duplicating records in a partition

2020-08-06 Thread GitBox
mingujotemp commented on issue #1910: URL: https://github.com/apache/hudi/issues/1910#issuecomment-670277613 yup that has been set for sure

[GitHub] [hudi] umehrot2 commented on a change in pull request #1924: [HUDI-999][Performance] Parallelize fetching of bootstrap source data files/partitions

2020-08-06 Thread GitBox
umehrot2 commented on a change in pull request #1924: URL: https://github.com/apache/hudi/pull/1924#discussion_r466774810 ## File path: hudi-client/src/main/java/org/apache/hudi/table/action/bootstrap/BootstrapUtils.java ## @@ -41,37 +48,87 @@ * Returns leaf folders with f

[GitHub] [hudi] bvaradar commented on issue #1925: [SUPPORT] Support for Confluent Cloud SchemaRegistryProvider

2020-08-06 Thread GitBox
bvaradar commented on issue #1925: URL: https://github.com/apache/hudi/issues/1925#issuecomment-670260480 For password based BASIC authentication allows passing username and password like this :http://username:passw...@example.com/"; -- this sends the credentials in the standard HTTP "Auth
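
A minimal sketch of the idea in the comment above, not Hudi's or Confluent's actual code: when a URL carries `username:password@` credentials, an HTTP client can turn them into a standard Basic `Authorization` header value. The function name `basic_auth_header` and the sample URL in the usage note are hypothetical and only for illustration.

```python
import base64
from urllib.parse import urlsplit, unquote


def basic_auth_header(url):
    """Return the Basic Authorization header value for credentials embedded in url.

    Returns None when the URL has no user-info part. This is an illustrative
    helper, not an API of any schema-registry client.
    """
    parts = urlsplit(url)
    if parts.username is None:
        return None
    # User-info components may be percent-encoded inside a URL.
    user = unquote(parts.username)
    password = unquote(parts.password or "")
    # Basic auth is base64("user:password") prefixed with the scheme name.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"
```

For example, `basic_auth_header("http://alice:s3cret@example.com/")` yields the value a client would send as `Authorization`, while a URL without credentials yields `None`.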

[GitHub] [hudi] zhedoubushishi commented on pull request #1870: [HUDI-808] Support cleaning bootstrap source data

2020-08-06 Thread GitBox
zhedoubushishi commented on pull request #1870: URL: https://github.com/apache/hudi/pull/1870#issuecomment-670256477 > @zhedoubushishi there is one issue here. we are changing what goes into the cleaner plan i.e its writing full path as opposed to just the file names. > > This means

[GitHub] [hudi] umehrot2 commented on a change in pull request #1702: [HUDI-426] Bootstrap datasource integration

2020-08-06 Thread GitBox
umehrot2 commented on a change in pull request #1702: URL: https://github.com/apache/hudi/pull/1702#discussion_r466748892 ## File path: hudi-spark/src/main/scala/org/apache/hudi/HudiBootstrapRDD.scala ## @@ -0,0 +1,131 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [hudi] umehrot2 commented on a change in pull request #1702: [HUDI-426] Bootstrap datasource integration

2020-08-06 Thread GitBox
umehrot2 commented on a change in pull request #1702: URL: https://github.com/apache/hudi/pull/1702#discussion_r466748555 ## File path: hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala ## @@ -56,29 +58,56 @@ class DefaultSource extends RelationProvider val par

[jira] [Created] (HUDI-1157) Optimization whether to query Bootstrapped table using HoodieBootstrapRelation vs Sparks Parquet datasource

2020-08-06 Thread Udit Mehrotra (Jira)
Udit Mehrotra created HUDI-1157: --- Summary: Optimization whether to query Bootstrapped table using HoodieBootstrapRelation vs Sparks Parquet datasource Key: HUDI-1157 URL: https://issues.apache.org/jira/browse/HUDI-1

[GitHub] [hudi] zhedoubushishi commented on a change in pull request #1870: [HUDI-808] Support cleaning bootstrap source data

2020-08-06 Thread GitBox
zhedoubushishi commented on a change in pull request #1870: URL: https://github.com/apache/hudi/pull/1870#discussion_r466742409 ## File path: hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestUtils.java ## @@ -513,4 +522,41 @@ public static void writeRecords

[jira] [Assigned] (HUDI-1108) Allow parallel listing of dataset partitions for various actions during write

2020-08-06 Thread Udit Mehrotra (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udit Mehrotra reassigned HUDI-1108: --- Assignee: Ryan Pifer (was: Udit Mehrotra) > Allow parallel listing of dataset partitions for

[jira] [Assigned] (HUDI-1108) Allow parallel listing of dataset partitions for various actions during write

2020-08-06 Thread Udit Mehrotra (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udit Mehrotra reassigned HUDI-1108: --- Assignee: Udit Mehrotra > Allow parallel listing of dataset partitions for various actions du

[jira] [Assigned] (HUDI-1108) Allow parallel listing of dataset partitions for various actions during write

2020-08-06 Thread Udit Mehrotra (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udit Mehrotra reassigned HUDI-1108: --- Assignee: (was: Udit Mehrotra) > Allow parallel listing of dataset partitions for various

[GitHub] [hudi] luffyd commented on issue #1913: [SUPPORT][MOR]Too many open files on IOException and Crash

2020-08-06 Thread GitBox
luffyd commented on issue #1913: URL: https://github.com/apache/hudi/issues/1913#issuecomment-670211630 From Spark ENV tab, parquet version seems to be this /mnt2/yarn/usercache/hadoop/appcache/application_1596743154329_0001/container_1596743154329_0001_01_01/__spark_libs__/parquet-f

[GitHub] [hudi] jpugliesi opened a new issue #1925: [SUPPORT] Support for Confluent Cloud SchemaRegistryProvider

2020-08-06 Thread GitBox
jpugliesi opened a new issue #1925: URL: https://github.com/apache/hudi/issues/1925 **Describe the problem you faced** [Sharing here as requested on Slack](https://apache-hudi.slack.com/archives/C4D716NPQ/p1596675249254300) I would like to configure a DeltaStreamer `SchemaRegistry

[GitHub] [hudi] zhedoubushishi commented on a change in pull request #1869: [HUDI-427] Implement CLI support for performing bootstrap

2020-08-06 Thread GitBox
zhedoubushishi commented on a change in pull request #1869: URL: https://github.com/apache/hudi/pull/1869#discussion_r466696366 ## File path: hudi-common/src/main/java/org/apache/hudi/common/bootstrap/index/HFileBootstrapIndex.java ## @@ -240,13 +240,21 @@ private HoodieBootst

[GitHub] [hudi] vinothchandar commented on a change in pull request #1702: [HUDI-426] Bootstrap datasource integration

2020-08-06 Thread GitBox
vinothchandar commented on a change in pull request #1702: URL: https://github.com/apache/hudi/pull/1702#discussion_r48582 ## File path: hudi-spark/src/main/scala/org/apache/hudi/IncrementalRelation.scala ## @@ -92,36 +102,69 @@ class IncrementalRelation(val sqlContext: SQL

[GitHub] [hudi] vinothchandar commented on a change in pull request #1702: [HUDI-426] Bootstrap datasource integration

2020-08-06 Thread GitBox
vinothchandar commented on a change in pull request #1702: URL: https://github.com/apache/hudi/pull/1702#discussion_r4 ## File path: hudi-spark/src/main/scala/org/apache/hudi/HudiBootstrapRDD.scala ## @@ -0,0 +1,131 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [hudi] vinothchandar commented on a change in pull request #1702: [HUDI-426] Bootstrap datasource integration

2020-08-06 Thread GitBox
vinothchandar commented on a change in pull request #1702: URL: https://github.com/apache/hudi/pull/1702#discussion_r47402 ## File path: hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala ## @@ -56,29 +58,56 @@ class DefaultSource extends RelationProvider va

[GitHub] [hudi] zhedoubushishi commented on a change in pull request #1869: [HUDI-427] Implement CLI support for performing bootstrap

2020-08-06 Thread GitBox
zhedoubushishi commented on a change in pull request #1869: URL: https://github.com/apache/hudi/pull/1869#discussion_r466589747 ## File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/BootstrapCommand.java ## @@ -0,0 +1,184 @@ +/* + * Licensed to the Apache Software

[GitHub] [hudi] zhedoubushishi commented on a change in pull request #1869: [HUDI-427] Implement CLI support for performing bootstrap

2020-08-06 Thread GitBox
zhedoubushishi commented on a change in pull request #1869: URL: https://github.com/apache/hudi/pull/1869#discussion_r466589474 ## File path: hudi-common/src/main/java/org/apache/hudi/common/bootstrap/index/HFileBootstrapIndex.java ## @@ -240,13 +240,21 @@ private HoodieBootst

[GitHub] [hudi] garyli1019 commented on pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-08-06 Thread GitBox
garyli1019 commented on pull request #1848: URL: https://github.com/apache/hudi/pull/1848#issuecomment-670078060 > ``` > [ERROR] Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 66.614 s <<< FAILURE! - in org.apache.hudi.functional.TestCOWDataSource > [ERROR] org.apach

[GitHub] [hudi] zhedoubushishi commented on a change in pull request #1869: [HUDI-427] Implement CLI support for performing bootstrap

2020-08-06 Thread GitBox
zhedoubushishi commented on a change in pull request #1869: URL: https://github.com/apache/hudi/pull/1869#discussion_r466572098 ## File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/BootstrapCommand.java ## @@ -0,0 +1,184 @@ +/* + * Licensed to the Apache Software

[GitHub] [hudi] zhedoubushishi commented on a change in pull request #1869: [HUDI-427] Implement CLI support for performing bootstrap

2020-08-06 Thread GitBox
zhedoubushishi commented on a change in pull request #1869: URL: https://github.com/apache/hudi/pull/1869#discussion_r466562083 ## File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/BootstrapCommand.java ## @@ -0,0 +1,184 @@ +/* + * Licensed to the Apache Software

[GitHub] [hudi] luffyd commented on issue #1913: [SUPPORT][MOR]Too many open files on IOException and Crash

2020-08-06 Thread GitBox
luffyd commented on issue #1913: URL: https://github.com/apache/hudi/issues/1913#issuecomment-670039732 Thanks for the input @bvaradar "Too many open files on IOException" issue also seems to be correlated with having 2G as max file limit. Will confirm the parquet version. Reg

[GitHub] [hudi] bschell closed pull request #1922: [HUDI-1152] Add option to skip syncing Hudi metadata columns

2020-08-06 Thread GitBox
bschell closed pull request #1922: URL: https://github.com/apache/hudi/pull/1922

[GitHub] [hudi] bschell commented on pull request #1922: [HUDI-1152] Add option to skip syncing Hudi metadata columns

2020-08-06 Thread GitBox
bschell commented on pull request #1922: URL: https://github.com/apache/hudi/pull/1922#issuecomment-670038375 @vinothchandar thanks for the detailed explanation! When I was considering this feature I was only considering the feedback that the extra columns were confusing for end users who

[jira] [Closed] (HUDI-1151) Fix NPE when no new data in kafka using HoodieDeltaStreamer

2020-08-06 Thread vinoyang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang closed HUDI-1151. -- Resolution: Fixed Fixed via master branch: b51646dcc76acc68e97dd6a67cc7557e362b590d > Fix NPE when no new data

[GitHub] [hudi] yanghua merged pull request #1921: [HUDI-1151]Fix NPE when no new data in kafka using HoodieDeltaStreamer

2020-08-06 Thread GitBox
yanghua merged pull request #1921: URL: https://github.com/apache/hudi/pull/1921

[hudi] branch master updated (51ea27d -> b51646d)

2020-08-06 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository. vinoyang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from 51ea27d [HUDI-875] Abstract hudi-sync-common, and support hudi-hive-sync, hudi-dla-sync (#1810) add b51646d [

[GitHub] [hudi] bvaradar commented on issue #1913: [SUPPORT][MOR]Too many open files on IOException and Crash

2020-08-06 Thread GitBox
bvaradar commented on issue #1913: URL: https://github.com/apache/hudi/issues/1913#issuecomment-670015540 @luffyd: I spent some time trying to understand your use-case. To your question: Hudi needs to list partitions in order to figure out the list of valid files that constitute l

[GitHub] [hudi] vinothchandar commented on pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-08-06 Thread GitBox
vinothchandar commented on pull request #1848: URL: https://github.com/apache/hudi/pull/1848#issuecomment-669960354 ``` [ERROR] Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 66.614 s <<< FAILURE! - in org.apache.hudi.functional.TestCOWDataSource [ERROR] org.apache.hu

[GitHub] [hudi] nsivabalan commented on a change in pull request #1834: [HUDI-1013] Adding Bulk Insert V2 implementation

2020-08-06 Thread GitBox
nsivabalan commented on a change in pull request #1834: URL: https://github.com/apache/hudi/pull/1834#discussion_r466385801 ## File path: hudi-client/src/main/java/org/apache/hudi/keygen/BuiltinKeyGenerator.java ## @@ -1,88 +0,0 @@ -/* - * Licensed to the Apache Software Found

[GitHub] [hudi] nsivabalan commented on a change in pull request #1834: [HUDI-1013] Adding Bulk Insert V2 implementation

2020-08-06 Thread GitBox
nsivabalan commented on a change in pull request #1834: URL: https://github.com/apache/hudi/pull/1834#discussion_r466385455 ## File path: hudi-client/src/main/java/org/apache/hudi/io/HoodieRowCreateHandle.java ## @@ -0,0 +1,202 @@ +/* + * Licensed to the Apache Software Founda

[GitHub] [hudi] leesf commented on a change in pull request #1916: [HUDI-1025] Meter RPC calls in HoodieWrapperFileSystem

2020-08-06 Thread GitBox
leesf commented on a change in pull request #1916: URL: https://github.com/apache/hudi/pull/1916#discussion_r466398471 ## File path: hudi-common/src/main/java/org/apache/hudi/common/metrics/Registry.java ## @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation (

[GitHub] [hudi] leesf commented on a change in pull request #1916: [HUDI-1025] Meter RPC calls in HoodieWrapperFileSystem

2020-08-06 Thread GitBox
leesf commented on a change in pull request #1916: URL: https://github.com/apache/hudi/pull/1916#discussion_r466397854 ## File path: hudi-common/src/main/java/org/apache/hudi/common/metrics/Registry.java ## @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation (

[GitHub] [hudi] leesf commented on a change in pull request #1916: [HUDI-1025] Meter RPC calls in HoodieWrapperFileSystem

2020-08-06 Thread GitBox
leesf commented on a change in pull request #1916: URL: https://github.com/apache/hudi/pull/1916#discussion_r466396853 ## File path: hudi-common/src/main/java/org/apache/hudi/common/metrics/Counter.java ## @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] [hudi] leesf commented on a change in pull request #1916: [HUDI-1025] Meter RPC calls in HoodieWrapperFileSystem

2020-08-06 Thread GitBox
leesf commented on a change in pull request #1916: URL: https://github.com/apache/hudi/pull/1916#discussion_r466395899 ## File path: hudi-common/src/main/java/org/apache/hudi/common/fs/HoodieWrapperFileSystem.java ## @@ -64,10 +65,15 @@ public static final String HOODIE_S

[GitHub] [hudi] Ares-W commented on issue #1913: [SUPPORT][MOR]Too many open files on IOException and Crash

2020-08-06 Thread GitBox
Ares-W commented on issue #1913: URL: https://github.com/apache/hudi/issues/1913#issuecomment-669866341 Maybe https://issues.apache.org/jira/browse/PARQUET-783 causes this exception.

[GitHub] [hudi] umehrot2 commented on a change in pull request #1702: [HUDI-426] Bootstrap datasource integration

2020-08-06 Thread GitBox
umehrot2 commented on a change in pull request #1702: URL: https://github.com/apache/hudi/pull/1702#discussion_r466320096 ## File path: hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala ## @@ -54,29 +58,54 @@ class DefaultSource extends RelationProvider val par

[GitHub] [hudi] umehrot2 commented on a change in pull request #1702: [HUDI-426] Bootstrap datasource integration

2020-08-06 Thread GitBox
umehrot2 commented on a change in pull request #1702: URL: https://github.com/apache/hudi/pull/1702#discussion_r466317612 ## File path: hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala ## @@ -56,29 +58,56 @@ class DefaultSource extends RelationProvider val par

[GitHub] [hudi] s-sanjay commented on issue #1895: HUDI Dataset backed by Hive Metastore fails on Presto with Unknown converted type TIMESTAMP_MICROS

2020-08-06 Thread GitBox
s-sanjay commented on issue #1895: URL: https://github.com/apache/hudi/issues/1895#issuecomment-669845428 Right now Presto does not support reading the TIMESTAMP_MICROS type. This needs to be fixed on the Presto side, and I am working on a fix. (Presto only supports timestamps up to mill

[jira] [Updated] (HUDI-1151) Fix NPE when no new data in kafka using HoodieDeltaStreamer

2020-08-06 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangxianghu updated HUDI-1151: -- Status: Open (was: New) > Fix NPE when no new data in kafka using HoodieDeltaStreamer > ---

[jira] [Updated] (HUDI-1078) Fix IllegalArgumentException in Delete data demo of Quick-Start Guide

2020-08-06 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangxianghu updated HUDI-1078: -- Status: Open (was: New) > Fix IllegalArgumentException in Delete data demo of Quick-Start Guide > -

[jira] [Resolved] (HUDI-1078) Fix IllegalArgumentException in Delete data demo of Quick-Start Guide

2020-08-06 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangxianghu resolved HUDI-1078. --- Resolution: Fixed > Fix IllegalArgumentException in Delete data demo of Quick-Start Guide > --

[jira] [Commented] (HUDI-1078) Fix IllegalArgumentException in Delete data demo of Quick-Start Guide

2020-08-06 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172212#comment-17172212 ] wangxianghu commented on HUDI-1078: --- done via: 10e457278bee529d14e445012fa61e875e3f77cd

[GitHub] [hudi] umehrot2 commented on a change in pull request #1702: [HUDI-426] Bootstrap datasource integration

2020-08-06 Thread GitBox
umehrot2 commented on a change in pull request #1702: URL: https://github.com/apache/hudi/pull/1702#discussion_r466311768 ## File path: hudi-spark/src/main/scala/org/apache/hudi/HudiBootstrapRDD.scala ## @@ -0,0 +1,131 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [hudi] umehrot2 commented on a change in pull request #1702: [HUDI-426] Bootstrap datasource integration

2020-08-06 Thread GitBox
umehrot2 commented on a change in pull request #1702: URL: https://github.com/apache/hudi/pull/1702#discussion_r466310309 ## File path: hudi-spark/src/main/scala/org/apache/hudi/IncrementalRelation.scala ## @@ -92,36 +102,69 @@ class IncrementalRelation(val sqlContext: SQLConte

[GitHub] [hudi] umehrot2 commented on a change in pull request #1702: [HUDI-426] Bootstrap datasource integration

2020-08-06 Thread GitBox
umehrot2 commented on a change in pull request #1702: URL: https://github.com/apache/hudi/pull/1702#discussion_r466310024 ## File path: hudi-spark/src/main/scala/org/apache/hudi/IncrementalRelation.scala ## @@ -92,36 +102,69 @@ class IncrementalRelation(val sqlContext: SQLConte

[jira] [Created] (HUDI-1156) Remove unused dependencies from HoodieDeltaStreamerWrapper Class

2020-08-06 Thread Cheshta Sharma (Jira)
Cheshta Sharma created HUDI-1156: Summary: Remove unused dependencies from HoodieDeltaStreamerWrapper Class Key: HUDI-1156 URL: https://issues.apache.org/jira/browse/HUDI-1156 Project: Apache Hudi

[GitHub] [hudi] vinothchandar commented on a change in pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

2020-08-06 Thread GitBox
vinothchandar commented on a change in pull request #1858: URL: https://github.com/apache/hudi/pull/1858#discussion_r464714682 ## File path: hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java ## @@ -186,10 +188,14 @@ public HoodieMetrics getMetrics

[GitHub] [hudi] vinothchandar commented on a change in pull request #1870: [HUDI-808] Support cleaning bootstrap source data

2020-08-06 Thread GitBox
vinothchandar commented on a change in pull request #1870: URL: https://github.com/apache/hudi/pull/1870#discussion_r466243056 ## File path: hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestUtils.java ## @@ -513,4 +522,41 @@ public static void writeRecordsT

[GitHub] [hudi] vinothchandar commented on pull request #1870: [HUDI-808] Support cleaning bootstrap source data

2020-08-06 Thread GitBox
vinothchandar commented on pull request #1870: URL: https://github.com/apache/hudi/pull/1870#issuecomment-669803111 cc @bvaradar as well

[GitHub] [hudi] umehrot2 commented on a change in pull request #1869: [HUDI-427] Implement CLI support for performing bootstrap

2020-08-06 Thread GitBox
umehrot2 commented on a change in pull request #1869: URL: https://github.com/apache/hudi/pull/1869#discussion_r466248630 ## File path: hudi-common/src/main/java/org/apache/hudi/common/bootstrap/index/HFileBootstrapIndex.java ## @@ -240,13 +240,21 @@ private HoodieBootstrapInd

[GitHub] [hudi] hddong commented on pull request #1242: [HUDI-544] Archived commits command code cleanup

2020-08-06 Thread GitBox
hddong commented on pull request #1242: URL: https://github.com/apache/hudi/pull/1242#issuecomment-669800409 @n3nash: I have rebased this; please review it when you are free.

[GitHub] [hudi] umehrot2 commented on a change in pull request #1869: [HUDI-427] Implement CLI support for performing bootstrap

2020-08-06 Thread GitBox
umehrot2 commented on a change in pull request #1869: URL: https://github.com/apache/hudi/pull/1869#discussion_r466117695 ## File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/BootstrapCommand.java ## @@ -0,0 +1,184 @@ +/* + * Licensed to the Apache Software Founda

[GitHub] [hudi] vinothchandar commented on a change in pull request #1869: [HUDI-427] Implement CLI support for performing bootstrap

2020-08-06 Thread GitBox
vinothchandar commented on a change in pull request #1869: URL: https://github.com/apache/hudi/pull/1869#discussion_r466240100 ## File path: hudi-common/src/main/java/org/apache/hudi/common/bootstrap/index/HFileBootstrapIndex.java ## @@ -240,13 +240,21 @@ private HoodieBootstr

[GitHub] [hudi] vinothchandar commented on pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-08-06 Thread GitBox
vinothchandar commented on pull request #1848: URL: https://github.com/apache/hudi/pull/1848#issuecomment-669787801 @garyli1019 I am afraid this has something to do with the changes we made for `InMemoryFileIndex` or something similar in the PR. ``` TestBootstrap : files:[file:

[GitHub] [hudi] vinothchandar commented on a change in pull request #1702: [HUDI-426] Bootstrap datasource integration

2020-08-06 Thread GitBox
vinothchandar commented on a change in pull request #1702: URL: https://github.com/apache/hudi/pull/1702#discussion_r466223251 ## File path: hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala ## @@ -56,29 +58,56 @@ class DefaultSource extends RelationProvider va

[GitHub] [hudi] vinothchandar commented on a change in pull request #1924: [HUDI-999][Performance] Parallelize fetching of bootstrap source data files/partitions

2020-08-06 Thread GitBox
vinothchandar commented on a change in pull request #1924: URL: https://github.com/apache/hudi/pull/1924#discussion_r466217168 ## File path: hudi-client/src/main/java/org/apache/hudi/table/action/bootstrap/BootstrapUtils.java ## @@ -41,37 +48,87 @@ * Returns leaf folders w

[GitHub] [hudi] vinothchandar commented on a change in pull request #1924: [HUDI-999][Performance] Parallelize fetching of bootstrap source data files/partitions

2020-08-06 Thread GitBox
vinothchandar commented on a change in pull request #1924: URL: https://github.com/apache/hudi/pull/1924#discussion_r466216468 ## File path: hudi-client/src/main/java/org/apache/hudi/table/action/bootstrap/BootstrapUtils.java ## @@ -41,37 +48,87 @@ * Returns leaf folders w
