[GitHub] [hudi] bvaradar commented on a change in pull request #1752: [HUDI-575] Support Async Compaction for spark streaming writes to hudi table

2020-08-04 Thread GitBox
bvaradar commented on a change in pull request #1752: URL: https://github.com/apache/hudi/pull/1752#discussion_r465333514 ## File path: hudi-spark/src/main/java/org/apache/hudi/async/SparkStreamingWriterActivityDetector.java ## @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache

[GitHub] [hudi] bvaradar commented on a change in pull request #1752: [HUDI-575] Support Async Compaction for spark streaming writes to hudi table

2020-08-04 Thread GitBox
bvaradar commented on a change in pull request #1752: URL: https://github.com/apache/hudi/pull/1752#discussion_r46572 ## File path: hudi-spark/src/main/scala/org/apache/hudi/HoodieStreamingSink.scala ## @@ -38,46 +50,65 @@ class HoodieStreamingSink(sqlContext: SQLContext,

[GitHub] [hudi] bvaradar commented on a change in pull request #1752: [HUDI-575] Support Async Compaction for spark streaming writes to hudi table

2020-08-04 Thread GitBox
bvaradar commented on a change in pull request #1752: URL: https://github.com/apache/hudi/pull/1752#discussion_r465332412 ## File path: hudi-spark/src/main/scala/org/apache/hudi/HoodieStreamingSink.scala ## @@ -111,12 +143,64 @@ class HoodieStreamingSink(sqlContext:

[GitHub] [hudi] bvaradar commented on a change in pull request #1752: [HUDI-575] Support Async Compaction for spark streaming writes to hudi table

2020-08-04 Thread GitBox
bvaradar commented on a change in pull request #1752: URL: https://github.com/apache/hudi/pull/1752#discussion_r465332053 ## File path: hudi-spark/src/main/scala/org/apache/hudi/HoodieStreamingSink.scala ## @@ -111,12 +143,64 @@ class HoodieStreamingSink(sqlContext:

[GitHub] [hudi] bvaradar commented on a change in pull request #1752: [HUDI-575] Support Async Compaction for spark streaming writes to hudi table

2020-08-04 Thread GitBox
bvaradar commented on a change in pull request #1752: URL: https://github.com/apache/hudi/pull/1752#discussion_r465331708 ## File path: hudi-integ-test/src/test/java/org/apache/hudi/integ/ITTestBase.java ## @@ -58,6 +58,7 @@ protected static final String PRESTO_COORDINATOR

[GitHub] [hudi] bvaradar commented on a change in pull request #1752: [HUDI-575] Support Async Compaction for spark streaming writes to hudi table

2020-08-04 Thread GitBox
bvaradar commented on a change in pull request #1752: URL: https://github.com/apache/hudi/pull/1752#discussion_r465331113 ## File path: hudi-spark/src/main/scala/org/apache/hudi/HoodieStreamingSink.scala ## @@ -38,46 +50,65 @@ class HoodieStreamingSink(sqlContext: SQLContext,

[GitHub] [hudi] bvaradar commented on a change in pull request #1752: [HUDI-575] Support Async Compaction for spark streaming writes to hudi table

2020-08-04 Thread GitBox
bvaradar commented on a change in pull request #1752: URL: https://github.com/apache/hudi/pull/1752#discussion_r465330932 ## File path: hudi-spark/src/test/java/HoodieJavaStreamingApp.java ## @@ -68,7 +74,7 @@ private String tableName = "hoodie_test"; @Parameter(names

[GitHub] [hudi] vinothchandar commented on pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-08-04 Thread GitBox
vinothchandar commented on pull request #1848: URL: https://github.com/apache/hudi/pull/1848#issuecomment-668823914 @garyli1019 I cannot reproduce the bootstrap test failure locally. Ran like 25 times :/ This is an

[GitHub] [hudi] vinothchandar commented on pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

2020-08-04 Thread GitBox
vinothchandar commented on pull request #1858: URL: https://github.com/apache/hudi/pull/1858#issuecomment-668816966 @nsivabalan I already did that as well. Will review and push some changes to this PR This is an automated

[GitHub] [hudi] vinothchandar commented on a change in pull request #1810: [HUDI-875] Abstract hudi-sync-common, and support hudi-hive-sync

2020-08-04 Thread GitBox
vinothchandar commented on a change in pull request #1810: URL: https://github.com/apache/hudi/pull/1810#discussion_r465228726 ## File path: hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala ## @@ -261,6 +268,44 @@ private[hudi] object HoodieSparkSqlWriter {

[GitHub] [hudi] vinothchandar commented on pull request #1912: [HUDI-1098] Adding TimedWaitOnAppearConsistencyGuard

2020-08-04 Thread GitBox
vinothchandar commented on pull request #1912: URL: https://github.com/apache/hudi/pull/1912#issuecomment-668708393 @bvaradar you are most familiar with this. can you please review this? @umehrot2 let's see if we agree on the principles here

[GitHub] [hudi] bvaradar commented on a change in pull request #1898: [HUDI-1140] Fix Jcommander issue for --hoodie-conf in DeltaStreamer

2020-08-04 Thread GitBox
bvaradar commented on a change in pull request #1898: URL: https://github.com/apache/hudi/pull/1898#discussion_r465186323 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java ## @@ -165,6 +168,12 @@ public Operation

[GitHub] [hudi] lw309637554 commented on a change in pull request #1810: [HUDI-875] Abstract hudi-sync-common, and support hudi-hive-sync

2020-08-04 Thread GitBox
lw309637554 commented on a change in pull request #1810: URL: https://github.com/apache/hudi/pull/1810#discussion_r465133092 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java ## @@ -267,9 +267,16 @@ public Operation

[GitHub] [hudi] luffyd commented on issue #1913: [SUPPORT][MOR]Too many open files on IOException and Crash

2020-08-04 Thread GitBox
luffyd commented on issue #1913: URL: https://github.com/apache/hudi/issues/1913#issuecomment-668679565 Spark side configuration questions - I have been using client mode, does it make a difference using cluster mode?

[GitHub] [hudi] luffyd opened a new issue #1913: [SUPPORT][MOR]Too many open files on IOException and Crash

2020-08-04 Thread GitBox
luffyd opened a new issue #1913: URL: https://github.com/apache/hudi/issues/1913 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://cwiki.apache.org/confluence/display/HUDI/FAQ)? yes - Join the mailing list to engage in conversations and get faster

[GitHub] [hudi] nsivabalan commented on a change in pull request #1868: [HUDI-1083] Optimization in determining insert bucket location for a given key

2020-08-04 Thread GitBox
nsivabalan commented on a change in pull request #1868: URL: https://github.com/apache/hudi/pull/1868#discussion_r465152007 ## File path: hudi-client/src/test/java/org/apache/hudi/table/action/commit/TestUpsertPartitioner.java ## @@ -252,8 +250,27 @@ public void

[GitHub] [hudi] nsivabalan commented on a change in pull request #1912: [HUDI-1098] Adding TimedWaitOnAppearConsistencyGuard

2020-08-04 Thread GitBox
nsivabalan commented on a change in pull request #1912: URL: https://github.com/apache/hudi/pull/1912#discussion_r465142508 ## File path: hudi-common/src/main/java/org/apache/hudi/common/fs/TimedWaitOnAppearConsistencyGaurd.java ## @@ -0,0 +1,90 @@ +/* + * Licensed to the

[GitHub] [hudi] bvaradar closed issue #1872: [SUPPORT]Getting 503s from S3 during upserts

2020-08-04 Thread GitBox
bvaradar closed issue #1872: URL: https://github.com/apache/hudi/issues/1872 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [hudi] luffyd commented on issue #1866: [SUPPORT]Clean up does not seem to happen on MOR table

2020-08-04 Thread GitBox
luffyd commented on issue #1866: URL: https://github.com/apache/hudi/issues/1866#issuecomment-668667895 Please resolve this, Cleans are happening fine. I also added, I think it comes at the expense of timeline feature. We will relax it later ` val compactionConfig =

[GitHub] [hudi] bvaradar commented on issue #1860: [SUPPORT] Issue when querying from Spark Datasource if COW table is being written to at the same time

2020-08-04 Thread GitBox
bvaradar commented on issue #1860: URL: https://github.com/apache/hudi/issues/1860#issuecomment-668667634 @stackfun : Were you able to figure this out ?

[GitHub] [hudi] bvaradar closed issue #1852: [SUPPORT]

2020-08-04 Thread GitBox
bvaradar closed issue #1852: URL: https://github.com/apache/hudi/issues/1852

[GitHub] [hudi] bvaradar commented on issue #1852: [SUPPORT]

2020-08-04 Thread GitBox
bvaradar commented on issue #1852: URL: https://github.com/apache/hudi/issues/1852#issuecomment-668667100 Closing this ticket as it was answered.

[GitHub] [hudi] bvaradar closed issue #1846: [SUPPORT] HoodieSnapshotCopier example

2020-08-04 Thread GitBox
bvaradar closed issue #1846: URL: https://github.com/apache/hudi/issues/1846

[GitHub] [hudi] luffyd commented on issue #1872: [SUPPORT]Getting 503s from S3 during upserts

2020-08-04 Thread GitBox
luffyd commented on issue #1872: URL: https://github.com/apache/hudi/issues/1872#issuecomment-668666129 Not seeing right now, after me adding these configuration in the emr cluster configurations. But not sure, it is because of transient nature or the issue is really solved! Will

[GitHub] [hudi] bvaradar commented on issue #1843: [SUPPORT] Hudi cli errors when doing 'stats filesizes' on hoodie 0.4.6 data

2020-08-04 Thread GitBox
bvaradar commented on issue #1843: URL: https://github.com/apache/hudi/issues/1843#issuecomment-668665986 If you are still seeing the problem in 0.5.x, please reopen the ticket with more details

[GitHub] [hudi] bvaradar closed issue #1843: [SUPPORT] Hudi cli errors when doing 'stats filesizes' on hoodie 0.4.6 data

2020-08-04 Thread GitBox
bvaradar closed issue #1843: URL: https://github.com/apache/hudi/issues/1843

[GitHub] [hudi] bvaradar closed issue #1839: Question, Add Support to Hudi datasets to spark structured streaming

2020-08-04 Thread GitBox
bvaradar closed issue #1839: URL: https://github.com/apache/hudi/issues/1839

[GitHub] [hudi] bvaradar commented on issue #1839: Question, Add Support to Hudi datasets to spark structured streaming

2020-08-04 Thread GitBox
bvaradar commented on issue #1839: URL: https://github.com/apache/hudi/issues/1839#issuecomment-668665466 Closing this ticket in favor of jira to track the feature request

[GitHub] [hudi] bvaradar commented on issue #1828: [SUPPORT] Cannot force hudi to retain only last commit

2020-08-04 Thread GitBox
bvaradar commented on issue #1828: URL: https://github.com/apache/hudi/issues/1828#issuecomment-668664859 @kirkuz : Kindly reach out to AWS support directly. I am closing this ticket for now.

[GitHub] [hudi] bvaradar closed issue #1828: [SUPPORT] Cannot force hudi to retain only last commit

2020-08-04 Thread GitBox
bvaradar closed issue #1828: URL: https://github.com/apache/hudi/issues/1828

[GitHub] [hudi] bvaradar closed issue #1825: [SUPPORT] Compaction of parquet and meta file

2020-08-04 Thread GitBox
bvaradar closed issue #1825: URL: https://github.com/apache/hudi/issues/1825

[GitHub] [hudi] bvaradar commented on issue #1811: Deltastreamer Offset exception -Prod

2020-08-04 Thread GitBox
bvaradar commented on issue #1811: URL: https://github.com/apache/hudi/issues/1811#issuecomment-668662589 Please reopen if the issue persists.

[GitHub] [hudi] bvaradar closed issue #1811: Deltastreamer Offset exception -Prod

2020-08-04 Thread GitBox
bvaradar closed issue #1811: URL: https://github.com/apache/hudi/issues/1811

[GitHub] [hudi] bvaradar closed issue #1800: [SUPPORT] finalize errors "at org.apache.hudi.table.HoodieTable.cleanFailedWrites"

2020-08-04 Thread GitBox
bvaradar closed issue #1800: URL: https://github.com/apache/hudi/issues/1800

[GitHub] [hudi] bvaradar closed issue #1798: Question reading partition path with less level is more faster than what document mentioned

2020-08-04 Thread GitBox
bvaradar closed issue #1798: URL: https://github.com/apache/hudi/issues/1798

[GitHub] [hudi] bvaradar commented on issue #1800: [SUPPORT] finalize errors "at org.apache.hudi.table.HoodieTable.cleanFailedWrites"

2020-08-04 Thread GitBox
bvaradar commented on issue #1800: URL: https://github.com/apache/hudi/issues/1800#issuecomment-668662226 Closing this due to inactivity.

[GitHub] [hudi] bvaradar commented on issue #1798: Question reading partition path with less level is more faster than what document mentioned

2020-08-04 Thread GitBox
bvaradar commented on issue #1798: URL: https://github.com/apache/hudi/issues/1798#issuecomment-668661892 https://issues.apache.org/jira/browse/HUDI-1144 to address optimization in HoodieROPathFilter

[GitHub] [hudi] bvaradar commented on issue #1791: [SUPPORT] Does DeltaStreamer support listening to multiple kafka topics and upserting to multiple tables?

2020-08-04 Thread GitBox
bvaradar commented on issue #1791: URL: https://github.com/apache/hudi/issues/1791#issuecomment-668660685 closing this ticket as we have jira to track

[GitHub] [hudi] bvaradar closed issue #1791: [SUPPORT] Does DeltaStreamer support listening to multiple kafka topics and upserting to multiple tables?

2020-08-04 Thread GitBox
bvaradar closed issue #1791: URL: https://github.com/apache/hudi/issues/1791

[GitHub] [hudi] bvaradar commented on issue #1787: Exception During Insert

2020-08-04 Thread GitBox
bvaradar commented on issue #1787: URL: https://github.com/apache/hudi/issues/1787#issuecomment-668660351 Closing this issue. Please reopen if needed.

[GitHub] [hudi] bvaradar closed issue #1766: [SUPPORT] Hudi COW - Bulk Insert followed by Upsert via Spark streaming job

2020-08-04 Thread GitBox
bvaradar closed issue #1766: URL: https://github.com/apache/hudi/issues/1766

[GitHub] [hudi] bvaradar closed issue #1787: Exception During Insert

2020-08-04 Thread GitBox
bvaradar closed issue #1787: URL: https://github.com/apache/hudi/issues/1787

[GitHub] [hudi] bvaradar commented on issue #1766: [SUPPORT] Hudi COW - Bulk Insert followed by Upsert via Spark streaming job

2020-08-04 Thread GitBox
bvaradar commented on issue #1766: URL: https://github.com/apache/hudi/issues/1766#issuecomment-668660075 Closing this ticket as we have the jira

[GitHub] [hudi] bvaradar commented on issue #1766: [SUPPORT] Hudi COW - Bulk Insert followed by Upsert via Spark streaming job

2020-08-04 Thread GitBox
bvaradar commented on issue #1766: URL: https://github.com/apache/hudi/issues/1766#issuecomment-668659435 @RajasekarSribalan : Can you turn on debug level logs and see if hudi input format is activated when reading the data.

[GitHub] [hudi] bvaradar closed issue #1705: Tracking Hudi Data along transaction time and buisness time

2020-08-04 Thread GitBox
bvaradar closed issue #1705: URL: https://github.com/apache/hudi/issues/1705

[GitHub] [hudi] bvaradar commented on issue #1705: Tracking Hudi Data along transaction time and buisness time

2020-08-04 Thread GitBox
bvaradar commented on issue #1705: URL: https://github.com/apache/hudi/issues/1705#issuecomment-668657866 Closing this and we can track through jira. @nandini57 : Please grab the jira if you are planning to work on it.

[GitHub] [hudi] bvaradar commented on issue #1586: [SUPPORT] DMS with 2 key example

2020-08-04 Thread GitBox
bvaradar commented on issue #1586: URL: https://github.com/apache/hudi/issues/1586#issuecomment-668657095 https://github.com/apache/hudi/pull/1898

[GitHub] [hudi] bvaradar commented on issue #1586: [SUPPORT] DMS with 2 key example

2020-08-04 Thread GitBox
bvaradar commented on issue #1586: URL: https://github.com/apache/hudi/issues/1586#issuecomment-668656807 Closing this as we have a PR.

[GitHub] [hudi] bvaradar closed issue #1586: [SUPPORT] DMS with 2 key example

2020-08-04 Thread GitBox
bvaradar closed issue #1586: URL: https://github.com/apache/hudi/issues/1586

[GitHub] [hudi] bvaradar commented on issue #1911: [SUPPORT] GLOBAL_BLOOM index errors on Upsert operation

2020-08-04 Thread GitBox
bvaradar commented on issue #1911: URL: https://github.com/apache/hudi/issues/1911#issuecomment-668655133 @nsivabalan : Can you take a look at this ?

[GitHub] [hudi] bvaradar commented on issue #1910: [SUPPORT] Upsert operation duplicating records in a partition

2020-08-04 Thread GitBox
bvaradar commented on issue #1910: URL: https://github.com/apache/hudi/issues/1910#issuecomment-668651417 This looks like you are not using hudi format to read the table. Did you try spark.read.format("hudi"). ?

[GitHub] [hudi] bvaradar commented on issue #1909: [SUPPORT] "Failed to get update last commit time synced to 20200804071144"

2020-08-04 Thread GitBox
bvaradar commented on issue #1909: URL: https://github.com/apache/hudi/issues/1909#issuecomment-668649154 ```20/08/04 07:11:50 WARN HiveConf: HiveConf of name hive.server2.thrift.url does not exist Traceback (most recent call last): File "", line 5, in File

[GitHub] [hudi] bvaradar commented on issue #1906: [SUPPORT] org.apache.hudi.common.util.FSUtils logging too verbose

2020-08-04 Thread GitBox
bvaradar commented on issue #1906: URL: https://github.com/apache/hudi/issues/1906#issuecomment-668644790 Added https://issues.apache.org/jira/browse/HUDI-1148

[GitHub] [hudi] bvaradar closed issue #1906: [SUPPORT] org.apache.hudi.common.util.FSUtils logging too verbose

2020-08-04 Thread GitBox
bvaradar closed issue #1906: URL: https://github.com/apache/hudi/issues/1906

[jira] [Updated] (HUDI-1148) Revisit log messages seen when wiriting or reading through Hudi

2020-08-04 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-1148: - Status: Open (was: New) > Revisit log messages seen when wiriting or reading through

[jira] [Created] (HUDI-1148) Revisit log messages seen when wiriting or reading through Hudi

2020-08-04 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-1148: Summary: Revisit log messages seen when wiriting or reading through Hudi Key: HUDI-1148 URL: https://issues.apache.org/jira/browse/HUDI-1148 Project: Apache

[GitHub] [hudi] nsivabalan commented on a change in pull request #1912: [HUDI-1098] Adding TimedWaitOnAppearConsistencyGuard

2020-08-04 Thread GitBox
nsivabalan commented on a change in pull request #1912: URL: https://github.com/apache/hudi/pull/1912#discussion_r465039422 ## File path: hudi-client/src/main/java/org/apache/hudi/table/HoodieTable.java ## @@ -505,16 +507,26 @@ private boolean waitForCondition(String

[jira] [Commented] (HUDI-1098) Marker file finalizing may block on a data file that was never written

2020-08-04 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17170766#comment-17170766 ] sivabalan narayanan commented on HUDI-1098: --- [https://github.com/apache/hudi/pull/1912]   >

[jira] [Updated] (HUDI-1098) Marker file finalizing may block on a data file that was never written

2020-08-04 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1098: - Labels: pull-request-available (was: ) > Marker file finalizing may block on a data file that

[GitHub] [hudi] nsivabalan commented on a change in pull request #1912: [HUDI-1098] Adding TimedWaitOnAppearConsistencyGuard

2020-08-04 Thread GitBox
nsivabalan commented on a change in pull request #1912: URL: https://github.com/apache/hudi/pull/1912#discussion_r465008616 ## File path: hudi-common/src/main/java/org/apache/hudi/common/fs/ConsistencyGuardConfig.java ## @@ -36,15 +36,15 @@ // time between successive

[GitHub] [hudi] shenh062326 edited a comment on pull request #1819: [HUDI-1058] Make delete marker configurable

2020-08-04 Thread GitBox
shenh062326 edited a comment on pull request #1819: URL: https://github.com/apache/hudi/pull/1819#issuecomment-668542359 > makes sense. sorry about the oversight. I can rework it. Let me confirm how to do it first, adding PayloadConfig, and then calling PayloadConfig in

[GitHub] [hudi] nsivabalan opened a new pull request #1912: [HUDI-1098] Adding TimedWaitOnAppearConsistencyGuard

2020-08-04 Thread GitBox
nsivabalan opened a new pull request #1912: URL: https://github.com/apache/hudi/pull/1912 ## What is the purpose of the pull request Introducing a TimedWaitOnAppearConsistencyGuard for eventual consistent stores. This will sleep for configured period of time only on APPEAR. It is a

[GitHub] [hudi] shenh062326 commented on pull request #1819: [HUDI-1058] Make delete marker configurable

2020-08-04 Thread GitBox
shenh062326 commented on pull request #1819: URL: https://github.com/apache/hudi/pull/1819#issuecomment-668542359 > makes sense. sorry about the oversight. I can rework it. Let me confirm the how to do it first, adding PayloadConfig, and then calling PayloadConfig in

[GitHub] [hudi] nsivabalan commented on pull request #1868: [HUDI-1083] Optimization in determining insert bucket location for a given key

2020-08-04 Thread GitBox
nsivabalan commented on pull request #1868: URL: https://github.com/apache/hudi/pull/1868#issuecomment-668538053 LMK once the PR is ready to be reviewed (once you address the one comment about assertions)

[GitHub] [hudi] nsivabalan commented on pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

2020-08-04 Thread GitBox
nsivabalan commented on pull request #1858: URL: https://github.com/apache/hudi/pull/1858#issuecomment-668535467 sure. I was planning to do it this mrng. anyways. LMK if you want me to fix the commit msg or you gonna take care of it as well.

[GitHub] [hudi] nsivabalan commented on pull request #1819: [HUDI-1058] Make delete marker configurable

2020-08-04 Thread GitBox
nsivabalan commented on pull request #1819: URL: https://github.com/apache/hudi/pull/1819#issuecomment-668534087 makes sense. sorry about the oversight. I will take up the rework by using a payloadConfig class.

[GitHub] [hudi] nsivabalan commented on issue #1745: Deltastreamer -Global bloom Index resulting Duplicates across partitions for Same record Key

2020-08-04 Thread GitBox
nsivabalan commented on issue #1745: URL: https://github.com/apache/hudi/issues/1745#issuecomment-668532860 https://github.com/apache/hudi/pull/1793

[GitHub] [hudi] mingujotemp opened a new issue #1911: [SUPPORT] GLOBAL_BLOOM index errors on Upsert operation

2020-08-04 Thread GitBox
mingujotemp opened a new issue #1911: URL: https://github.com/apache/hudi/issues/1911 **Describe the problem you faced** Using GLOBAL_BLOOM index errors on Upsert operation `org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0`

[GitHub] [hudi] mingujotemp opened a new issue #1910: [SUPPORT] Upsert operation duplicating records in a partition

2020-08-04 Thread GitBox
mingujotemp opened a new issue #1910: URL: https://github.com/apache/hudi/issues/1910 **Describe the problem you faced** Upsert operation duplicates records in a partition. We use EMR 6.0.0 (Hudi 0.5.0) **To Reproduce** Steps to reproduce the behavior: 1.

[GitHub] [hudi] mingujotemp commented on issue #1745: Deltastreamer -Global bloom Index resulting Duplicates across partitions for Same record Key

2020-08-04 Thread GitBox
mingujotemp commented on issue #1745: URL: https://github.com/apache/hudi/issues/1745#issuecomment-668432895 Can you link a pr for this issue? @vinothchandar

[GitHub] [hudi] mingujotemp opened a new issue #1909: [SUPPORT] "Failed to get update last commit time synced to 20200804071144"

2020-08-04 Thread GitBox
mingujotemp opened a new issue #1909: URL: https://github.com/apache/hudi/issues/1909 **Describe the problem you faced** HUDI 0.5.0 (using on EMR) I encounter `org.apache.hudi.hive.HoodieHiveSyncException: Failed to get update last commit time synced to 20200804071144` when

[GitHub] [hudi] garyli1019 commented on a change in pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-08-04 Thread GitBox
garyli1019 commented on a change in pull request #1848: URL: https://github.com/apache/hudi/pull/1848#discussion_r464842847 ## File path: hudi-spark/src/main/scala/org/apache/hudi/HoodieMergeOnReadRDD.scala ## @@ -0,0 +1,274 @@ +/* + * Licensed to the Apache Software

[GitHub] [hudi] vinothchandar commented on pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-08-04 Thread GitBox
vinothchandar commented on pull request #1848: URL: https://github.com/apache/hudi/pull/1848#issuecomment-668416701 @garyli1019 changes are isolated enough. Please follow up with a fix for the issue I pointed out. Once CI passes, I ll land this PR for now.

[GitHub] [hudi] vinothchandar commented on a change in pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-08-04 Thread GitBox
vinothchandar commented on a change in pull request #1848: URL: https://github.com/apache/hudi/pull/1848#discussion_r464834109 ## File path: hudi-spark/src/main/scala/org/apache/hudi/HoodieMergeOnReadRDD.scala ## @@ -0,0 +1,274 @@ +/* + * Licensed to the Apache Software

[GitHub] [hudi] Ares-W commented on issue #1908: [SUPPORT]There are too many close_wait sockets on 50010 port when I use SparkStreaming to save kafka data to Hudi.

2020-08-04 Thread GitBox
Ares-W commented on issue #1908: URL: https://github.com/apache/hudi/issues/1908#issuecomment-668410651 parquet-1.9.0 cause this problem https://issues.apache.org/jira/browse/PARQUET-783

[GitHub] [hudi] Ares-W closed issue #1908: [SUPPORT]There are too many close_wait sockets on 50010 port when I use SparkStreaming to save kafka data to Hudi.

2020-08-04 Thread GitBox
Ares-W closed issue #1908: URL: https://github.com/apache/hudi/issues/1908

[jira] [Created] (HUDI-1147) Generate valid timestamp and partition for data generator

2020-08-04 Thread Nishith Agarwal (Jira)
Nishith Agarwal created HUDI-1147: - Summary: Generate valid timestamp and partition for data generator Key: HUDI-1147 URL: https://issues.apache.org/jira/browse/HUDI-1147 Project: Apache Hudi

[GitHub] [hudi] vinothchandar commented on pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

2020-08-04 Thread GitBox
vinothchandar commented on pull request #1858: URL: https://github.com/apache/hudi/pull/1858#issuecomment-668403215 @nsivabalan I rebased this against master

[GitHub] [hudi] nagacse commented on pull request #1907: Deltastreamer on Databricks-6.6 version

2020-08-04 Thread GitBox
nagacse commented on pull request #1907: URL: https://github.com/apache/hudi/pull/1907#issuecomment-668402026 closing in favour of https://github.com/apache/hudi/commit/539621bd33893d99a07b8f739a1e965ca72acdc9

[GitHub] [hudi] nagacse closed pull request #1907: Deltastreamer on Databricks-6.6 version

2020-08-04 Thread GitBox
nagacse closed pull request #1907: URL: https://github.com/apache/hudi/pull/1907

[GitHub] [hudi] garyli1019 commented on pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-08-04 Thread GitBox
garyli1019 commented on pull request #1848: URL: https://github.com/apache/hudi/pull/1848#issuecomment-668398894 > to be clear, you are saying it should all be working correct? assuming you may not have conflicts with #1807 , can you please rebase this off latest master? Yes the
