[GitHub] [hudi] teeyog commented on pull request #2431: [HUDI-1526] Translate the api partitionBy to hoodie.datasource.write.partitionpath.field

2021-02-09 Thread GitBox
teeyog commented on pull request #2431: URL: https://github.com/apache/hudi/pull/2431#issuecomment-776512247 @nsivabalan Modified according to your opinion, please review again, thanks This is an automated message from the Ap

[GitHub] [hudi] yanghua commented on pull request #2548: [HUDI-1597] remove deprecated spring repos from pom

2021-02-09 Thread GitBox
yanghua commented on pull request #2548: URL: https://github.com/apache/hudi/pull/2548#issuecomment-776452167 > > What I mean is that whether we need to freeze the more changes via releasing a minor branch. > > definitely the cleaner approach. but not sure if the time is worth inves

[jira] [Commented] (HUDI-1602) Corrupted Avro schema extracted from parquet file

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282214#comment-17282214 ] Vinoth Chandar commented on HUDI-1602: -- cc [~garyli] in case you have seen this stack

[jira] [Commented] (HUDI-1602) Corrupted Avro schema extracted from parquet file

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282213#comment-17282213 ] Vinoth Chandar commented on HUDI-1602: -- But does the stack trace still show `MergeOnR

[jira] [Commented] (HUDI-1602) Corrupted Avro schema extracted from parquet file

2021-02-09 Thread Alexander Filipchik (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282206#comment-17282206 ] Alexander Filipchik commented on HUDI-1602: --- Added  {code:java} hoodie.datasourc

[GitHub] [hudi] vinothchandar commented on a change in pull request #2540: [HUDI-1586] [Common Core] [Flink Integration] Reduce the coupling of hadoop.

2021-02-09 Thread GitBox
vinothchandar commented on a change in pull request #2540: URL: https://github.com/apache/hudi/pull/2540#discussion_r573426169 ## File path: hudi-common/src/main/java/org/apache/hudi/common/fs/FSUtils.java ## @@ -80,9 +80,6 @@ private static final PathFilter ALLOW_ALL_FILTER

[GitHub] [hudi] jingweiz2017 commented on pull request #2192: [HUDI-1343] Add standard schema postprocessor which would rewrite the schema using spark-avro conversion

2021-02-09 Thread GitBox
jingweiz2017 commented on pull request #2192: URL: https://github.com/apache/hudi/pull/2192#issuecomment-776413023 @yanghua Filed [HUDI-1607 ](https://issues.apache.org/jira/projects/HUDI/issues/HUDI-1607?filter=allissues) T

[jira] [Created] (HUDI-1607) Decimal handling bug in SparkAvroPostProcessor

2021-02-09 Thread Jingwei Zhang (Jira)
Jingwei Zhang created HUDI-1607: --- Summary: Decimal handling bug in SparkAvroPostProcessor Key: HUDI-1607 URL: https://issues.apache.org/jira/browse/HUDI-1607 Project: Apache Hudi Issue Type: B

[GitHub] [hudi] ZhangChaoming commented on a change in pull request #2540: [HUDI-1586] [Common Core] [Flink Integration] Reduce the coupling of hadoop.

2021-02-09 Thread GitBox
ZhangChaoming commented on a change in pull request #2540: URL: https://github.com/apache/hudi/pull/2540#discussion_r573396780 ## File path: hudi-common/src/main/java/org/apache/hudi/common/fs/FSUtils.java ## @@ -80,9 +80,6 @@ private static final PathFilter ALLOW_ALL_FILTER

[GitHub] [hudi] vinothchandar commented on a change in pull request #2540: [HUDI-1586] [Common Core] [Flink Integration] Reduce the coupling of hadoop.

2021-02-09 Thread GitBox
vinothchandar commented on a change in pull request #2540: URL: https://github.com/apache/hudi/pull/2540#discussion_r573380111 ## File path: hudi-common/pom.xml ## @@ -154,6 +155,7 @@ org.apache.hadoop hadoop-hdfs + provided Review comment: wond

[GitHub] [hudi] vinothchandar closed issue #2240: [SUPPORT] Performance Issue : HUDI MOR ,UPSERT Job running forever

2021-02-09 Thread GitBox
vinothchandar closed issue #2240: URL: https://github.com/apache/hudi/issues/2240

[GitHub] [hudi] vinothchandar commented on issue #2240: [SUPPORT] Performance Issue : HUDI MOR ,UPSERT Job running forever

2021-02-09 Thread GitBox
vinothchandar commented on issue #2240: URL: https://github.com/apache/hudi/issues/2240#issuecomment-776311165 great. if you have any stories to share for a blog or tuning guide, that would also help everyone!

[jira] [Commented] (HUDI-1602) Corrupted Avro schema extracted from parquet file

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282105#comment-17282105 ] Vinoth Chandar commented on HUDI-1602: -- So the change I see [~afilipchik] is that `Me

[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1602: - Fix Version/s: 0.8.0 > Corrupted Avro schema extracted from parquet file > ---

[jira] [Assigned] (HUDI-1602) Corrupted Avro schema extracted from parquet file

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar reassigned HUDI-1602: Assignee: Vinoth Chandar > Corrupted Avro schema extracted from parquet file >

[jira] [Updated] (HUDI-1602) Corrupted Avro schema extracted from parquet file

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1602: - Labels: sev:critical (was: ) > Corrupted Avro schema extracted from parquet file > --

[GitHub] [hudi] bvaradar commented on issue #2561: [SUPPORT] Working on a POC to integrate Hudi with Spark Structured streaming and Hive

2021-02-09 Thread GitBox
bvaradar commented on issue #2561: URL: https://github.com/apache/hudi/issues/2561#issuecomment-776298049 Please take a look at https://github.com/apache/hudi/blob/master/hudi-spark-datasource/hudi-spark/src/test/java/HoodieJavaStreamingApp.java for an example

[GitHub] [hudi] pavannpa opened a new issue #2561: [SUPPORT] Working on a POC to integrate Hudi with Spark Structured streaming and Hive

2021-02-09 Thread GitBox
pavannpa opened a new issue #2561: URL: https://github.com/apache/hudi/issues/2561 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://cwiki.apache.org/confluence/display/HUDI/FAQ)? - Yes - Join the mailing list to engage in conversations and get f

[GitHub] [hudi] bvaradar commented on issue #2546: Whether to provide flink to read the api of hudi, or use flink sql to query hudi?

2021-02-09 Thread GitBox
bvaradar commented on issue #2546: URL: https://github.com/apache/hudi/issues/2546#issuecomment-776254596 @robin-su: Can you kindly elaborate on the question?

[jira] [Comment Edited] (HUDI-1063) Save in Google Cloud Storage not working

2021-02-09 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282045#comment-17282045 ] sivabalan narayanan edited comment on HUDI-1063 at 2/9/21, 9:13 PM:

[jira] [Commented] (HUDI-1063) Save in Google Cloud Storage not working

2021-02-09 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282045#comment-17282045 ] sivabalan narayanan commented on HUDI-1063: --- [~afilipchik]: Can you help here w/

[jira] [Commented] (HUDI-1063) Save in Google Cloud Storage not working

2021-02-09 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282038#comment-17282038 ] sivabalan narayanan commented on HUDI-1063: --- [~WaterKnight]: Can you try disabli

[jira] [Comment Edited] (HUDI-1063) Save in Google Cloud Storage not working

2021-02-09 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282038#comment-17282038 ] sivabalan narayanan edited comment on HUDI-1063 at 2/9/21, 8:53 PM:

[jira] [Commented] (HUDI-1240) Simplify config classes

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282024#comment-17282024 ] Vinoth Chandar commented on HUDI-1240: -- we can keep the DataSourceWriteOptions/DataSo

[jira] [Commented] (HUDI-1240) Simplify config classes

2021-02-09 Thread Wenning Ding (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282006#comment-17282006 ] Wenning Ding commented on HUDI-1240: Make sense to me! This is similar to what I thoug

[GitHub] [hudi] kpurella edited a comment on issue #2240: [SUPPORT] Performance Issue : HUDI MOR ,UPSERT Job running forever

2021-02-09 Thread GitBox
kpurella edited a comment on issue #2240: URL: https://github.com/apache/hudi/issues/2240#issuecomment-776190924 @nsivabalan @bvaradar Thank you for your support. Sorry, I forgot to update this thread; we were able to achieve the required performance with @bvaradar's suggestions. However

[GitHub] [hudi] kpurella commented on issue #2240: [SUPPORT] Performance Issue : HUDI MOR ,UPSERT Job running forever

2021-02-09 Thread GitBox
kpurella commented on issue #2240: URL: https://github.com/apache/hudi/issues/2240#issuecomment-776190924 @nsivabalan @bvaradar Thank you for your support. Sorry, I forgot to update this thread; we achieved the required performance with @bvaradar's suggestions, and we further impro

[GitHub] [hudi] andormarkus commented on issue #2498: [SUPPORT] Hudi MERGE_ON_READ load to dataframe fails for the versions [0.6.0],[0.7.0] and runs for [0.5.3]

2021-02-09 Thread GitBox
andormarkus commented on issue #2498: URL: https://github.com/apache/hudi/issues/2498#issuecomment-776172403 Thanks @vinothchandar. As soon as someone confirms that Apache Spark is not affected by this issue, I can raise an AWS Support ticket.

[jira] [Updated] (HUDI-466) [Umbrella] Record level, global low-latency index implementation

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-466: Labels: hudi-umbrellas (was: ) > [Umbrella] Record level, global low-latency index implementation >

[jira] [Updated] (HUDI-538) [UMBRELLA] Restructuring hudi client module for multi engine support

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-538: Labels: hudi-umbrellas (was: ) > [UMBRELLA] Restructuring hudi client module for multi engine suppor

[jira] [Updated] (HUDI-1236) [UMBRELLA] Long running test suite

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1236: - Labels: hudi-umbrellas (was: ) > [UMBRELLA] Long running test suite > ---

[jira] [Updated] (HUDI-270) [UMBRELLA] Improve Hudi website UI and documentation

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-270: Labels: hudi-umbrellas (was: ) > [UMBRELLA] Improve Hudi website UI and documentation >

[jira] [Updated] (HUDI-1246) [UMBRELLA] Microbenchmarks for key code paths

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1246: - Labels: hudi-umbrellas (was: ) > [UMBRELLA] Microbenchmarks for key code paths >

[jira] [Updated] (HUDI-1251) [UMBRELLA] CI stability and debugging integ tests

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1251: - Labels: hudi-umbrellas (was: ) > [UMBRELLA] CI stability and debugging integ tests >

[jira] [Updated] (HUDI-1250) [UMBRELLA] Test coverage

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1250: - Labels: hudi-umbrellas (was: ) > [UMBRELLA] Test coverage > > >

[jira] [Updated] (HUDI-1238) [UMBRELLA] Perf test env

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1238: - Labels: hudi-umbrellas (was: ) > [UMBRELLA] Perf test env > > >

[jira] [Updated] (HUDI-1248) [UMBRELLA] Tests cleanup and fixes

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1248: - Labels: hudi-umbrellas (was: ) > [UMBRELLA] Tests cleanup and fixes > ---

[jira] [Updated] (HUDI-1239) [UMBRELLA] Config clean up

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1239: - Labels: hudi-umbrellas (was: ) > [UMBRELLA] Config clean up > -- > >

[jira] [Updated] (HUDI-1249) [UMBRELLA] refactor tests for ease of development

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1249: - Labels: hudi-umbrellas (was: ) > [UMBRELLA] refactor tests for ease of development >

[jira] [Updated] (HUDI-1244) [UMBRELLA] Publish nightly / snapshot releases

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1244: - Labels: hudi-umbrellas (was: ) > [UMBRELLA] Publish nightly / snapshot releases > ---

[jira] [Updated] (HUDI-1297) [Umbrella] Revamp Spark Datasource support using Spark 3 APIs

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1297: - Labels: hudi-umbrellas (was: ) > [Umbrella] Revamp Spark Datasource support using Spark 3 APIs >

[GitHub] [hudi] satishkotha edited a comment on issue #2555: [SUPPORT] Trying and Understanding Clustering

2021-02-09 Thread GitBox
satishkotha edited a comment on issue #2555: URL: https://github.com/apache/hudi/issues/2555#issuecomment-776154756 if hoodie.cleaner.fileversions.retained=1, incremental queries won't work because there's only one version retained. I was just suggesting this as a short-term workaround until h

[GitHub] [hudi] satishkotha commented on issue #2555: [SUPPORT] Trying and Understanding Clustering

2021-02-09 Thread GitBox
satishkotha commented on issue #2555: URL: https://github.com/apache/hudi/issues/2555#issuecomment-776154756 if hoodie.cleaner.fileversions.retained=1, incremental queries won't work because there's only one version retained. I was just suggesting this as a short-term workaround until hudi ver

[hudi] branch master updated: [MINOR] Fix the wrong comment for HoodieJavaWriteClientExample (#2559)

2021-02-09 Thread vinoth
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new a2f85d9 [MINOR] Fix the wrong comment for Hoodie

[GitHub] [hudi] vinothchandar merged pull request #2559: [MINOR] Fix the wrong comment for HoodieJavaWriteClientExample

2021-02-09 Thread GitBox
vinothchandar merged pull request #2559: URL: https://github.com/apache/hudi/pull/2559

[GitHub] [hudi] rubenssoto commented on issue #2515: [SUPPORT] ERROR HoodieTimelineArchiveLog: Failed to archive commits

2021-02-09 Thread GitBox
rubenssoto commented on issue #2515: URL: https://github.com/apache/hudi/issues/2515#issuecomment-776129579 yeah @vinothchandar that's it. if you have any problem reproducing the problem, please let me know

[GitHub] [hudi] vinothchandar commented on issue #2498: [SUPPORT] Hudi MERGE_ON_READ load to dataframe fails for the versions [0.6.0],[0.7.0] and runs for [0.5.3]

2021-02-09 Thread GitBox
vinothchandar commented on issue #2498: URL: https://github.com/apache/hudi/issues/2498#issuecomment-776127759 @andormarkus AFAIK aws has its own spark fork. @umehrot2 mentioned on slack IIRC that this is related

[GitHub] [hudi] vinothchandar commented on issue #2515: [SUPPORT] ERROR HoodieTimelineArchiveLog: Failed to archive commits

2021-02-09 Thread GitBox
vinothchandar commented on issue #2515: URL: https://github.com/apache/hudi/issues/2515#issuecomment-776118917 In summary, this problem happens even locally with row writer enabled? I am asking so I can reproduce this (I cannot understand how row writing and archival are related), but will tr

[jira] [Commented] (HUDI-1240) Simplify config classes

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281871#comment-17281871 ] Vinoth Chandar commented on HUDI-1240: -- cc [~wenningd] does this make sense? This is

[jira] [Updated] (HUDI-1240) Simplify config classes

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1240: - Description: Cleanup config classes across the board with a {{HoodieConfig}}  class in hudi-commo

[jira] [Updated] (HUDI-270) [UMBRELLA] Improve Hudi website UI and documentation

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-270: Component/s: hudi-umbrellas > [UMBRELLA] Improve Hudi website UI and documentation >

[jira] [Updated] (HUDI-1248) [UMBRELLA] Tests cleanup and fixes

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1248: - Component/s: hudi-umbrellas > [UMBRELLA] Tests cleanup and fixes > ---

[jira] [Updated] (HUDI-1249) [UMBRELLA] refactor tests for ease of development

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1249: - Component/s: hudi-umbrellas > [UMBRELLA] refactor tests for ease of development >

[jira] [Updated] (HUDI-1244) [UMBRELLA] Publish nightly / snapshot releases

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1244: - Component/s: hudi-umbrellas > [UMBRELLA] Publish nightly / snapshot releases > ---

[jira] [Updated] (HUDI-1238) [UMBRELLA] Perf test env

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1238: - Component/s: hudi-umbrellas > [UMBRELLA] Perf test env > > >

[jira] [Updated] (HUDI-1246) [UMBRELLA] Microbenchmarks for key code paths

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1246: - Component/s: hudi-umbrellas > [UMBRELLA] Microbenchmarks for key code paths >

[jira] [Updated] (HUDI-466) [Umbrella] Record level, global low-latency index implementation

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-466: Component/s: hudi-umbrellas > [Umbrella] Record level, global low-latency index implementation >

[jira] [Updated] (HUDI-1251) [UMBRELLA] CI stability and debugging integ tests

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1251: - Component/s: hudi-umbrellas > [UMBRELLA] CI stability and debugging integ tests >

[jira] [Updated] (HUDI-1250) [UMBRELLA] Test coverage

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1250: - Component/s: hudi-umbrellas > [UMBRELLA] Test coverage > > >

[jira] [Updated] (HUDI-1388) [UMBRELLA] Improve CLI features and usabilities

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1388: - Labels: gsoc gsoc2021 hudi-umbrellas mentor (was: gsoc gsoc2021 mentor) > [UMBRELLA] Improve CLI

[jira] [Updated] (HUDI-1239) [UMBRELLA] Config clean up

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1239: - Component/s: hudi-umbrellas > [UMBRELLA] Config clean up > -- > >

[jira] [Updated] (HUDI-1385) [UMBRELLA] Improve source ingestion support in DeltaStreamer

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1385: - Labels: gsoc gsoc2021 hudi-umbrellas mentor (was: gsoc gsoc2021 mentor) > [UMBRELLA] Improve sour

[jira] [Updated] (HUDI-1387) [UMBRELLA] Support Apache Calcite for writing/querying Hudi datasets

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1387: - Labels: gsoc gsoc2021 hudi-umbrellas mentor (was: gsoc gsoc2021 mentor) > [UMBRELLA] Support Apac

[jira] [Updated] (HUDI-1390) [UMBRELLA] Support schema inference for unstructured data

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1390: - Labels: gsoc gsoc2021 hudi-umbrellas mentor (was: gsoc gsoc2021 mentor) > [UMBRELLA] Support sche

[jira] [Updated] (HUDI-538) [UMBRELLA] Restructuring hudi client module for multi engine support

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-538: Component/s: hudi-umbrellas > [UMBRELLA] Restructuring hudi client module for multi engine support >

[jira] [Updated] (HUDI-1236) [UMBRELLA] Long running test suite

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1236: - Component/s: hudi-umbrellas > [UMBRELLA] Long running test suite > ---

[jira] [Updated] (HUDI-1389) [UMBRELLA] Survey indexing technique for better query performance

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1389: - Labels: gsoc gsoc2021 hudi-umbrellas mentor (was: gsoc gsoc2021 mentor) > [UMBRELLA] Survey index

[GitHub] [hudi] nsivabalan commented on issue #2285: [SUPPORT] Exception on snapshot query while compaction (hudi 0.6.0)

2021-02-09 Thread GitBox
nsivabalan commented on issue #2285: URL: https://github.com/apache/hudi/issues/2285#issuecomment-776051584 @vinothchandar @n3nash @bvaradar : One of the customers mentioned that disabling the vectorized reader fixed the issue for them. Hope it should be fine? And, do we need to make a note of

[GitHub] [hudi] andormarkus commented on issue #2498: [SUPPORT] Hudi MERGE_ON_READ load to dataframe fails for the versions [0.6.0],[0.7.0] and runs for [0.5.3]

2021-02-09 Thread GitBox
andormarkus commented on issue #2498: URL: https://github.com/apache/hudi/issues/2498#issuecomment-776047022 @vinothchandar We are using EMR 6.2.0, which gives you AWS Spark 3.0.1, and the latest Apache Spark release is 3.0.1. I don't see a reason for a version mismatch from this perspective.

[jira] [Updated] (HUDI-1297) [Umbrella] Revamp Spark Datasource support using Spark 3 APIs

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1297: - Component/s: hudi-umbrellas > [Umbrella] Revamp Spark Datasource support using Spark 3 APIs >

[jira] [Updated] (HUDI-60) [UMBRELLA] Support Apache Beam for incremental tailing

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-60?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-60: --- Labels: gsoc gsoc2021 hudi-umbrellas mentor (was: gsoc gsoc2021 mentor) > [UMBRELLA] Support Apache Bea

[jira] [Updated] (HUDI-1292) [Umbrella] RFC-15 : File Listing and Query Planning Optimizations

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1292: - Labels: hudi-umbrellas pull-request-available (was: pull-request-available) > [Umbrella] RFC-15 :

[jira] [Updated] (HUDI-57) [UMBRELLA] Support ORC Storage

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-57?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-57: --- Labels: hudi-umbrellas pull-request-available (was: pull-request-available) > [UMBRELLA] Support ORC St

[hudi] branch master updated: [HUDI-1603] fix DefaultHoodieRecordPayload serialization failure (#2556)

2021-02-09 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository. sivabalan pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 7a98b1c [HUDI-1603] fix DefaultHoodieRecordPay

[jira] [Updated] (HUDI-1237) [UMBRELLA] Checkstyle, formatting, warnings, spotless

2021-02-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1237: - Labels: gsoc gsoc2021 hudi-umbrellas mentor (was: gsoc gsoc2021 mentor) > [UMBRELLA] Checkstyle,

[GitHub] [hudi] nsivabalan merged pull request #2556: [HUDI-1603] fix DefaultHoodieRecordPayload serialization failure

2021-02-09 Thread GitBox
nsivabalan merged pull request #2556: URL: https://github.com/apache/hudi/pull/2556

[GitHub] [hudi] vinothchandar commented on issue #2535: [SUPPORT] _hoodie_is_deleted not working with custom transformer

2021-02-09 Thread GitBox
vinothchandar commented on issue #2535: URL: https://github.com/apache/hudi/issues/2535#issuecomment-776042684 may be some room for docs to be improved? http://hudi.apache.org/docs/writing_data.html#deletes could say `add a boolean column named _hoodie_is_deleted to DataSet

[GitHub] [hudi] nsivabalan commented on pull request #2556: [HUDI-1603] fix DefaultHoodieRecordPayload serialization failure

2021-02-09 Thread GitBox
nsivabalan commented on pull request #2556: URL: https://github.com/apache/hudi/pull/2556#issuecomment-776042400 :( yeah, feel very bad for this. shouldn't have let this happen.

[GitHub] [hudi] nsivabalan closed issue #2527: [SUPPORT] - DeltaStreamer with AWS Glue Schema Registry

2021-02-09 Thread GitBox
nsivabalan closed issue #2527: URL: https://github.com/apache/hudi/issues/2527

[GitHub] [hudi] nsivabalan commented on issue #2527: [SUPPORT] - DeltaStreamer with AWS Glue Schema Registry

2021-02-09 Thread GitBox
nsivabalan commented on issue #2527: URL: https://github.com/apache/hudi/issues/2527#issuecomment-776038736 Closing this out as we have an AWS ticket open. Please do reach out to us if you have more questions. Thanks for helping improve Hudi for the better :)

[GitHub] [hudi] vinothchandar commented on issue #2522: [SUPPORT] Avoid UPSERT unchanged records from source

2021-02-09 Thread GitBox
vinothchandar commented on issue #2522: URL: https://github.com/apache/hudi/issues/2522#issuecomment-776037981 or extend `DefaultHoodieRecordPayload`

[GitHub] [hudi] nsivabalan commented on issue #2121: [SUPPORT] How to define scehma for data in jsonArray format when using Deltastreamer

2021-02-09 Thread GitBox
nsivabalan commented on issue #2121: URL: https://github.com/apache/hudi/issues/2121#issuecomment-776036854 Thanks @liujinhui1994. Would you mind sharing the workaround you used on your end? It might help someone in the future. Also, once you update this ticket, feel free to close it ou

[GitHub] [hudi] vinothchandar commented on pull request #2548: [HUDI-1597] remove deprecated spring repos from pom

2021-02-09 Thread GitBox
vinothchandar commented on pull request #2548: URL: https://github.com/apache/hudi/pull/2548#issuecomment-776034728 > What I mean is that whether we need to freeze the more changes via releasing a minor branch. definitely the cleaner approach. but not sure if the time is worth invest

[GitHub] [hudi] vinothchandar commented on issue #2557: [SUPPORT]Container exited with a non-zero exit code 137

2021-02-09 Thread GitBox
vinothchandar commented on issue #2557: URL: https://github.com/apache/hudi/issues/2557#issuecomment-776032523 >When I first run the cow table (SaveMode.Overwrite), it's very fast.(about 700MB data in hdfs). but when I run an increment(SaveMode.Append), it's very slowly,and throw error

[GitHub] [hudi] nsivabalan commented on issue #2528: [SUPPORT] Spark read hudi data from hive (metastore)

2021-02-09 Thread GitBox
nsivabalan commented on issue #2528: URL: https://github.com/apache/hudi/issues/2528#issuecomment-776032385 @kingkongpoon : sorry, do you mean, after setting the config (spark.sql.hive.convertMetastoreParquet=false), issue is resolved? If yes, can we close this ticket then.

[GitHub] [hudi] vinothchandar commented on issue #2498: [SUPPORT] Hudi MERGE_ON_READ load to dataframe fails for the versions [0.6.0],[0.7.0] and runs for [0.5.3]

2021-02-09 Thread GitBox
vinothchandar commented on issue #2498: URL: https://github.com/apache/hudi/issues/2498#issuecomment-776028785 Folks, this is due to version mismatch between aws spark and apache spark. Hudi releases are built against apache spark and aws typically follows up with a EMR release. The prob

[GitHub] [hudi] vinothchandar edited a comment on issue #2533: [SUPPORT] Found in-flight commits after time :20210129225133, please rollback greater commits first

2021-02-09 Thread GitBox
vinothchandar edited a comment on issue #2533: URL: https://github.com/apache/hudi/issues/2533#issuecomment-776025099 >b) delete all files that have 20210129225129 in its name actually they need to be deleted as well. the hudi cli restore should have been able to do this for yo

[GitHub] [hudi] vinothchandar commented on issue #2533: [SUPPORT] Found in-flight commits after time :20210129225133, please rollback greater commits first

2021-02-09 Thread GitBox
vinothchandar commented on issue #2533: URL: https://github.com/apache/hudi/issues/2533#issuecomment-776025099 >b) delete all files that have 20210129225129 in its name actually they need to be cleaned as well. the hudi cli restore should have been able to do this for you actually,

[GitHub] [hudi] cdmikechen edited a comment on issue #2544: [SUPPORT]failed to read timestamp column in version 0.7.0 even when HIVE_SUPPORT_TIMESTAMP is enabled

2021-02-09 Thread GitBox
cdmikechen edited a comment on issue #2544: URL: https://github.com/apache/hudi/issues/2544#issuecomment-776007206 I think it is still a problem in Hive2. In Hive2, Hive can not identify logical timestamp type. Spark3 uses hive2 lib, if we use spark sql with `enableHiveSupport()` to read

[GitHub] [hudi] cdmikechen edited a comment on issue #2544: [SUPPORT]failed to read timestamp column in version 0.7.0 even when HIVE_SUPPORT_TIMESTAMP is enabled

2021-02-09 Thread GitBox
cdmikechen edited a comment on issue #2544: URL: https://github.com/apache/hudi/issues/2544#issuecomment-776007206 I think it is still a problem in Hive2. In Hive2, Hive can not identify logical timestamp type. Spark3 uses hive2 lib, if we use spark sql with `enableHiveSupport()` to rea

[GitHub] [hudi] cdmikechen commented on issue #2544: [SUPPORT]failed to read timestamp column in version 0.7.0 even when HIVE_SUPPORT_TIMESTAMP is enabled

2021-02-09 Thread GitBox
cdmikechen commented on issue #2544: URL: https://github.com/apache/hudi/issues/2544#issuecomment-776007206 I think it is still a problem in Hive2. In Hive2, Hive can not identify logical timestamp type. This is an automated

[GitHub] [hudi] codecov-io commented on pull request #2560: [HUDI-1606]fix HoodieJavaWriteClientExample

2021-02-09 Thread GitBox
codecov-io commented on pull request #2560: URL: https://github.com/apache/hudi/pull/2560#issuecomment-775997974 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2560?src=pr&el=h1) Report > Merging [#2560](https://codecov.io/gh/apache/hudi/pull/2560?src=pr&el=desc) (dc9fdb0) into [ma

[GitHub] [hudi] caidezhi commented on issue #2558: [SUPPORT] HoodieJavaWriteClientExample fail with exception

2021-02-09 Thread GitBox
caidezhi commented on issue #2558: URL: https://github.com/apache/hudi/issues/2558#issuecomment-775985581 PR for this issue : https://github.com/apache/hudi/pull/2560 JIRA : https://issues.apache.org/jira/browse/HUDI-1606

[jira] [Updated] (HUDI-1606) HoodieJavaWriteClientExample fail with exception

2021-02-09 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1606: - Labels: pull-request-available (was: ) > HoodieJavaWriteClientExample fail with exception > -

[GitHub] [hudi] caidezhi opened a new pull request #2560: [HUDI-1606]fix HoodieJavaWriteClientExample

2021-02-09 Thread GitBox
caidezhi opened a new pull request #2560: URL: https://github.com/apache/hudi/pull/2560 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of the pu

[GitHub] [hudi] nsivabalan edited a comment on issue #2255: [SUPPORT] Global Bloom and partition update not working correctly in MOR table

2021-02-09 Thread GitBox
nsivabalan edited a comment on issue #2255: URL: https://github.com/apache/hudi/issues/2255#issuecomment-775977895 @WTa-hash : I tried to reproduce this w/ quick start utils. I couldn't see the duplicates. But I could reproduce with the test script you gave me just w/ my local spark-shell.

[GitHub] [hudi] nsivabalan edited a comment on issue #2255: [SUPPORT] Global Bloom and partition update not working correctly in MOR table

2021-02-09 Thread GitBox
nsivabalan edited a comment on issue #2255: URL: https://github.com/apache/hudi/issues/2255#issuecomment-775977895 @WTa-hash : I tried to reproduce this w/ quick start utils. I couldn't see the duplicates. But I could reproduce with the test script you gave me just w/ my local spark-shell.

[GitHub] [hudi] nsivabalan commented on issue #2255: [SUPPORT] Global Bloom and partition update not working correctly in MOR table

2021-02-09 Thread GitBox
nsivabalan commented on issue #2255: URL: https://github.com/apache/hudi/issues/2255#issuecomment-775977895 @WTa-hash : I tried to reproduce this w/ quick start utils. I couldn't see the duplicates. But I could reproduce with the test script you gave me just w/ my local spark-shell. So, so

[jira] [Updated] (HUDI-1606) HoodieJavaWriteClientExample fail with exception

2021-02-09 Thread Dezhi Cai (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dezhi Cai updated HUDI-1606: Attachment: log.txt > HoodieJavaWriteClientExample fail with exception > ---

[jira] [Updated] (HUDI-1606) HoodieJavaWriteClientExample fail with exception

2021-02-09 Thread Dezhi Cai (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dezhi Cai updated HUDI-1606: Priority: Minor (was: Major) > HoodieJavaWriteClientExample fail with exception > -
