[GitHub] [hudi] codecov-io edited a comment on pull request #2483: [HUDI-1545] Add test cases for INSERT_OVERWRITE Operation

2021-01-24 Thread GitBox
codecov-io edited a comment on pull request #2483: URL: https://github.com/apache/hudi/pull/2483#issuecomment-766306624 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2483?src=pr&el=h1) Report > Merging [#2483](https://codecov.io/gh/apache/hudi/pull/2483?src=pr&el=desc) (b38) in

[GitHub] [hudi] vinothchandar commented on issue #2479: [SUPPORT] Dependency Issue When I Try Build Hudi From Source

2021-01-24 Thread GitBox
vinothchandar commented on issue #2479: URL: https://github.com/apache/hudi/issues/2479#issuecomment-766312107 Hi @rubenssoto #2481 , could you give this a shot? if it works, we can just land that PR This is an automated mes

[GitHub] [hudi] codecov-io edited a comment on pull request #2483: [HUDI-1545] Add test cases for INSERT_OVERWRITE Operation

2021-01-24 Thread GitBox
codecov-io edited a comment on pull request #2483: URL: https://github.com/apache/hudi/pull/2483#issuecomment-766306624 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2483?src=pr&el=h1) Report > Merging [#2483](https://codecov.io/gh/apache/hudi/pull/2483?src=pr&el=desc) (ba566c2) in

[GitHub] [hudi] shenh062326 commented on a change in pull request #2478: [HUDI-1476] Introduce unit test infra for java client

2021-01-24 Thread GitBox
shenh062326 commented on a change in pull request #2478: URL: https://github.com/apache/hudi/pull/2478#discussion_r563262820 ## File path: hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestUtils.java ## @@ -107,6 +111,28 @@ public static HoodieTableMetaClien

[GitHub] [hudi] shenh062326 commented on a change in pull request #2478: [HUDI-1476] Introduce unit test infra for java client

2021-01-24 Thread GitBox
shenh062326 commented on a change in pull request #2478: URL: https://github.com/apache/hudi/pull/2478#discussion_r563262860 ## File path: hudi-client/hudi-java-client/src/test/java/org/apache/hudi/testutils/HoodieJavaClientTestUtils.java ## @@ -0,0 +1,313 @@ +/* + * Licensed

[GitHub] [hudi] codecov-io edited a comment on pull request #2478: [HUDI-1476] Introduce unit test infra for java client

2021-01-24 Thread GitBox
codecov-io edited a comment on pull request #2478: URL: https://github.com/apache/hudi/pull/2478#issuecomment-765971598 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2478?src=pr&el=h1) Report > Merging [#2478](https://codecov.io/gh/apache/hudi/pull/2478?src=pr&el=desc) (0a53aa4) in

[GitHub] [hudi] codecov-io edited a comment on pull request #2483: [HUDI-1545] Add test cases for INSERT_OVERWRITE Operation

2021-01-24 Thread GitBox
codecov-io edited a comment on pull request #2483: URL: https://github.com/apache/hudi/pull/2483#issuecomment-766306624 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2483?src=pr&el=h1) Report > Merging [#2483](https://codecov.io/gh/apache/hudi/pull/2483?src=pr&el=desc) (ba566c2) in

[GitHub] [hudi] codecov-io edited a comment on pull request #2478: [HUDI-1476] Introduce unit test infra for java client

2021-01-24 Thread GitBox
codecov-io edited a comment on pull request #2478: URL: https://github.com/apache/hudi/pull/2478#issuecomment-765971598 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2478?src=pr&el=h1) Report > Merging [#2478](https://codecov.io/gh/apache/hudi/pull/2478?src=pr&el=desc) (0a53aa4) in

[GitHub] [hudi] codecov-io edited a comment on pull request #2483: [HUDI-1545] Add test cases for INSERT_OVERWRITE Operation

2021-01-24 Thread GitBox
codecov-io edited a comment on pull request #2483: URL: https://github.com/apache/hudi/pull/2483#issuecomment-766306624 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitH

[GitHub] [hudi] codecov-io edited a comment on pull request #2483: [HUDI-1545] Add test cases for INSERT_OVERWRITE Operation

2021-01-24 Thread GitBox
codecov-io edited a comment on pull request #2483: URL: https://github.com/apache/hudi/pull/2483#issuecomment-766306624 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2483?src=pr&el=h1) Report > Merging [#2483](https://codecov.io/gh/apache/hudi/pull/2483?src=pr&el=desc) (ba566c2) in

[GitHub] [hudi] codecov-io edited a comment on pull request #2478: [HUDI-1476] Introduce unit test infra for java client

2021-01-24 Thread GitBox
codecov-io edited a comment on pull request #2478: URL: https://github.com/apache/hudi/pull/2478#issuecomment-765971598 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitH

[GitHub] [hudi] codecov-io edited a comment on pull request #2478: [HUDI-1476] Introduce unit test infra for java client

2021-01-24 Thread GitBox
codecov-io edited a comment on pull request #2478: URL: https://github.com/apache/hudi/pull/2478#issuecomment-765971598 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2478?src=pr&el=h1) Report > Merging [#2478](https://codecov.io/gh/apache/hudi/pull/2478?src=pr&el=desc) (0a53aa4) in

[GitHub] [hudi] leesf merged pull request #2477: [MINOR] Use skipTests flag for skip.hudi-spark2.unit.tests property

2021-01-24 Thread GitBox
leesf merged pull request #2477: URL: https://github.com/apache/hudi/pull/2477

[hudi] branch master updated (e302c6b -> 84df263)

2021-01-24 Thread leesf
This is an automated email from the ASF dual-hosted git repository. leesf pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from e302c6b [HUDI-1453] Fix NPE using HoodieFlinkStreamer to etl data from kafka to hudi (#2474) add 84df263 [MINOR]

[GitHub] [hudi] rubenssoto commented on issue #2479: [SUPPORT] Dependency Issue When I Try Build Hudi From Source

2021-01-24 Thread GitBox
rubenssoto commented on issue #2479: URL: https://github.com/apache/hudi/issues/2479#issuecomment-766352035 Thank you so much @vinothchandar ! It worked, I will close this issue.

[GitHub] [hudi] rubenssoto closed issue #2479: [SUPPORT] Dependency Issue When I Try Build Hudi From Source

2021-01-24 Thread GitBox
rubenssoto closed issue #2479: URL: https://github.com/apache/hudi/issues/2479

[GitHub] [hudi] vinothchandar commented on issue #2479: [SUPPORT] Dependency Issue When I Try Build Hudi From Source

2021-01-24 Thread GitBox
vinothchandar commented on issue #2479: URL: https://github.com/apache/hudi/issues/2479#issuecomment-766369989 Great. No, thank you for catching this :). Eventually, as m2 caches are lost, I think the build would have failed, maybe a month or so from now :). Will merge the fix -

[hudi] branch master updated: Removing spring repos from pom (#2481)

2021-01-24 Thread vinoth
vinoth pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 81836f0 Removing spring repos from pom (#2481) 81

[GitHub] [hudi] vinothchandar merged pull request #2481: [MINOR] Removing spring repos from pom

2021-01-24 Thread GitBox
vinothchandar merged pull request #2481: URL: https://github.com/apache/hudi/pull/2481

[GitHub] [hudi] rubenssoto opened a new issue #2484: [SUPPORT] Hudi Write Performance

2021-01-24 Thread GitBox
rubenssoto opened a new issue #2484: URL: https://github.com/apache/hudi/issues/2484 Hello, I want to start using Hudi on my datalake, so I'm running some performance tests comparing current processing time with and without Hudi. We have a lot of tables in our datalake so we are pro

[GitHub] [hudi] xushiyan merged pull request #2478: [HUDI-1476] Introduce unit test infra for java client

2021-01-24 Thread GitBox
xushiyan merged pull request #2478: URL: https://github.com/apache/hudi/pull/2478

[hudi] branch master updated: [HUDI-1476] Introduce unit test infra for java client (#2478)

2021-01-24 Thread xushiyan
xushiyan pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new c4afd17 [HUDI-1476] Introduce unit test infra f

[jira] [Commented] (HUDI-1278) Need a generic payload class which can skip late arriving data based on specific fields

2021-01-24 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270965#comment-17270965 ] sivabalan narayanan commented on HUDI-1278: --- [~vbalaji]: Can you clarify the req

[jira] [Commented] (HUDI-849) Turn on incremental Syncing by default for DeltaStreamer and spark streaming cases

2021-01-24 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270966#comment-17270966 ] sivabalan narayanan commented on HUDI-849: -- [~vbalaji]: this is more of a todo thi

[jira] [Updated] (HUDI-1505) Allow pluggable option to write error records to side table, queue

2021-01-24 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1505: -- Labels: (was: user-support-issues) > Allow pluggable option to write error records to

[jira] [Commented] (HUDI-1290) Implement Debezium avro source for Delta Streamer

2021-01-24 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270969#comment-17270969 ] sivabalan narayanan commented on HUDI-1290: --- Does this qualify to be labeled as

[jira] [Updated] (HUDI-1070) Direct write from spark to Parquet when doing Upserts, Inserts and Deletes

2021-01-24 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1070: -- Labels: (was: user-support-issues) > Direct write from spark to Parquet when doing Ups

[jira] [Updated] (HUDI-1546) Fix hive sync tool path in website documentation

2021-01-24 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1546: -- Labels: user-support-issues (was: ) > Fix hive sync tool path in website documentation

[jira] [Created] (HUDI-1546) Fix hive sync tool path in website documentation

2021-01-24 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-1546: - Summary: Fix hive sync tool path in website documentation Key: HUDI-1546 URL: https://issues.apache.org/jira/browse/HUDI-1546 Project: Apache Hudi

[GitHub] [hudi] nsivabalan closed issue #2480: [SUPPORT] The Docker demo document description is incorrect

2021-01-24 Thread GitBox
nsivabalan closed issue #2480: URL: https://github.com/apache/hudi/issues/2480

[GitHub] [hudi] nsivabalan commented on issue #2480: [SUPPORT] The Docker demo document description is incorrect

2021-01-24 Thread GitBox
nsivabalan commented on issue #2480: URL: https://github.com/apache/hudi/issues/2480#issuecomment-766427153 Sure, will take it up. Closing it as we have a tracking jira. https://issues.apache.org/jira/browse/HUDI-1546 ---

[GitHub] [hudi] nsivabalan commented on issue #2467: [Travis issue] TestJsonStringToHoodieRecordMapFunction.testMapFunction failed

2021-01-24 Thread GitBox
nsivabalan commented on issue #2467: URL: https://github.com/apache/hudi/issues/2467#issuecomment-766427684 Have created a tracking jira https://issues.apache.org/jira/browse/HUDI-1547

[jira] [Created] (HUDI-1547) CI intermittent failure: TestJsonStringToHoodieRecordMapFunction.testMapFunction

2021-01-24 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-1547: - Summary: CI intermittent failure: TestJsonStringToHoodieRecordMapFunction.testMapFunction Key: HUDI-1547 URL: https://issues.apache.org/jira/browse/HUDI-1547

[jira] [Updated] (HUDI-1547) CI intermittent failure: TestJsonStringToHoodieRecordMapFunction.testMapFunction

2021-01-24 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1547: -- Labels: user-support-issues (was: ) > CI intermittent failure: > TestJsonStringToHoodi

[jira] [Updated] (HUDI-1528) hudi-sync-tools error

2021-01-24 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1528: -- Labels: pull-request-available user-support-issues (was: pull-request-available) > hud

[GitHub] [hudi] nsivabalan commented on issue #2429: [SUPPORT] S3 throws ConnectionPoolTimeoutException: Timeout waiting for connection from pool when metadata table is turned on

2021-01-24 Thread GitBox
nsivabalan commented on issue #2429: URL: https://github.com/apache/hudi/issues/2429#issuecomment-766428773 @vinothchandar : Closing this for now. Feel free to re-open if you see more issues.

[GitHub] [hudi] nsivabalan closed issue #2429: [SUPPORT] S3 throws ConnectionPoolTimeoutException: Timeout waiting for connection from pool when metadata table is turned on

2021-01-24 Thread GitBox
nsivabalan closed issue #2429: URL: https://github.com/apache/hudi/issues/2429

[GitHub] [hudi] nsivabalan commented on issue #2399: [SUPPORT] Hudi deletes not being properly commited

2021-01-24 Thread GitBox
nsivabalan commented on issue #2399: URL: https://github.com/apache/hudi/issues/2399#issuecomment-766431496 @afeldman1 : can you respond when you get a chance?

[GitHub] [hudi] nsivabalan commented on issue #2367: [SUPPORT] Seek error when querying MOR Tables in GCP

2021-01-24 Thread GitBox
nsivabalan commented on issue #2367: URL: https://github.com/apache/hudi/issues/2367#issuecomment-766431687 Sure. Sorry about the delay. Will get to this in a day or two.

[jira] [Updated] (HUDI-1539) Bug in HoodieCombineRealtimeRecordReader returns wrong results

2021-01-24 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1539: -- Labels: user-support-issues (was: ) > Bug in HoodieCombineRealtimeRecordReader returns

[jira] [Updated] (HUDI-1539) Bug in HoodieCombineRealtimeRecordReader returns wrong results

2021-01-24 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1539: -- Affects Version/s: 0.8.0 > Bug in HoodieCombineRealtimeRecordReader returns wrong result

[GitHub] [hudi] nsivabalan commented on issue #2331: Why does Hudi not support field deletions?

2021-01-24 Thread GitBox
nsivabalan commented on issue #2331: URL: https://github.com/apache/hudi/issues/2331#issuecomment-766432877 @prashantwason : In light of this ticket, do you think we can update our documentation wrt schema evolution? If you don't mind, can you take it up and fix our documentation? https://i

[jira] [Created] (HUDI-1548) Fix documentation around schema evolution

2021-01-24 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-1548: - Summary: Fix documentation around schema evolution Key: HUDI-1548 URL: https://issues.apache.org/jira/browse/HUDI-1548 Project: Apache Hudi Issue

[jira] [Updated] (HUDI-1548) Fix documentation around schema evolution

2021-01-24 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1548: -- Labels: user-support-issues (was: ) > Fix documentation around schema evolution >

[GitHub] [hudi] nsivabalan commented on issue #2330: Concurrent writes from multiple Spark drivers to S3 support

2021-01-24 Thread GitBox
nsivabalan commented on issue #2330: URL: https://github.com/apache/hudi/issues/2330#issuecomment-766433970 @vinothchandar @borislitvak : since we have a tracking jira, do you think we can close this? Or is there anything pending to be resolved or discussed?

[GitHub] [hudi] nsivabalan commented on issue #2329: [SUPPORT] Time Travel (querying the historical versions of data) ability for Hudi Table

2021-01-24 Thread GitBox
nsivabalan commented on issue #2329: URL: https://github.com/apache/hudi/issues/2329#issuecomment-766435383 https://issues.apache.org/jira/browse/HUDI-1460

[jira] [Updated] (HUDI-1460) Time Travel (querying the historical versions of data) ability for Hudi Table

2021-01-24 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1460: -- Labels: user-support-issues (was: ) > Time Travel (querying the historical versions of

[GitHub] [hudi] nsivabalan commented on issue #2323: [SUPPORT] GLOBAL_BLOOM index significantly slowing down processing time

2021-01-24 Thread GitBox
nsivabalan commented on issue #2323: URL: https://github.com/apache/hudi/issues/2323#issuecomment-766435871 @Kirkuz: Do you have any updates in this regard? Can you please respond or let us know if you have more questions.

[GitHub] [hudi] nsivabalan commented on issue #2285: [SUPPORT] Exception on snapshot query while compaction (hudi 0.6.0)

2021-01-24 Thread GitBox
nsivabalan commented on issue #2285: URL: https://github.com/apache/hudi/issues/2285#issuecomment-766436364 @zherenyu831 : can you please respond with any updates on your end. @n3nash : can you take a look when you have time. ---

[GitHub] [hudi] nsivabalan edited a comment on issue #2285: [SUPPORT] Exception on snapshot query while compaction (hudi 0.6.0)

2021-01-24 Thread GitBox
nsivabalan edited a comment on issue #2285: URL: https://github.com/apache/hudi/issues/2285#issuecomment-766436364 @zherenyu831 : can you please respond with any updates on your end. @n3nash : can you please take a look when you have time. If you were able to narrow down the issue, plea

[GitHub] [hudi] nsivabalan commented on issue #2284: [SUPPORT] : Is there a option to achieve SCD 2 in Hudi?

2021-01-24 Thread GitBox
nsivabalan commented on issue #2284: URL: https://github.com/apache/hudi/issues/2284#issuecomment-766436747 @sanket-khedikar : can you please let us know whether the suggested approaches work for you, or whether you still need more enhancements from Hudi? If it's solved, would appreciate if you can close t

[GitHub] [hudi] nsivabalan commented on issue #2204: [SUPPORT] Hive count(*) query on _rt table failing with exception

2021-01-24 Thread GitBox
nsivabalan commented on issue #2204: URL: https://github.com/apache/hudi/issues/2204#issuecomment-766437535 @BalaMahesh : Would you mind updating the ticket? We will close this out in a week's time if there is no activity. But feel free to re-open or create a new ticket if you have more qu

[GitHub] [hudi] nsivabalan commented on issue #2178: [SUPPORT] Hudi writing 10MB worth of org.apache.hudi.bloomfilter data in each of the parquet files produced

2021-01-24 Thread GitBox
nsivabalan commented on issue #2178: URL: https://github.com/apache/hudi/issues/2178#issuecomment-766438221 @KarthickAN : hope you got a chance to go through our [blog on indexes in Hudi](https://hudi.apache.org/blog/hudi-indexing-mechanisms/). Wrt this gh issue, please do let us know if y

[jira] [Updated] (HUDI-1549) Programmatic way to fetch earliest commit retained

2021-01-24 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1549: -- Labels: user-sup (was: ) > Programmatic way to fetch earliest commit retained > --

[jira] [Updated] (HUDI-1549) Programmatic way to fetch earliest commit retained

2021-01-24 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1549: -- Fix Version/s: 0.8.0 > Programmatic way to fetch earliest commit retained > ---

[jira] [Created] (HUDI-1549) Programmatic way to fetch earliest commit retained

2021-01-24 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-1549: - Summary: Programmatic way to fetch earliest commit retained Key: HUDI-1549 URL: https://issues.apache.org/jira/browse/HUDI-1549 Project: Apache Hudi

[GitHub] [hudi] nsivabalan commented on issue #2135: [SUPPORT] GDPR safe deletes is complex

2021-01-24 Thread GitBox
nsivabalan commented on issue #2135: URL: https://github.com/apache/hudi/issues/2135#issuecomment-766439085 @andaag : I have created a Hudi ticket for this. Feel free to update the desc of the ticket with more details https://issues.apache.org/jira/browse/HUDI-1549

[GitHub] [hudi] nsivabalan commented on issue #2123: Timestamp not parsed correctly on Athena

2021-01-24 Thread GitBox
nsivabalan commented on issue #2123: URL: https://github.com/apache/hudi/issues/2123#issuecomment-766439219 @satishkotha : when you get a chance, can you please follow up on this?

[GitHub] [hudi] nsivabalan commented on issue #2121: [SUPPORT] How to define scehma for data in jsonArray format when using Deltastreamer

2021-01-24 Thread GitBox
nsivabalan commented on issue #2121: URL: https://github.com/apache/hudi/issues/2121#issuecomment-766439932 @liujinhui1994 : We already have an [example in our HoodieTestDatagenerator](https://github.com/apache/hudi/blob/c4afd179c1983a382b8a5197d800b0f5dba254de/hudi-common/src/test/java/org

[GitHub] [hudi] nsivabalan edited a comment on issue #2121: [SUPPORT] How to define scehma for data in jsonArray format when using Deltastreamer

2021-01-24 Thread GitBox
nsivabalan edited a comment on issue #2121: URL: https://github.com/apache/hudi/issues/2121#issuecomment-766439932 @liujinhui1994 : Sorry about the delay. We already have an [example in our HoodieTestDatagenerator](https://github.com/apache/hudi/blob/c4afd179c1983a382b8a5197d800b0f5dba254de

[GitHub] [hudi] nsivabalan commented on issue #2100: [SUPPORT] 0.6.0 - using keytab authentication gives issues

2021-01-24 Thread GitBox
nsivabalan commented on issue #2100: URL: https://github.com/apache/hudi/issues/2100#issuecomment-766440534 @n3nash @bhasudha : Sorry, the thread is a bit long. I see some workarounds have been proposed and they worked. But do we need any fixes in Hudi in general? If yes, can you file a jira and

[GitHub] [hudi] nsivabalan edited a comment on issue #2100: [SUPPORT] 0.6.0 - using keytab authentication gives issues

2021-01-24 Thread GitBox
nsivabalan edited a comment on issue #2100: URL: https://github.com/apache/hudi/issues/2100#issuecomment-766440534 @n3nash @bhasudha : Sorry, the thread is a bit long, so I couldn't gauge correctly. I see some workarounds have been proposed and they worked. But do we need any fixes in Hudi in gene

[jira] [Updated] (HUDI-1549) Programmatic way to fetch earliest commit retained

2021-01-24 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1549: - Labels: user-sup user-support-issues (was: user-sup) > Programmatic way to fetch earliest commit

[jira] [Commented] (HUDI-1549) Programmatic way to fetch earliest commit retained

2021-01-24 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271000#comment-17271000 ] Vinoth Chandar commented on HUDI-1549: -- [~shivnarayan] the label `user-sup` is wrong?

[GitHub] [hudi] vinothchandar closed issue #2330: Concurrent writes from multiple Spark drivers to S3 support

2021-01-24 Thread GitBox
vinothchandar closed issue #2330: URL: https://github.com/apache/hudi/issues/2330

[GitHub] [hudi] vinothchandar commented on issue #2330: Concurrent writes from multiple Spark drivers to S3 support

2021-01-24 Thread GitBox
vinothchandar commented on issue #2330: URL: https://github.com/apache/hudi/issues/2330#issuecomment-766441408 we can close this out

[GitHub] [hudi] nsivabalan commented on issue #2066: [SUPPORT] Hudi is increasing the storage size big time

2021-01-24 Thread GitBox
nsivabalan commented on issue #2066: URL: https://github.com/apache/hudi/issues/2066#issuecomment-766449665 @KarthickAN : did you get a chance to try out the suggestion from Balaji? Please do update the issue w/ any updates. If the issue is resolved, feel free to close it out. -

[GitHub] [hudi] nsivabalan commented on issue #2063: [SUPPORT] change column type from int to long, schema compatibility check failed

2021-01-24 Thread GitBox
nsivabalan commented on issue #2063: URL: https://github.com/apache/hudi/issues/2063#issuecomment-766449860 @cadl : did you get a chance to try out the setting? We plan to close out this issue due to inactivity in a week's time. But feel free to reopen or create a new ticket if you find any

[GitHub] [hudi] vinothchandar commented on issue #2285: [SUPPORT] Exception on snapshot query while compaction (hudi 0.6.0)

2021-01-24 Thread GitBox
vinothchandar commented on issue #2285: URL: https://github.com/apache/hudi/issues/2285#issuecomment-766450275 cc @garyli1019 as well

[GitHub] [hudi] zherenyu831 commented on issue #2285: [SUPPORT] Exception on snapshot query while compaction (hudi 0.6.0)

2021-01-24 Thread GitBox
zherenyu831 commented on issue #2285: URL: https://github.com/apache/hudi/issues/2285#issuecomment-766482729 @bvaradar Hi Bavaradar, it will be a little difficult to replicate the problem, since it only happens on a huge amount of data.

[GitHub] [hudi] codecov-io edited a comment on pull request #2382: [HUDI-1477] Support CopyOnWriteTable in java client

2021-01-24 Thread GitBox
codecov-io edited a comment on pull request #2382: URL: https://github.com/apache/hudi/pull/2382#issuecomment-751367927 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2382?src=pr&el=h1) Report > Merging [#2382](https://codecov.io/gh/apache/hudi/pull/2382?src=pr&el=desc) (498109c) in

[GitHub] [hudi] rubenssoto commented on issue #1829: [SUPPORT] S3 slow file listing causes Hudi read performance.

2021-01-24 Thread GitBox
rubenssoto commented on issue #1829: URL: https://github.com/apache/hudi/issues/1829#issuecomment-766496187 @umehrot2 @bvaradar Do you know if this problem will be solved in 0.7.0? I'm querying some big datasets with more than 500 partitions and I had the same problem. Thank

[GitHub] [hudi] rubenssoto edited a comment on issue #1829: [SUPPORT] S3 slow file listing causes Hudi read performance.

2021-01-24 Thread GitBox
rubenssoto edited a comment on issue #1829: URL: https://github.com/apache/hudi/issues/1829#issuecomment-766496187 @umehrot2 @bvaradar Do you know if this problem will be solved in 0.7.0? I'm querying some big datasets with more than 500 partitions and I had the same problem.

[GitHub] [hudi] pengzhiwei2018 opened a new pull request #2485: [HUDI-1109] Support Spark Structured Streaming read from Hudi table

2021-01-24 Thread GitBox
pengzhiwei2018 opened a new pull request #2485: URL: https://github.com/apache/hudi/pull/2485 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of

[jira] [Updated] (HUDI-1109) Support Spark Structured Streaming read from Hudi table

2021-01-24 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1109: - Labels: pull-request-available (was: ) > Support Spark Structured Streaming read from Hudi table

[GitHub] [hudi] codecov-io commented on pull request #2485: [HUDI-1109] Support Spark Structured Streaming read from Hudi table

2021-01-24 Thread GitBox
codecov-io commented on pull request #2485: URL: https://github.com/apache/hudi/pull/2485#issuecomment-766519181 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2485?src=pr&el=h1) Report > Merging [#2485](https://codecov.io/gh/apache/hudi/pull/2485?src=pr&el=desc) (91cf083) into [ma

[GitHub] [hudi] codecov-io edited a comment on pull request #2485: [HUDI-1109] Support Spark Structured Streaming read from Hudi table

2021-01-24 Thread GitBox
codecov-io edited a comment on pull request #2485: URL: https://github.com/apache/hudi/pull/2485#issuecomment-766519181 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2485?src=pr&el=h1) Report > Merging [#2485](https://codecov.io/gh/apache/hudi/pull/2485?src=pr&el=desc) (5d3ec8d) in

[GitHub] [hudi] codecov-io edited a comment on pull request #2485: [HUDI-1109] Support Spark Structured Streaming read from Hudi table

2021-01-24 Thread GitBox
codecov-io edited a comment on pull request #2485: URL: https://github.com/apache/hudi/pull/2485#issuecomment-766519181 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2485?src=pr&el=h1) Report > Merging [#2485](https://codecov.io/gh/apache/hudi/pull/2485?src=pr&el=desc) (fa0056e) in

[GitHub] [hudi] codecov-io edited a comment on pull request #2485: [HUDI-1109] Support Spark Structured Streaming read from Hudi table

2021-01-24 Thread GitBox
codecov-io edited a comment on pull request #2485: URL: https://github.com/apache/hudi/pull/2485#issuecomment-766519181 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2485?src=pr&el=h1) Report > Merging [#2485](https://codecov.io/gh/apache/hudi/pull/2485?src=pr&el=desc) (91cf083) in

[GitHub] [hudi] git-raj commented on issue #2284: [SUPPORT] : Is there a option to achieve SCD 2 in Hudi?

2021-01-24 Thread GitBox
git-raj commented on issue #2284: URL: https://github.com/apache/hudi/issues/2284#issuecomment-766523668 Using AWS Glue PySpark and Hudi, with S3 as the data store: I'm trying to do the traditional SCD Type 2, where the old record gets updated with the insert datetime in the 'effective to' field, 'isActi

[GitHub] [hudi] pengzhiwei2018 commented on pull request #1880: [WIP] [HUDI-1125] build framework to support structured streaming

2021-01-24 Thread GitBox
pengzhiwei2018 commented on pull request #1880: URL: https://github.com/apache/hudi/pull/1880#issuecomment-766562247 > Hello, > > Hudi will have nice features like clustering and clustering probably will rewrite a lot of data, so is it possible this rewrites without new data doesn't

[GitHub] [hudi] vinothchandar commented on issue #1829: [SUPPORT] S3 slow file listing causes Hudi read performance.

2021-01-24 Thread GitBox
vinothchandar commented on issue #1829: URL: https://github.com/apache/hudi/issues/1829#issuecomment-766590769 @rubenssoto For some code paths, it will be. If you turn on `hoodie.metadata.enable=true` on the writing side, you should see improvements. Hive queries should see improvement, SparkSQ
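
A minimal sketch of how the flag suggested above might be passed on the write path, assuming a PySpark job that writes through the Hudi datasource. Only `hoodie.metadata.enable` comes from the comment itself; the helper `hudi_write_options`, the table name, and the column names are hypothetical and used purely for illustration.

```python
def hudi_write_options(table_name, record_key, precombine_field, enable_metadata=True):
    """Build a dict of Hudi write options (hypothetical helper, for illustration only)."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        # The flag mentioned in the comment above; Spark options are passed as strings.
        "hoodie.metadata.enable": "true" if enable_metadata else "false",
    }


# Hypothetical table and column names.
opts = hudi_write_options("my_table", "id", "updated_at")

# With a Spark DataFrame `df` and the Hudi bundle on the classpath, the options
# would typically be applied like this (not executed here; the path is made up):
#   df.write.format("hudi").options(**opts).mode("append").save("s3://bucket/path")
```

Keeping the options in a plain dict makes it easy to compare runs with and without the metadata table by flipping a single argument.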

[GitHub] [hudi] vinothchandar commented on pull request #2485: [HUDI-1109] Support Spark Structured Streaming read from Hudi table

2021-01-24 Thread GitBox
vinothchandar commented on pull request #2485: URL: https://github.com/apache/hudi/pull/2485#issuecomment-766593559 cc @garyli1019 mind taking a first pass at this PR? :)