[GitHub] [hudi] xushiyan closed issue #3714: [SUPPORT] Spark Hudi dataframe contains duplicate records when reading from non unique read paths

2021-09-26 Thread GitBox
xushiyan closed issue #3714: URL: https://github.com/apache/hudi/issues/3714 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[GitHub] [hudi] hudi-bot edited a comment on pull request #3693: [HUDI-2456] support 'show partitions' sql

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3693: URL: https://github.com/apache/hudi/pull/3693#issuecomment-922600059 ## CI report: * 9ad6e667a40b60b1bbf129554dcc208827a18dd6 Azure:

[GitHub] [hudi] xushiyan commented on issue #3714: [SUPPORT] Spark Hudi dataframe contains duplicate records when reading from non unique read paths

2021-09-26 Thread GitBox
xushiyan commented on issue #3714: URL: https://github.com/apache/hudi/issues/3714#issuecomment-927529180 @jainpriyansh786 though dedup read paths can help reduce risk of re-reading, it is a design choice where a platform product like Hudi may not want to interfere with users input, i.e.,

[GitHub] [hudi] hudi-bot edited a comment on pull request #3668: [RFC-33] [HUDI-2429][WIP] Full schema evolution

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3668: URL: https://github.com/apache/hudi/pull/3668#issuecomment-919855741 ## CI report: * b5e0b228840b7c0575b6eaaf888e0f5bb6024b91 Azure:

[GitHub] [hudi] xushiyan edited a comment on issue #3617: [SUPPORT] Hive Sync to Glue throws Failed to read data schema

2021-09-26 Thread GitBox
xushiyan edited a comment on issue #3617: URL: https://github.com/apache/hudi/issues/3617#issuecomment-927522620 @novakov-alexey I checked the behavior is fixed in 0.9.0. Please give release-0.9.0 a try. You can find some guide here to override EMR hudi jars.

[GitHub] [hudi] xushiyan commented on issue #3617: [SUPPORT] Hive Sync to Glue throws Failed to read data schema

2021-09-26 Thread GitBox
xushiyan commented on issue #3617: URL: https://github.com/apache/hudi/issues/3617#issuecomment-927522620 @novakov-alexey I checked the behavior is fixed in 0.9.0. Please give release-0.9.0 a try. You can find some guide here to override EMR hudi jars.

[GitHub] [hudi] zhihuihong edited a comment on issue #3676: MOR table rolls out new parquet files at 10MB for new inserts - even though max file size set as 128MB

2021-09-26 Thread GitBox
zhihuihong edited a comment on issue #3676: URL: https://github.com/apache/hudi/issues/3676#issuecomment-927513816 have you tried using clustering after inserting data? my job created many 7mb files as well, and i used clustering to reorganize data layout. I don't know how to change

[GitHub] [hudi] zhihuihong commented on issue #3676: MOR table rolls out new parquet files at 10MB for new inserts - even though max file size set as 128MB

2021-09-26 Thread GitBox
zhihuihong commented on issue #3676: URL: https://github.com/apache/hudi/issues/3676#issuecomment-927513816 have you tried using clustering after inserting data? my job created many 7mb files as well, and i used clustering to reorganize data layout. I don't know how to change 7mb

[GitHub] [hudi] hudi-bot edited a comment on pull request #3693: [HUDI-2456] support 'show partitions' sql

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3693: URL: https://github.com/apache/hudi/pull/3693#issuecomment-922600059 ## CI report: * a2025a05abadaf0580115f0cb133ffd5cc8a08e2 Azure:

[GitHub] [hudi] hudi-bot edited a comment on pull request #3668: [RFC-33] [HUDI-2429][WIP] Full schema evolution

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3668: URL: https://github.com/apache/hudi/pull/3668#issuecomment-919855741 ## CI report: * b2d8b3f67e3556ccb84309b381ea513731537f8b Azure:

[GitHub] [hudi] hudi-bot edited a comment on pull request #3668: [RFC-33] [HUDI-2429][WIP] Full schema evolution

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3668: URL: https://github.com/apache/hudi/pull/3668#issuecomment-919855741 ## CI report: * b2d8b3f67e3556ccb84309b381ea513731537f8b Azure:

[GitHub] [hudi] Ambarish-Giri commented on issue #3605: [SUPPORT]Hudi Inserts and Upserts for MoR and CoW tables are taking very long time.

2021-09-26 Thread GitBox
Ambarish-Giri commented on issue #3605: URL: https://github.com/apache/hudi/issues/3605#issuecomment-927503142 Hi @nsivabalan let me know in case you need any further details?

[GitHub] [hudi] hudi-bot edited a comment on pull request #3693: [HUDI-2456] support 'show partitions' sql

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3693: URL: https://github.com/apache/hudi/pull/3693#issuecomment-922600059 ## CI report: * a2025a05abadaf0580115f0cb133ffd5cc8a08e2 Azure:

[GitHub] [hudi] hudi-bot edited a comment on pull request #3693: [HUDI-2456] support 'show partitions' sql

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3693: URL: https://github.com/apache/hudi/pull/3693#issuecomment-922600059 ## CI report: * cb86fea130eb467b42eef7da6c2382e2ee6ff037 Azure:

[GitHub] [hudi] hudi-bot edited a comment on pull request #3693: [HUDI-2456] support 'show partitions' sql

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3693: URL: https://github.com/apache/hudi/pull/3693#issuecomment-922600059 ## CI report: * cb86fea130eb467b42eef7da6c2382e2ee6ff037 Azure:

[GitHub] [hudi] hudi-bot edited a comment on pull request #3693: [HUDI-2456] support 'show partitions' sql

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3693: URL: https://github.com/apache/hudi/pull/3693#issuecomment-922600059 ## CI report: * cb86fea130eb467b42eef7da6c2382e2ee6ff037 Azure:

[GitHub] [hudi] hudi-bot edited a comment on pull request #3590: [HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3590: URL: https://github.com/apache/hudi/pull/3590#issuecomment-912237120 ## CI report: * aefac7ec2f2e40bdf3ad4365ea6aa825803a439d UNKNOWN * 92634fb730e723a9bdeb165348a4c747794be7e3 Azure:

[GitHub] [hudi] YannByron commented on a change in pull request #3693: [HUDI-2456] support 'show partitions' sql

2021-09-26 Thread GitBox
YannByron commented on a change in pull request #3693: URL: https://github.com/apache/hudi/pull/3693#discussion_r716328913 ## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/HoodieCommand.scala ## @@ -0,0 +1,47 @@ +/* + * Licensed

[GitHub] [hudi] YannByron commented on a change in pull request #3693: [HUDI-2456] support 'show partitions' sql

2021-09-26 Thread GitBox
YannByron commented on a change in pull request #3693: URL: https://github.com/apache/hudi/pull/3693#discussion_r716328861 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeyGenUtils.java ## @@ -40,7 +40,7 @@ protected static final String

[GitHub] [hudi] yanghua commented on a change in pull request #3674: [HUDI-2440] Add dependency change diff script for dependency governace

2021-09-26 Thread GitBox
yanghua commented on a change in pull request #3674: URL: https://github.com/apache/hudi/pull/3674#discussion_r716325492 ## File path: scripts/dependency.sh ## @@ -0,0 +1,123 @@ +#!/usr/bin/env bash +# +# Licensed to the Apache Software Foundation (ASF) under one or more +#

[GitHub] [hudi] yanghua commented on a change in pull request #3674: [HUDI-2440] Add dependency change diff script for dependency governace

2021-09-26 Thread GitBox
yanghua commented on a change in pull request #3674: URL: https://github.com/apache/hudi/pull/3674#discussion_r716324501 ## File path: scripts/dependency.sh ## @@ -0,0 +1,123 @@ +#!/usr/bin/env bash +# +# Licensed to the Apache Software Foundation (ASF) under one or more +#

[GitHub] [hudi] yanghua commented on a change in pull request #3674: [HUDI-2440] Add dependency change diff script for dependency governace

2021-09-26 Thread GitBox
yanghua commented on a change in pull request #3674: URL: https://github.com/apache/hudi/pull/3674#discussion_r716322867 ## File path: scripts/dependency.sh ## @@ -0,0 +1,123 @@ +#!/usr/bin/env bash +# +# Licensed to the Apache Software Foundation (ASF) under one or more +#

[GitHub] [hudi] yanghua commented on a change in pull request #3674: [HUDI-2440] Add dependency change diff script for dependency governace

2021-09-26 Thread GitBox
yanghua commented on a change in pull request #3674: URL: https://github.com/apache/hudi/pull/3674#discussion_r716321234 ## File path: scripts/dependency.sh ## @@ -0,0 +1,123 @@ +#!/usr/bin/env bash +# +# Licensed to the Apache Software Foundation (ASF) under one or more +#

[GitHub] [hudi] yanghua edited a comment on pull request #3674: [HUDI-2440] Add dependency change diff script for dependency governace

2021-09-26 Thread GitBox
yanghua edited a comment on pull request #3674: URL: https://github.com/apache/hudi/pull/3674#issuecomment-927459881 @xushiyan Thanks for sharing your thoughts. Let's discuss some points. > i see the point here is to allow PR reviewer easily identify dep changes. Yes, that's

[GitHub] [hudi] yanghua commented on pull request #3674: [HUDI-2440] Add dependency change diff script for dependency governace

2021-09-26 Thread GitBox
yanghua commented on pull request #3674: URL: https://github.com/apache/hudi/pull/3674#issuecomment-927459881 @xushiyan Thanks for sharing your thoughts. Let's discuss some points. > i see the point here is to allow PR reviewer easily identify dep changes. Yes, that's one of

[GitHub] [hudi] hudi-bot edited a comment on pull request #3590: [HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3590: URL: https://github.com/apache/hudi/pull/3590#issuecomment-912237120 ## CI report: * aefac7ec2f2e40bdf3ad4365ea6aa825803a439d UNKNOWN * 008f0cd91de12d44030d837f460f3b0232d62ee5 Azure:

[GitHub] [hudi] hudi-bot edited a comment on pull request #3723: Make Spark datasource inserts consistent when dedup

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3723: URL: https://github.com/apache/hudi/pull/3723#issuecomment-927446085 ## CI report: * e6e4220e3199fccb871ed0ba3711deabeeda1daa Azure:

[GitHub] [hudi] hudi-bot edited a comment on pull request #3723: Make Spark datasource inserts consistent when dedup

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3723: URL: https://github.com/apache/hudi/pull/3723#issuecomment-927446085 ## CI report: * e6e4220e3199fccb871ed0ba3711deabeeda1daa Azure:

[GitHub] [hudi] hudi-bot commented on pull request #3723: Make Spark datasource inserts consistent when dedup

2021-09-26 Thread GitBox
hudi-bot commented on pull request #3723: URL: https://github.com/apache/hudi/pull/3723#issuecomment-927446085 ## CI report: * e6e4220e3199fccb871ed0ba3711deabeeda1daa UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis`

[GitHub] [hudi] xushiyan opened a new pull request #3723: Make Spark datasource inserts consistent when dedup

2021-09-26 Thread GitBox
xushiyan opened a new pull request #3723: URL: https://github.com/apache/hudi/pull/3723 ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive of the change - [ ] CI is green - [ ] Necessary

[GitHub] [hudi] xushiyan commented on issue #3709: [SUPPORT] insert operation does not consistently insert duplicate records

2021-09-26 Thread GitBox
xushiyan commented on issue #3709: URL: https://github.com/apache/hudi/issues/3709#issuecomment-927439267 @helanto I can reproduce this and I agree with you that the dedup behaviors should be consistent across the same options. Also `PARQUET_SMALL_FILE_LIMIT` should just be a workaround

[GitHub] [hudi] hudi-bot edited a comment on pull request #3590: [HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3590: URL: https://github.com/apache/hudi/pull/3590#issuecomment-912237120 ## CI report: * aefac7ec2f2e40bdf3ad4365ea6aa825803a439d UNKNOWN * 008f0cd91de12d44030d837f460f3b0232d62ee5 Azure:

[jira] [Commented] (HUDI-1307) spark datasource load path format is confused for snapshot and increment read mode

2021-09-26 Thread liwei (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420424#comment-17420424 ] liwei commented on HUDI-1307: - [~xushiyan] hello , recently i am focus on ingest kafka data using hudi with

[jira] [Created] (HUDI-2492) Clean up components and GitHub labels

2021-09-26 Thread Raymond Xu (Jira)
Raymond Xu created HUDI-2492: Summary: Clean up components and GitHub labels Key: HUDI-2492 URL: https://issues.apache.org/jira/browse/HUDI-2492 Project: Apache Hudi Issue Type: Task

[GitHub] [hudi] hudi-bot edited a comment on pull request #3590: [HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3590: URL: https://github.com/apache/hudi/pull/3590#issuecomment-912237120 ## CI report: * aefac7ec2f2e40bdf3ad4365ea6aa825803a439d UNKNOWN * 008f0cd91de12d44030d837f460f3b0232d62ee5 Azure:

[jira] [Updated] (HUDI-2440) Add dependency change diff script for dependency governace

2021-09-26 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2440: - Component/s: Usability > Add dependency change diff script for dependency governace >

[GitHub] [hudi] xushiyan commented on a change in pull request #3674: [HUDI-2440] Add dependency change diff script for dependency governace

2021-09-26 Thread GitBox
xushiyan commented on a change in pull request #3674: URL: https://github.com/apache/hudi/pull/3674#discussion_r716275952 ## File path: scripts/dependency.sh ## @@ -0,0 +1,123 @@ +#!/usr/bin/env bash +# +# Licensed to the Apache Software Foundation (ASF) under one or more +#

[GitHub] [hudi] xushiyan commented on a change in pull request #3674: [HUDI-2440] Add dependency change diff script for dependency governace

2021-09-26 Thread GitBox
xushiyan commented on a change in pull request #3674: URL: https://github.com/apache/hudi/pull/3674#discussion_r716274428 ## File path: scripts/dependency.sh ## @@ -0,0 +1,123 @@ +#!/usr/bin/env bash +# +# Licensed to the Apache Software Foundation (ASF) under one or more +#

[GitHub] [hudi] hudi-bot edited a comment on pull request #3716: [DO_NOT_MERGE][WIP][HUDI-2285]Sync metadata debug

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3716: URL: https://github.com/apache/hudi/pull/3716#issuecomment-927129950 ## CI report: * e98b2fe91cf2485ab98381dc57fc51277c3fbb86 Azure:

[GitHub] [hudi] hudi-bot edited a comment on pull request #3698: [HUDI-2474] Refreshing timeline for every operation in Hudi

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3698: URL: https://github.com/apache/hudi/pull/3698#issuecomment-924412554 ## CI report: * dce615fd3058a817e839a852dc5de8b06d518658 Azure:

[GitHub] [hudi] hudi-bot edited a comment on pull request #3590: [HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3590: URL: https://github.com/apache/hudi/pull/3590#issuecomment-912237120 ## CI report: * aefac7ec2f2e40bdf3ad4365ea6aa825803a439d UNKNOWN * c83d176e1d1ddab6bb86f91726060779a9a1519f Azure:

[GitHub] [hudi] hudi-bot edited a comment on pull request #3590: [HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3590: URL: https://github.com/apache/hudi/pull/3590#issuecomment-912237120 ## CI report: * aefac7ec2f2e40bdf3ad4365ea6aa825803a439d UNKNOWN * c83d176e1d1ddab6bb86f91726060779a9a1519f Azure:

[GitHub] [hudi] hudi-bot edited a comment on pull request #3716: [DO_NOT_MERGE][WIP][HUDI-2285]Sync metadata debug

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3716: URL: https://github.com/apache/hudi/pull/3716#issuecomment-927129950 ## CI report: * 6b38fee2eba5474b898b568935189f7c5294d872 Azure:

[GitHub] [hudi] hudi-bot edited a comment on pull request #3716: [DO_NOT_MERGE][WIP][HUDI-2285]Sync metadata debug

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3716: URL: https://github.com/apache/hudi/pull/3716#issuecomment-927129950 ## CI report: * 6b38fee2eba5474b898b568935189f7c5294d872 Azure:

[GitHub] [hudi] hudi-bot edited a comment on pull request #3698: [HUDI-2474] Refreshing timeline for every operation in Hudi

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3698: URL: https://github.com/apache/hudi/pull/3698#issuecomment-924412554 ## CI report: * 80c18c266ec08a5021dcc3fba279f7fcd68b75b7 Azure:

[GitHub] [hudi] hudi-bot edited a comment on pull request #3698: [HUDI-2474] Refreshing timeline for every operation in Hudi

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3698: URL: https://github.com/apache/hudi/pull/3698#issuecomment-924412554 ## CI report: * 80c18c266ec08a5021dcc3fba279f7fcd68b75b7 Azure:

[GitHub] [hudi] nsivabalan commented on a change in pull request #3590: [HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426

2021-09-26 Thread GitBox
nsivabalan commented on a change in pull request #3590: URL: https://github.com/apache/hudi/pull/3590#discussion_r716259457 ## File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java ## @@ -167,173 +170,150 @@ private void initIfNeeded() {

[jira] [Updated] (HUDI-2472) Tests failure follow up when metadata is enabled by default

2021-09-26 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-2472: -- Description: We plan to enable metadata by default. but there are some tests that fail

[GitHub] [hudi] nsivabalan commented on a change in pull request #3590: [HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426

2021-09-26 Thread GitBox
nsivabalan commented on a change in pull request #3590: URL: https://github.com/apache/hudi/pull/3590#discussion_r716254203 ## File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/HoodieSparkTable.java ## @@ -66,4 +77,34 @@ protected

[GitHub] [hudi] hudi-bot edited a comment on pull request #3716: [DO_NOT_MERGE][WIP][HUDI-2285]Sync metadata debug

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3716: URL: https://github.com/apache/hudi/pull/3716#issuecomment-927129950 ## CI report: * 6b38fee2eba5474b898b568935189f7c5294d872 Azure:

[GitHub] [hudi] nsivabalan commented on a change in pull request #3590: [HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426

2021-09-26 Thread GitBox
nsivabalan commented on a change in pull request #3590: URL: https://github.com/apache/hudi/pull/3590#discussion_r716253305 ## File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/HoodieSparkTable.java ## @@ -66,4 +77,34 @@ protected

[GitHub] [hudi] hudi-bot edited a comment on pull request #3698: [HUDI-2474] Refreshing timeline for every operation in Hudi

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3698: URL: https://github.com/apache/hudi/pull/3698#issuecomment-924412554 ## CI report: * 80c18c266ec08a5021dcc3fba279f7fcd68b75b7 Azure:

[GitHub] [hudi] nsivabalan commented on a change in pull request #3590: [HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426

2021-09-26 Thread GitBox
nsivabalan commented on a change in pull request #3590: URL: https://github.com/apache/hudi/pull/3590#discussion_r716251519 ## File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/SparkRDDWriteClient.java ## @@ -96,6 +94,11 @@ public

[jira] [Updated] (HUDI-2475) Upgrade downgrade infra for enabling metadata

2021-09-26 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-2475: -- Description: Upgrade downgrade infra for enabling metadata.   If user is having a

[GitHub] [hudi] nsivabalan commented on a change in pull request #3590: [HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426

2021-09-26 Thread GitBox
nsivabalan commented on a change in pull request #3590: URL: https://github.com/apache/hudi/pull/3590#discussion_r716251124 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java ## @@ -401,64 +394,83 @@

[GitHub] [hudi] nsivabalan commented on a change in pull request #3590: [HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426

2021-09-26 Thread GitBox
nsivabalan commented on a change in pull request #3590: URL: https://github.com/apache/hudi/pull/3590#discussion_r716249586 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java ## @@ -401,64 +394,83 @@

[GitHub] [hudi] nsivabalan commented on a change in pull request #3590: [HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426

2021-09-26 Thread GitBox
nsivabalan commented on a change in pull request #3590: URL: https://github.com/apache/hudi/pull/3590#discussion_r716249444 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java ## @@ -401,64 +394,83 @@

[GitHub] [hudi] nsivabalan commented on a change in pull request #3590: [HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426

2021-09-26 Thread GitBox
nsivabalan commented on a change in pull request #3590: URL: https://github.com/apache/hudi/pull/3590#discussion_r716249167 ## File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/metadata/SparkHoodieBackedTableMetadataWriter.java ## @@ -99,83 +94,94 @@

[GitHub] [hudi] hudi-bot edited a comment on pull request #3698: [HUDI-2474] Refreshing timeline for every operation in Hudi

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3698: URL: https://github.com/apache/hudi/pull/3698#issuecomment-924412554 ## CI report: * 9d702ea79b9ca6b6fb9cb83a3ef99269617b204d Azure:

[GitHub] [hudi] hudi-bot edited a comment on pull request #3698: [HUDI-2474] Refreshing timeline for every operation in Hudi

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3698: URL: https://github.com/apache/hudi/pull/3698#issuecomment-924412554 ## CI report: * 9d702ea79b9ca6b6fb9cb83a3ef99269617b204d Azure:

[GitHub] [hudi] hudi-bot edited a comment on pull request #3716: [DO_NOT_MERGE][WIP][HUDI-2285]Sync metadata debug

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3716: URL: https://github.com/apache/hudi/pull/3716#issuecomment-927129950 ## CI report: * edf9f3ac812350cca2947c212e7fcfe2c4625bae Azure:

[jira] [Updated] (HUDI-864) parquet schema conflict: optional binary (UTF8) is not a group

2021-09-26 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-864: Priority: Blocker (was: Major) > parquet schema conflict: optional binary (UTF8) is not a group >

[jira] [Updated] (HUDI-864) parquet schema conflict: optional binary (UTF8) is not a group

2021-09-26 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-864: Labels: sev:critical user-support-issues (was: sev:high user-support-issues) > parquet schema conflict:

[jira] [Updated] (HUDI-864) parquet schema conflict: optional binary (UTF8) is not a group

2021-09-26 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-864: Component/s: Spark Integration > parquet schema conflict: optional binary (UTF8) is not a group >

[GitHub] [hudi] hudi-bot edited a comment on pull request #3716: [DO_NOT_MERGE][WIP][HUDI-2285]Sync metadata debug

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3716: URL: https://github.com/apache/hudi/pull/3716#issuecomment-927129950 ## CI report: * edf9f3ac812350cca2947c212e7fcfe2c4625bae Azure:

[jira] [Commented] (HUDI-1307) spark datasource load path format is confused for snapshot and increment read mode

2021-09-26 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420358#comment-17420358 ] Raymond Xu commented on HUDI-1307: -- [~309637554] Any update on this improvement? definitely useful to

[GitHub] [hudi] hudi-bot edited a comment on pull request #3716: [DO_NOT_MERGE][WIP][HUDI-2285]Sync metadata debug

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3716: URL: https://github.com/apache/hudi/pull/3716#issuecomment-927129950 ## CI report: * 2a5ef0b75a7b51033621be01d4c7bb0eb360fdb2 Azure:

[GitHub] [hudi] hudi-bot edited a comment on pull request #3716: [DO_NOT_MERGE][WIP][HUDI-2285]Sync metadata debug

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3716: URL: https://github.com/apache/hudi/pull/3716#issuecomment-927129950 ## CI report: * 2a5ef0b75a7b51033621be01d4c7bb0eb360fdb2 Azure:

[GitHub] [hudi] xushiyan merged pull request #3718: [MINOR] Add faq for overriding Hudi jar in EMR cluster

2021-09-26 Thread GitBox
xushiyan merged pull request #3718: URL: https://github.com/apache/hudi/pull/3718

[hudi] branch asf-site updated: [MINOR] Add faq for overriding Hudi jar in EMR cluster (#3718)

2021-09-26 Thread xushiyan
This is an automated email from the ASF dual-hosted git repository. xushiyan pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 37c9b7b [MINOR] Add faq for overriding

[GitHub] [hudi] hudi-bot edited a comment on pull request #3716: [DO_NOT_MERGE][WIP][HUDI-2285]Sync metadata debug

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3716: URL: https://github.com/apache/hudi/pull/3716#issuecomment-927129950 ## CI report: * 2a5ef0b75a7b51033621be01d4c7bb0eb360fdb2 Azure:

[GitHub] [hudi] xushiyan commented on a change in pull request #3671: [HUDI-2418] add HiveSchemaProvider

2021-09-26 Thread GitBox
xushiyan commented on a change in pull request #3671: URL: https://github.com/apache/hudi/pull/3671#discussion_r716239631 ## File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHiveSchemaProvider.java ## @@ -0,0 +1,125 @@ +/* + * Licensed to the

[GitHub] [hudi] xushiyan edited a comment on issue #3617: [SUPPORT] Hive Sync to Glue throws Failed to read data schema

2021-09-26 Thread GitBox
xushiyan edited a comment on issue #3617: URL: https://github.com/apache/hudi/issues/3617#issuecomment-927345014 @novakov-alexey what is the schema for this dataset? the snippet in the previous comment is the example.

[GitHub] [hudi] xushiyan commented on issue #3617: [SUPPORT] Hive Sync to Glue throws Failed to read data schema

2021-09-26 Thread GitBox
xushiyan commented on issue #3617: URL: https://github.com/apache/hudi/issues/3617#issuecomment-927345014 @novakov-alexey what is the schema for this dataset? the snippet in the previous command is the example.

[GitHub] [hudi] vinothchandar commented on a change in pull request #3590: [HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426

2021-09-26 Thread GitBox
vinothchandar commented on a change in pull request #3590: URL: https://github.com/apache/hudi/pull/3590#discussion_r716234905 ## File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java ## @@ -74,14 +77,11 @@ // Metadata table's

[GitHub] [hudi] vinothchandar commented on a change in pull request #3590: [HUDI-2285][HUDI-2476] Metadata table synchronous design. Rebased and Squashed from pull/3426

2021-09-26 Thread GitBox
vinothchandar commented on a change in pull request #3590: URL: https://github.com/apache/hudi/pull/3590#discussion_r716227697 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java ## @@ -88,6 +91,7 @@

[GitHub] [hudi] hudi-bot edited a comment on pull request #3648: [HUDI-2413] fix Sql source's checkpoint issue

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3648: URL: https://github.com/apache/hudi/pull/3648#issuecomment-917784069 ## CI report: * 5022f34c93ba19e6ed7c6829f0f98a1e5afeab49 Azure:

[GitHub] [hudi] hudi-bot edited a comment on pull request #3648: [HUDI-2413] fix Sql source's checkpoint issue

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3648: URL: https://github.com/apache/hudi/pull/3648#issuecomment-917784069 ## CI report: * 9b4450caee1aef2abc7b34b7220583de4e2addfb Azure:

[GitHub] [hudi] hudi-bot edited a comment on pull request #3648: [HUDI-2413] fix Sql source's checkpoint issue

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3648: URL: https://github.com/apache/hudi/pull/3648#issuecomment-917784069 ## CI report: * 9b4450caee1aef2abc7b34b7220583de4e2addfb Azure:

[GitHub] [hudi] fengjian428 commented on pull request #3648: [HUDI-2413] fix Sql source's checkpoint issue

2021-09-26 Thread GitBox
fengjian428 commented on pull request #3648: URL: https://github.com/apache/hudi/pull/3648#issuecomment-927331354 > I think the nicer way would be not have deltastreamer error out (or at-least control this behavior) if there is no checkpoint, but to provide an empty one? This PR adds

[GitHub] [hudi] hudi-bot edited a comment on pull request #3716: [DO_NOT_MERGE][WIP][HUDI-2285]Sync metadata debug

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3716: URL: https://github.com/apache/hudi/pull/3716#issuecomment-927129950 ## CI report: * 2a5ef0b75a7b51033621be01d4c7bb0eb360fdb2 Azure:

[GitHub] [hudi] hudi-bot edited a comment on pull request #3716: [DO_NOT_MERGE][WIP][HUDI-2285]Sync metadata debug

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3716: URL: https://github.com/apache/hudi/pull/3716#issuecomment-927129950 ## CI report: * cdde3ffa453c5afcdc16992acaeb72355d016755 Azure:

[GitHub] [hudi] hudi-bot edited a comment on pull request #3716: [DO_NOT_MERGE][WIP][HUDI-2285]Sync metadata debug

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3716: URL: https://github.com/apache/hudi/pull/3716#issuecomment-927129950 ## CI report: * cdde3ffa453c5afcdc16992acaeb72355d016755 Azure:

[GitHub] [hudi] novakov-alexey commented on issue #3617: [SUPPORT] Hive Sync to Glue throws Failed to read data schema

2021-09-26 Thread GitBox
novakov-alexey commented on issue #3617: URL: https://github.com/apache/hudi/issues/3617#issuecomment-927317516 @xushiyan can you advise how to set a JSON schema string to the commit file? -- This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [hudi] hudi-bot edited a comment on pull request #3722: HUDI-2491 hoodie.datasource.hive_sync.mode=hms mode is supported in s…

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3722: URL: https://github.com/apache/hudi/pull/3722#issuecomment-927304519 ## CI report: * edc24addc4d4a201491f70116f5a23d6117131e6 Azure:

[GitHub] [hudi] leesf commented on pull request #3719: [HUDI-2489]Tuning HoodieROTablePathFilter by caching hoodieTableFileSystemView, aiming to reduce unnecessary list/get requests

2021-09-26 Thread GitBox
leesf commented on pull request #3719: URL: https://github.com/apache/hudi/pull/3719#issuecomment-927311933 I am a little curious about why the requests in `Enable metadata table` are larger than in `Disable metadata table`, and also what the query response time difference between

[jira] [Resolved] (HUDI-2451) HoodieTableMetaClient The file separator from Window to HDFS is faulty

2021-09-26 Thread yao.zhou (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yao.zhou resolved HUDI-2451. Resolution: Fixed > HoodieTableMetaClient The file separator from Window to HDFS is faulty >

[hudi] branch master updated: [MINOR] Fix typo, 'Kakfa' corrected to 'Kafka' & 'parquest' corrected to 'parquet' (#3717)

2021-09-26 Thread leesf
This is an automated email from the ASF dual-hosted git repository. leesf pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 36be287 [MINOR] Fix typo,'Kakfa' corrected to

[GitHub] [hudi] leesf merged pull request #3717: [MINOR] Fix typo,'Kakfa' corrected to 'Kafka' & 'parquest' corrected to 'parquet'

2021-09-26 Thread GitBox
leesf merged pull request #3717: URL: https://github.com/apache/hudi/pull/3717 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[hudi] branch master updated (aa54655 -> 7e887b5)

2021-09-26 Thread leesf
This is an automated email from the ASF dual-hosted git repository. leesf pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from aa54655 [HUDI-2451] On windows client with hdfs server for wrong file separator (#3687) add 7e887b5 [MINOR] fix

[GitHub] [hudi] leesf merged pull request #3721: [MINOR] fix typo,'SPAKR' corrected to 'SPARK'

2021-09-26 Thread GitBox
leesf merged pull request #3721: URL: https://github.com/apache/hudi/pull/3721 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[hudi] branch master updated: [HUDI-2451] On windows client with hdfs server for wrong file separator (#3687)

2021-09-26 Thread leesf
This is an automated email from the ASF dual-hosted git repository. leesf pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new aa54655 [HUDI-2451] On windows client with hdfs

[GitHub] [hudi] leesf merged pull request #3687: [HUDI-2451] on windows client with hdfs server for wrong file seperator

2021-09-26 Thread GitBox
leesf merged pull request #3687: URL: https://github.com/apache/hudi/pull/3687 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[GitHub] [hudi] hudi-bot edited a comment on pull request #3722: HUDI-2491 hoodie.datasource.hive_sync.mode=hms mode is supported in s…

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3722: URL: https://github.com/apache/hudi/pull/3722#issuecomment-927304519 ## CI report: * edc24addc4d4a201491f70116f5a23d6117131e6 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #3722: HUDI-2491 hoodie.datasource.hive_sync.mode=hms mode is supported in s…

2021-09-26 Thread GitBox
hudi-bot commented on pull request #3722: URL: https://github.com/apache/hudi/pull/3722#issuecomment-927304519 ## CI report: * edc24addc4d4a201491f70116f5a23d6117131e6 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run travis`

[jira] [Updated] (HUDI-2491) hoodie.datasource.hive_sync.mode=hms mode is supported in spark writer option

2021-09-26 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-2491: - Labels: pull-request-available (was: ) > hoodie.datasource.hive_sync.mode=hms mode is supported

[GitHub] [hudi] fuyun2024 opened a new pull request #3722: HUDI-2491 hoodie.datasource.hive_sync.mode=hms mode is supported in s…

2021-09-26 Thread GitBox
fuyun2024 opened a new pull request #3722: URL: https://github.com/apache/hudi/pull/3722 …park writer option ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.*

[GitHub] [hudi] hudi-bot edited a comment on pull request #3721: [MINOR] fix typo,'SPAKR' corrected to 'SPARK'

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3721: URL: https://github.com/apache/hudi/pull/3721#issuecomment-927281327 ## CI report: * 0607353c39791ba79af14696a5d89d9dae49a15c Azure:

[GitHub] [hudi] hudi-bot edited a comment on pull request #3721: [MINOR] fix typo,'SPAKR' corrected to 'SPARK'

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3721: URL: https://github.com/apache/hudi/pull/3721#issuecomment-927281327 ## CI report: * 770893514e66ee0c6460679419cf4aa5a76efd88 Azure:

[GitHub] [hudi] hudi-bot edited a comment on pull request #3671: [HUDI-2418] add HiveSchemaProvider

2021-09-26 Thread GitBox
hudi-bot edited a comment on pull request #3671: URL: https://github.com/apache/hudi/pull/3671#issuecomment-920215882 ## CI report: * c305e93d3e7e8028f6fd2bc78be6e2c04e85e184 Azure:
