[GitHub] [hudi] codecov-io edited a comment on pull request #2260: [HUDI-1381] Schedule compaction based on time elapsed

2021-01-09 Thread GitBox
codecov-io edited a comment on pull request #2260: URL: https://github.com/apache/hudi/pull/2260#issuecomment-729530724 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitH

[GitHub] [hudi] codecov-io edited a comment on pull request #2260: [HUDI-1381] Schedule compaction based on time elapsed

2021-01-09 Thread GitBox
codecov-io edited a comment on pull request #2260: URL: https://github.com/apache/hudi/pull/2260#issuecomment-729530724 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2260?src=pr&el=h1) Report > Merging [#2260](https://codecov.io/gh/apache/hudi/pull/2260?src=pr&el=desc) (d087f09) in

[GitHub] [hudi] n3nash commented on pull request #2417: [HUDI-1479] Use HoodieEngineContext to parallelize fetching of partiton paths

2021-01-09 Thread GitBox
n3nash commented on pull request #2417: URL: https://github.com/apache/hudi/pull/2417#issuecomment-757432014 @vinothchandar @umehrot2 Added a test class, can you ptal ?

[GitHub] [hudi] n3nash commented on a change in pull request #2417: [HUDI-1479] Use HoodieEngineContext to parallelize fetching of partiton paths

2021-01-09 Thread GitBox
n3nash commented on a change in pull request #2417: URL: https://github.com/apache/hudi/pull/2417#discussion_r554525498 ## File path: hudi-common/src/test/java/org/apache/hudi/metadata/TestFileSystemBackedTableMetadata.java ## @@ -0,0 +1,181 @@ +/* + * Licensed to the Apache S

[GitHub] [hudi] codecov-io edited a comment on pull request #2260: [HUDI-1381] Schedule compaction based on time elapsed

2021-01-09 Thread GitBox
codecov-io edited a comment on pull request #2260: URL: https://github.com/apache/hudi/pull/2260#issuecomment-729530724 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2260?src=pr&el=h1) Report > Merging [#2260](https://codecov.io/gh/apache/hudi/pull/2260?src=pr&el=desc) (d087f09) in

[GitHub] [hudi] codecov-io edited a comment on pull request #2421: [HUDI-1502] MOR rollback and restore support for metadata sync

2021-01-09 Thread GitBox
codecov-io edited a comment on pull request #2421: URL: https://github.com/apache/hudi/pull/2421#issuecomment-757112911 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2421?src=pr&el=h1) Report > Merging [#2421](https://codecov.io/gh/apache/hudi/pull/2421?src=pr&el=desc) (c14c6b6) in

[GitHub] [hudi] codecov-io edited a comment on pull request #2421: [HUDI-1502] MOR rollback and restore support for metadata sync

2021-01-09 Thread GitBox
codecov-io edited a comment on pull request #2421: URL: https://github.com/apache/hudi/pull/2421#issuecomment-757112911 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2421?src=pr&el=h1) Report > Merging [#2421](https://codecov.io/gh/apache/hudi/pull/2421?src=pr&el=desc) (c14c6b6) in

[GitHub] [hudi] codecov-io edited a comment on pull request #2421: [HUDI-1502] MOR rollback and restore support for metadata sync

2021-01-09 Thread GitBox
codecov-io edited a comment on pull request #2421: URL: https://github.com/apache/hudi/pull/2421#issuecomment-757112911 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2421?src=pr&el=h1) Report > Merging [#2421](https://codecov.io/gh/apache/hudi/pull/2421?src=pr&el=desc) (c14c6b6) in

[GitHub] [hudi] vinothchandar commented on pull request #2196: [HUDI-1349]spark sql support overwrite use replace action

2021-01-09 Thread GitBox
vinothchandar commented on pull request #2196: URL: https://github.com/apache/hudi/pull/2196#issuecomment-757419558 @lw309637554 @n3nash @satishkotha I have been trying to test master branch and the following change to make `Overwrite` be a INSERT_OVERWRITE_TABLE, adds complexity IM

[hudi] branch master updated: [MINOR] fix spark 3 build for incremental query on MOR (#2425)

2021-01-09 Thread vinoth
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 23e93d0 [MINOR] fix spark 3 build for incremental

[GitHub] [hudi] vinothchandar merged pull request #2425: [MINOR] fix spark 3 build for incremental query on MOR

2021-01-09 Thread GitBox
vinothchandar merged pull request #2425: URL: https://github.com/apache/hudi/pull/2425

[GitHub] [hudi] Rap70r commented on issue #2416: When will version 0.6.1 be released?

2021-01-09 Thread GitBox
Rap70r commented on issue #2416: URL: https://github.com/apache/hudi/issues/2416#issuecomment-756818455 Hello vinothchandar, Thank you for getting back to me. Yes, I would like to use Hudi with EMR 6.2.0 and Spark 3.0. Thank you

[GitHub] [hudi] vinothchandar commented on a change in pull request #2422: [WIP] [HUDI-1276] [HUDI-1459] Make Clustering/ReplaceCommit and Metadata table be compatible

2021-01-09 Thread GitBox
vinothchandar commented on a change in pull request #2422: URL: https://github.com/apache/hudi/pull/2422#discussion_r554455329 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java ## @@ -158,8 +159,7 @@ public CleanPla

[GitHub] [hudi] codecov-io edited a comment on pull request #2415: [MINOR] Sync HUDI-1196 to FlinkWriteHelper

2021-01-09 Thread GitBox
codecov-io edited a comment on pull request #2415: URL: https://github.com/apache/hudi/pull/2415#issuecomment-756825549

[GitHub] [hudi] codecov-io edited a comment on pull request #2417: [HUDI-1479] Use HoodieEngineContext to parallelize fetching of partiton paths

2021-01-09 Thread GitBox
codecov-io edited a comment on pull request #2417: URL: https://github.com/apache/hudi/pull/2417#issuecomment-756532045

[GitHub] [hudi] wangxianghu commented on pull request #2419: Hudi 1421

2021-01-09 Thread GitBox
wangxianghu commented on pull request #2419: URL: https://github.com/apache/hudi/pull/2419#issuecomment-757100778 Hi @loukey-lj thanks for your contribution please check why ci failed

[GitHub] [hudi] vinothchandar commented on a change in pull request #2417: [HUDI-1479] Use HoodieEngineContext to parallelize fetching of partiton paths

2021-01-09 Thread GitBox
vinothchandar commented on a change in pull request #2417: URL: https://github.com/apache/hudi/pull/2417#discussion_r554259024 ## File path: hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java ## @@ -49,12 +60,48 @@ public FileSystemBackedTab

[GitHub] [hudi] vinothchandar commented on pull request #2379: [HUDI-1399] support a independent clustering spark job to asynchronously clustering

2021-01-09 Thread GitBox
vinothchandar commented on pull request #2379: URL: https://github.com/apache/hudi/pull/2379#issuecomment-757090817 @n3nash can you shepherd this and land ? @satishkotha is helping out with the other PR.

[GitHub] [hudi] vinothchandar commented on pull request #2422: [WIP] [HUDI-1276] [HUDI-1459] Make Clustering/ReplaceCommit and Metadata table be compatible

2021-01-09 Thread GitBox
vinothchandar commented on pull request #2422: URL: https://github.com/apache/hudi/pull/2422#issuecomment-757038688

[GitHub] [hudi] lw309637554 commented on pull request #2379: [HUDI-1399] support a independent clustering spark job to asynchronously clustering

2021-01-09 Thread GitBox
lw309637554 commented on pull request #2379: URL: https://github.com/apache/hudi/pull/2379#issuecomment-757328474 > @lw309637554 Left a couple of comments, can land after that. @n3nash Have resolve and reply the comments, Thanks.

[GitHub] [hudi] lw309637554 edited a comment on pull request #2379: [HUDI-1399] support a independent clustering spark job to asynchronously clustering

2021-01-09 Thread GitBox
lw309637554 edited a comment on pull request #2379: URL: https://github.com/apache/hudi/pull/2379#issuecomment-757328474 > @lw309637554 Left a couple of comments, can land after that. @n3nash Have resolved and reply the comments, Thanks.

[GitHub] [hudi] n3nash commented on a change in pull request #2400: Some fixes and enhancements to test suite framework

2021-01-09 Thread GitBox
n3nash commented on a change in pull request #2400: URL: https://github.com/apache/hudi/pull/2400#discussion_r554496508 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java ## @@ -307,7 +307,10 @@ public void refreshTimeline() throw

[GitHub] [hudi] n3nash merged pull request #2379: [HUDI-1399] support a independent clustering spark job to asynchronously clustering

2021-01-09 Thread GitBox
n3nash merged pull request #2379: URL: https://github.com/apache/hudi/pull/2379

[GitHub] [hudi] satishkotha commented on a change in pull request #2422: [WIP] [HUDI-1276] [HUDI-1459] Make Clustering/ReplaceCommit and Metadata table be compatible

2021-01-09 Thread GitBox
satishkotha commented on a change in pull request #2422: URL: https://github.com/apache/hudi/pull/2422#discussion_r554244115 ## File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java ## @@ -92,13 +95,36 @@ case HoodieTimeline.SAVEPOIN

[GitHub] [hudi] yanghua merged pull request #2415: [MINOR] Sync HUDI-1196 to FlinkWriteHelper

2021-01-09 Thread GitBox
yanghua merged pull request #2415: URL: https://github.com/apache/hudi/pull/2415

[GitHub] [hudi] lw309637554 commented on pull request #2418: [HUDI-1266] Add unit test for validating replacecommit rollback

2021-01-09 Thread GitBox
lw309637554 commented on pull request #2418: URL: https://github.com/apache/hudi/pull/2418#issuecomment-756830958 @satishkotha just two minor comment

[GitHub] [hudi] lw309637554 commented on a change in pull request #2418: [HUDI-1266] Add unit test for validating replacecommit rollback

2021-01-09 Thread GitBox
lw309637554 commented on a change in pull request #2418: URL: https://github.com/apache/hudi/pull/2418#discussion_r554024960 ## File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/action/rollback/HoodieClientRollbackTestBase.java ## @@ -96,4 +99,61 @@

[GitHub] [hudi] bvaradar commented on issue #2338: [SUPPORT] MOR table found duplicate and process so slowly

2021-01-09 Thread GitBox
bvaradar commented on issue #2338: URL: https://github.com/apache/hudi/issues/2338#issuecomment-757125096 @so-lazy : when you query through spark datasource (not just single file), are you able to see unique record ? val df = spark.read.format("hudi").load("hdfs://hadoop01:9

[GitHub] [hudi] codecov-io commented on pull request #2415: [MINOR] Sync HUDI-1196 to FlinkWriteHelper

2021-01-09 Thread GitBox
codecov-io commented on pull request #2415: URL: https://github.com/apache/hudi/pull/2415#issuecomment-756825549 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2415?src=pr&el=h1) Report > Merging [#2415](https://codecov.io/gh/apache/hudi/pull/2415?src=pr&el=desc) (1dd00dd) into [ma

[GitHub] [hudi] codecov-io edited a comment on pull request #2421: [Hudi-1502] MOR rollback and restore support for metadata sync

2021-01-09 Thread GitBox
codecov-io edited a comment on pull request #2421: URL: https://github.com/apache/hudi/pull/2421#issuecomment-757112911

[GitHub] [hudi] umehrot2 commented on a change in pull request #2417: [HUDI-1479] Use HoodieEngineContext to parallelize fetching of partiton paths

2021-01-09 Thread GitBox
umehrot2 commented on a change in pull request #2417: URL: https://github.com/apache/hudi/pull/2417#discussion_r554237558 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java ## @@ -221,8 +221,9 @@ public Hood

[GitHub] [hudi] n3nash commented on pull request #2379: [HUDI-1399] support a independent clustering spark job to asynchronously clustering

2021-01-09 Thread GitBox
n3nash commented on pull request #2379: URL: https://github.com/apache/hudi/pull/2379#issuecomment-757396445 @lw309637554 Thanks for promptly responding to the comments.

[GitHub] [hudi] lw309637554 commented on a change in pull request #2417: [HUDI-1479] Use HoodieEngineContext to parallelize fetching of partiton paths

2021-01-09 Thread GitBox
lw309637554 commented on a change in pull request #2417: URL: https://github.com/apache/hudi/pull/2417#discussion_r554041498 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/view/FileSystemViewManager.java ## @@ -159,12 +160,12 @@ private static HoodieTable

[GitHub] [hudi] codecov-io edited a comment on pull request #2425: [MINOR] fix spark 3 build for incremental query on MOR

2021-01-09 Thread GitBox
codecov-io edited a comment on pull request #2425: URL: https://github.com/apache/hudi/pull/2425#issuecomment-757404921

[GitHub] [hudi] codecov-io commented on pull request #2421: [Hudi-1502] MOR rollback and restore support for metadata sync

2021-01-09 Thread GitBox
codecov-io commented on pull request #2421: URL: https://github.com/apache/hudi/pull/2421#issuecomment-757112911 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2421?src=pr&el=h1) Report > Merging [#2421](https://codecov.io/gh/apache/hudi/pull/2421?src=pr&el=desc) (0227058) into [ma

[GitHub] [hudi] lw309637554 edited a comment on pull request #2418: [HUDI-1266] Add unit test for validating replacecommit rollback

2021-01-09 Thread GitBox
lw309637554 edited a comment on pull request #2418: URL: https://github.com/apache/hudi/pull/2418#issuecomment-756830958 @satishkotha just two minor comments

[GitHub] [hudi] codecov-io edited a comment on pull request #2422: [WIP] [HUDI-1276] [HUDI-1459] Make Clustering/ReplaceCommit and Metadata table be compatible

2021-01-09 Thread GitBox
codecov-io edited a comment on pull request #2422: URL: https://github.com/apache/hudi/pull/2422#issuecomment-757129226

[GitHub] [hudi] nsivabalan commented on a change in pull request #2421: [WIP][Hudi-1502] MOR rollback and restore support for metadata sync

2021-01-09 Thread GitBox
nsivabalan commented on a change in pull request #2421: URL: https://github.com/apache/hudi/pull/2421#discussion_r554055131 ## File path: hudi-common/src/test/java/org/apache/hudi/common/table/TestTimelineUtils.java ## @@ -181,10 +187,113 @@ public void testRestoreInstants() t

[GitHub] [hudi] codecov-io commented on pull request #2422: [WIP] [HUDI-1276] [HUDI-1459] Make Clustering/ReplaceCommit and Metadata table be compatible

2021-01-09 Thread GitBox
codecov-io commented on pull request #2422: URL: https://github.com/apache/hudi/pull/2422#issuecomment-757129226 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2422?src=pr&el=h1) Report > Merging [#2422](https://codecov.io/gh/apache/hudi/pull/2422?src=pr&el=desc) (dc855b2) into [ma

[GitHub] [hudi] prashantwason commented on a change in pull request #2417: [HUDI-1479] Use HoodieEngineContext to parallelize fetching of partiton paths

2021-01-09 Thread GitBox
prashantwason commented on a change in pull request #2417: URL: https://github.com/apache/hudi/pull/2417#discussion_r554237450 ## File path: hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java ## @@ -49,12 +60,48 @@ public FileSystemBackedTab

[GitHub] [hudi] vinothchandar commented on a change in pull request #1938: [HUDI-920] Support Incremental query for MOR table

2021-01-09 Thread GitBox
vinothchandar commented on a change in pull request #1938: URL: https://github.com/apache/hudi/pull/1938#discussion_r554442195 ## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/MergeOnReadIncrementalRelation.scala ## @@ -0,0 +1,218 @@ +/* + * Licens

[GitHub] [hudi] codecov-io commented on pull request #2424: [HUDI-1509]: Reverting LinkedHashSet changes to fix performance degradation for large schemas

2021-01-09 Thread GitBox
codecov-io commented on pull request #2424: URL: https://github.com/apache/hudi/pull/2424#issuecomment-757403445 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2424?src=pr&el=h1) Report > Merging [#2424](https://codecov.io/gh/apache/hudi/pull/2424?src=pr&el=desc) (9edb00d) into [ma

[GitHub] [hudi] vinothchandar merged pull request #1938: [HUDI-920] Support Incremental query for MOR table

2021-01-09 Thread GitBox
vinothchandar merged pull request #1938: URL: https://github.com/apache/hudi/pull/1938

[GitHub] [hudi] vinothchandar commented on a change in pull request #2421: [WIP][Hudi-1502] MOR rollback and restore support for metadata sync

2021-01-09 Thread GitBox
vinothchandar commented on a change in pull request #2421: URL: https://github.com/apache/hudi/pull/2421#discussion_r554079488 ## File path: hudi-common/src/test/java/org/apache/hudi/common/table/TestTimelineUtils.java ## @@ -181,10 +187,113 @@ public void testRestoreInstants(

[GitHub] [hudi] Rap70r closed issue #2416: When will version 0.6.1 be released?

2021-01-09 Thread GitBox
Rap70r closed issue #2416: URL: https://github.com/apache/hudi/issues/2416

[GitHub] [hudi] satishkotha commented on pull request #2422: [WIP] [HUDI-1276] [HUDI-1459] Make Clustering/ReplaceCommit and Metadata table be compatible

2021-01-09 Thread GitBox
satishkotha commented on pull request #2422: URL: https://github.com/apache/hudi/pull/2422#issuecomment-757110216

[GitHub] [hudi] prashanthvg89 commented on pull request #2069: [WIP][HUDI-945] Cleanup Spillable map files eagerly for DiskBasedMap

2021-01-09 Thread GitBox
prashanthvg89 commented on pull request #2069: URL: https://github.com/apache/hudi/pull/2069#issuecomment-756969267 I saw this failure in Spark streaming job writing to Hudi. Is this still being worked on and is there an ETA for this? Hudi version: 0.6.0 Spark version: 2.4.0 EM

[GitHub] [hudi] bvaradar closed issue #2401: [SUPPORT] error queue for deltastreamer

2021-01-09 Thread GitBox
bvaradar closed issue #2401: URL: https://github.com/apache/hudi/issues/2401

[GitHub] [hudi] yanghua merged pull request #2420: [HUDI-1514] Avoid raw type use for parameter of Transformer interface

2021-01-09 Thread GitBox
yanghua merged pull request #2420: URL: https://github.com/apache/hudi/pull/2420

[GitHub] [hudi] codecov-io edited a comment on pull request #2424: [HUDI-1509]: Reverting LinkedHashSet changes to fix performance degradation for large schemas

2021-01-09 Thread GitBox
codecov-io edited a comment on pull request #2424: URL: https://github.com/apache/hudi/pull/2424#issuecomment-757403445

[GitHub] [hudi] vinothchandar commented on pull request #1938: [HUDI-920] Support Incremental query for MOR table

2021-01-09 Thread GitBox
vinothchandar commented on pull request #1938: URL: https://github.com/apache/hudi/pull/1938#issuecomment-757103037

[GitHub] [hudi] garyli1019 commented on a change in pull request #1938: [HUDI-920] Support Incremental query for MOR table

2021-01-09 Thread GitBox
garyli1019 commented on a change in pull request #1938: URL: https://github.com/apache/hudi/pull/1938#discussion_r554287262 ## File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieInputFormatUtils.java ## @@ -470,4 +471,45 @@ private static HoodieBaseFile

[GitHub] [hudi] codecov-io edited a comment on pull request #1938: [HUDI-920] Support Incremental query for MOR table

2021-01-09 Thread GitBox
codecov-io edited a comment on pull request #1938: URL: https://github.com/apache/hudi/pull/1938#issuecomment-752876566

[GitHub] [hudi] lw309637554 commented on a change in pull request #2379: [HUDI-1399] support a independent clustering spark job to asynchronously clustering

2021-01-09 Thread GitBox
lw309637554 commented on a change in pull request #2379: URL: https://github.com/apache/hudi/pull/2379#discussion_r554403596 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java ## @@ -711,12 +711,30 @@ public void ro

[GitHub] [hudi] bvaradar commented on issue #2409: [SUPPORT] Spark structured Streaming writes to Hudi and synchronizes Hive to create only read-optimized tables without creating real-time tables

2021-01-09 Thread GitBox
bvaradar commented on issue #2409: URL: https://github.com/apache/hudi/issues/2409#issuecomment-757035695 @wosow : The _rt table syncing happens after _ro table and I see an HiveMetaStore exception when updating commit time in the _ro table saying that the table does not exist. This is wei

[GitHub] [hudi] n3nash commented on a change in pull request #2379: [HUDI-1399] support a independent clustering spark job to asynchronously clustering

2021-01-09 Thread GitBox
n3nash commented on a change in pull request #2379: URL: https://github.com/apache/hudi/pull/2379#discussion_r554301213 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java ## @@ -711,12 +711,30 @@ public void rollbac

[GitHub] [hudi] garyli1019 commented on pull request #2425: [MINOR] fix spark 3 build for incremental query on MOR

2021-01-09 Thread GitBox
garyli1019 commented on pull request #2425: URL: https://github.com/apache/hudi/pull/2425#issuecomment-757404557 @vinothchandar This PR will fix the Spark 3 build. This `SparkHadoopUtil` is unnecessary and we removed it from MOR snapshot relation too.

[GitHub] [hudi] vinothchandar merged pull request #2422: [HUDI-1276] [HUDI-1459] Make Clustering/ReplaceCommit and Metadata table be compatible

2021-01-09 Thread GitBox
vinothchandar merged pull request #2422: URL: https://github.com/apache/hudi/pull/2422

[GitHub] [hudi] codecov-io commented on pull request #2425: [MINOR] fix spark 3 build for incremental query on MOR

2021-01-09 Thread GitBox
codecov-io commented on pull request #2425: URL: https://github.com/apache/hudi/pull/2425#issuecomment-757404921 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2425?src=pr&el=h1) Report > Merging [#2425](https://codecov.io/gh/apache/hudi/pull/2425?src=pr&el=desc) (cf8f501) into [ma

[GitHub] [hudi] codecov-io commented on pull request #2420: [HUDI-1514] Avoid raw type use for parameter of interface

2021-01-09 Thread GitBox
codecov-io commented on pull request #2420: URL: https://github.com/apache/hudi/pull/2420#issuecomment-756960638 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2420?src=pr&el=h1) Report > Merging [#2420](https://codecov.io/gh/apache/hudi/pull/2420?src=pr&el=desc) (fb35a0f) into [ma

[GitHub] [hudi] codecov-io edited a comment on pull request #2379: [HUDI-1399] support a independent clustering spark job to asynchronously clustering

2021-01-09 Thread GitBox
codecov-io edited a comment on pull request #2379: URL: https://github.com/apache/hudi/pull/2379#issuecomment-751244130

[GitHub] [hudi] bvaradar commented on pull request #2069: [WIP][HUDI-945] Cleanup Spillable map files eagerly for DiskBasedMap

2021-01-09 Thread GitBox
bvaradar commented on pull request #2069: URL: https://github.com/apache/hudi/pull/2069#issuecomment-757120127 @prashantwason : Please go ahead and take over this PR to fit in 0.7 timeline.

[GitHub] [hudi] bvaradar edited a comment on issue #2338: [SUPPORT] MOR table found duplicate and process so slowly

2021-01-09 Thread GitBox
bvaradar edited a comment on issue #2338: URL: https://github.com/apache/hudi/issues/2338#issuecomment-757125096 @so-lazy : when you query through spark datasource (not just single file), are you able to see unique record ? val df = spark.read.format("hudi").load("hdfs://had

[GitHub] [hudi] n3nash commented on pull request #2424: [HUDI-1509]: Reverting LinkedHashSet changes to fix performance degradation for large schemas

2021-01-09 Thread GitBox
n3nash commented on pull request #2424: URL: https://github.com/apache/hudi/pull/2424#issuecomment-757402997 @prashantwason can you review this ?

[GitHub] [hudi] sam-wmt commented on issue #2423: Performance Issues due to significant Parallel Create-Dir being issued to Azure ADLS_V2

2021-01-09 Thread GitBox
sam-wmt commented on issue #2423: URL: https://github.com/apache/hudi/issues/2423#issuecomment-757356013 https://user-images.githubusercontent.com/67726885/104107035-9774aa00-5287-11eb-9f8d-a43214fe1266.png Adding screenshot of operation type stats for 1 day of the workload.

[GitHub] [hudi] garyli1019 commented on pull request #1938: [HUDI-920] Support Incremental query for MOR table

2021-01-09 Thread GitBox
garyli1019 commented on pull request #1938: URL: https://github.com/apache/hudi/pull/1938#issuecomment-756781554

[GitHub] [hudi] codecov-io edited a comment on pull request #2375: [HUDI-1332] Introduce FlinkHoodieBloomIndex to hudi-flink-client

2021-01-09 Thread GitBox
codecov-io edited a comment on pull request #2375: URL: https://github.com/apache/hudi/pull/2375#issuecomment-752011479 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2375?src=pr&el=h1) Report > Merging [#2375](https://codecov.io/gh/apache/hudi/pull/2375?src=pr&el=desc) (095fd3d) in

[GitHub] [hudi] codecov-io edited a comment on pull request #2425: [MINOR] fix spark 3 build for incremental query on MOR

2021-01-09 Thread GitBox
codecov-io edited a comment on pull request #2425: URL: https://github.com/apache/hudi/pull/2425#issuecomment-757404921 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2425?src=pr&el=h1) Report > Merging [#2425](https://codecov.io/gh/apache/hudi/pull/2425?src=pr&el=desc) (5ff3061) in

[GitHub] [hudi] codecov-io edited a comment on pull request #2425: [MINOR] fix spark 3 build for incremental query on MOR

2021-01-09 Thread GitBox
codecov-io edited a comment on pull request #2425: URL: https://github.com/apache/hudi/pull/2425#issuecomment-757404921 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2425?src=pr&el=h1) Report > Merging [#2425](https://codecov.io/gh/apache/hudi/pull/2425?src=pr&el=desc) (cf8f501) in

[GitHub] [hudi] codecov-io edited a comment on pull request #2424: [HUDI-1509]: Reverting LinkedHashSet changes to fix performance degradation for large schemas

2021-01-09 Thread GitBox
codecov-io edited a comment on pull request #2424: URL: https://github.com/apache/hudi/pull/2424#issuecomment-757403445

[jira] [Closed] (HUDI-1399) support a independent clustering spark job to asynchronously clustering

2021-01-09 Thread liwei (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liwei closed HUDI-1399. --- > support a independent clustering spark job to asynchronously clustering > -

[jira] [Resolved] (HUDI-1399) support a independent clustering spark job to asynchronously clustering

2021-01-09 Thread liwei (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liwei resolved HUDI-1399. - Resolution: Fixed > support a independent clustering spark job to asynchronously clustering > ---

[jira] [Updated] (HUDI-1399) support a independent clustering spark job to asynchronously clustering

2021-01-09 Thread liwei (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liwei updated HUDI-1399: Status: Closed (was: Patch Available) > support a independent clustering spark job to asynchronously clustering >

[jira] [Reopened] (HUDI-1399) support a independent clustering spark job to asynchronously clustering

2021-01-09 Thread liwei (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liwei reopened HUDI-1399: - > support a independent clustering spark job to asynchronously clustering > -

[GitHub] [hudi] garyli1019 commented on pull request #1938: [HUDI-920] Support Incremental query for MOR table

2021-01-09 Thread GitBox
garyli1019 commented on pull request #1938: URL: https://github.com/apache/hudi/pull/1938#issuecomment-757404344 #2425 will fix the build

[GitHub] [hudi] garyli1019 opened a new pull request #2425: [MINOR] fix spark 3 build for incremental query on MOR

2021-01-09 Thread GitBox
garyli1019 opened a new pull request #2425: URL: https://github.com/apache/hudi/pull/2425 ## What is the purpose of the pull request Fix Spark 3 build ## Committer checklist - [ ] Has a corresponding JIRA in PR title & commit - [ ] Commit message is descriptive

[GitHub] [hudi] codecov-io commented on pull request #2424: [HUDI-1509]: Reverting LinkedHashSet changes to fix performance degradation for large schemas

2021-01-09 Thread GitBox
codecov-io commented on pull request #2424: URL: https://github.com/apache/hudi/pull/2424#issuecomment-757403445 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2424?src=pr&el=h1) Report > Merging [#2424](https://codecov.io/gh/apache/hudi/pull/2424?src=pr&el=desc) (9edb00d) into [ma

[GitHub] [hudi] n3nash commented on pull request #2424: [HUDI-1509]: Reverting LinkedHashSet changes to fix performance degradation for large schemas

2021-01-09 Thread GitBox
n3nash commented on pull request #2424: URL: https://github.com/apache/hudi/pull/2424#issuecomment-757402997 @prashantwason can you review this ? This is an automated message from the Apache Git Service. To respond to the mes

[jira] [Updated] (HUDI-1509) Major performance degradation due to rewriting records with default values

2021-01-09 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1509: - Labels: pull-request-available (was: ) > Major performance degradation due to rewriting records w

[GitHub] [hudi] n3nash opened a new pull request #2424: [HUDI-1509]: Reverting LinkedHashSet changes to fix performance degradation for large schemas

2021-01-09 Thread GitBox
n3nash opened a new pull request #2424: URL: https://github.com/apache/hudi/pull/2424 Reverting LinkedHashSet changes to combine fields from oldSchema and newSchema in favor of using only new schema for record rewriting ## *Tips* - *Thank you very much for contributing to Apache H

[GitHub] [hudi] n3nash commented on a change in pull request #2400: Some fixes and enhancements to test suite framework

2021-01-09 Thread GitBox
n3nash commented on a change in pull request #2400: URL: https://github.com/apache/hudi/pull/2400#discussion_r554496508 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java ## @@ -307,7 +307,10 @@ public void refreshTimeline() throw

[GitHub] [hudi] n3nash commented on a change in pull request #2400: Some fixes and enhancements to test suite framework

2021-01-09 Thread GitBox
n3nash commented on a change in pull request #2400: URL: https://github.com/apache/hudi/pull/2400#discussion_r554496508 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java ## @@ -307,7 +307,10 @@ public void refreshTimeline() throw

[hudi] branch master updated (65866c4 -> 368c1a8)

2021-01-09 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository. nagarwal pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from 65866c4 [HUDI-1276] [HUDI-1459] Make Clustering/ReplaceCommit and Metadata table be compatible (#2422) add 368

[GitHub] [hudi] n3nash merged pull request #2379: [HUDI-1399] support a independent clustering spark job to asynchronously clustering

2021-01-09 Thread GitBox
n3nash merged pull request #2379: URL: https://github.com/apache/hudi/pull/2379 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [hudi] n3nash commented on pull request #2379: [HUDI-1399] support a independent clustering spark job to asynchronously clustering

2021-01-09 Thread GitBox
n3nash commented on pull request #2379: URL: https://github.com/apache/hudi/pull/2379#issuecomment-757396445 @lw309637554 Thanks for promptly responding to the comments. This is an automated message from the Apache Git Servi

[jira] [Updated] (HUDI-1518) Remove replaced files logic from archival

2021-01-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1518: - Priority: Major (was: Trivial) > Remove replaced files logic from archival >

[jira] [Updated] (HUDI-1518) Remove replaced files logic from archival

2021-01-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1518: - Priority: Minor (was: Major) > Remove replaced files logic from archival > --

[hudi] branch master updated: [HUDI-1276] [HUDI-1459] Make Clustering/ReplaceCommit and Metadata table be compatible (#2422)

2021-01-09 Thread vinoth
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 65866c4 [HUDI-1276] [HUDI-1459] Make Clustering/R

[GitHub] [hudi] vinothchandar merged pull request #2422: [HUDI-1276] [HUDI-1459] Make Clustering/ReplaceCommit and Metadata table be compatible

2021-01-09 Thread GitBox
vinothchandar merged pull request #2422: URL: https://github.com/apache/hudi/pull/2422 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [hudi] codecov-io edited a comment on pull request #2422: [HUDI-1276] [HUDI-1459] Make Clustering/ReplaceCommit and Metadata table be compatible

2021-01-09 Thread GitBox
codecov-io edited a comment on pull request #2422: URL: https://github.com/apache/hudi/pull/2422#issuecomment-757129226 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2422?src=pr&el=h1) Report > Merging [#2422](https://codecov.io/gh/apache/hudi/pull/2422?src=pr&el=desc) (df5ef96) in

[GitHub] [hudi] satishkotha commented on pull request #2422: [HUDI-1276] [HUDI-1459] Make Clustering/ReplaceCommit and Metadata table be compatible

2021-01-09 Thread GitBox
satishkotha commented on pull request #2422: URL: https://github.com/apache/hudi/pull/2422#issuecomment-757385601 > Do you think this PR is ready to land after CI passes? @satishkotha > > On finding more places to change. May be just find usages of `HoodieTimeline.DELTA_COMMIT_ACTION

[GitHub] [hudi] vinothchandar commented on pull request #2422: [HUDI-1276] [HUDI-1459] Make Clustering/ReplaceCommit and Metadata table be compatible

2021-01-09 Thread GitBox
vinothchandar commented on pull request #2422: URL: https://github.com/apache/hudi/pull/2422#issuecomment-757384616 Do you think this PR is ready to land after CI passes? @satishkotha On finding more places to change. May be just find usages of `HoodieTimeline.DELTA_COMMIT_ACTION` ?

[jira] [Updated] (HUDI-1518) Remove replaced files logic from archival

2021-01-09 Thread satish (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-1518: - Description: See https://github.com/apache/hudi/blob/79ec7b4894b997183a6e10fdc19d34f5ab4ea437/hudi-client/hudi-cl

[GitHub] [hudi] codecov-io edited a comment on pull request #2421: [HUDI-1502] MOR rollback and restore support for metadata sync

2021-01-09 Thread GitBox
codecov-io edited a comment on pull request #2421: URL: https://github.com/apache/hudi/pull/2421#issuecomment-757112911 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2421?src=pr&el=h1) Report > Merging [#2421](https://codecov.io/gh/apache/hudi/pull/2421?src=pr&el=desc) (0a198d8) in

[GitHub] [hudi] codecov-io edited a comment on pull request #2422: [HUDI-1276] [HUDI-1459] Make Clustering/ReplaceCommit and Metadata table be compatible

2021-01-09 Thread GitBox
codecov-io edited a comment on pull request #2422: URL: https://github.com/apache/hudi/pull/2422#issuecomment-757129226 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2422?src=pr&el=h1) Report > Merging [#2422](https://codecov.io/gh/apache/hudi/pull/2422?src=pr&el=desc) (df5ef96) in

[GitHub] [hudi] satishkotha commented on pull request #2422: [HUDI-1276] [HUDI-1459] Make Clustering/ReplaceCommit and Metadata table be compatible

2021-01-09 Thread GitBox
satishkotha commented on pull request #2422: URL: https://github.com/apache/hudi/pull/2422#issuecomment-757383826 > @satishkotha This looks good and also much cleaner. > > Only thing to do IMO is to make sure we don't throw an error when archival fails to delete the replaced file gro

[GitHub] [hudi] nsivabalan commented on a change in pull request #2421: [HUDI-1502] MOR rollback and restore support for metadata sync

2021-01-09 Thread GitBox
nsivabalan commented on a change in pull request #2421: URL: https://github.com/apache/hudi/pull/2421#discussion_r554485833 ## File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/action/rollback/TestMarkerBasedRollbackStrategy.java ## @@ -102,8 +112,10

[GitHub] [hudi] satishkotha commented on a change in pull request #2422: [HUDI-1276] [HUDI-1459] Make Clustering/ReplaceCommit and Metadata table be compatible

2021-01-09 Thread GitBox
satishkotha commented on a change in pull request #2422: URL: https://github.com/apache/hudi/pull/2422#discussion_r554485792 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/rollback/RollbackUtils.java ## @@ -122,6 +122,7 @@ static Hoodie
