[GitHub] [hudi] n3nash commented on a change in pull request #2374: [HUDI-845] Added locking capability to allow multiple writers

2021-02-02 Thread GitBox
n3nash commented on a change in pull request #2374: URL: https://github.com/apache/hudi/pull/2374#discussion_r569200054 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java ## @@ -923,6 +935,39 @@ public int getMetadataCleane

[GitHub] [hudi] n3nash commented on a change in pull request #2374: [HUDI-845] Added locking capability to allow multiple writers

2021-02-02 Thread GitBox
n3nash commented on a change in pull request #2374: URL: https://github.com/apache/hudi/pull/2374#discussion_r569199577 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java ## @@ -230,13 +235,18 @@ protected v

[GitHub] [hudi] n3nash commented on a change in pull request #2374: [HUDI-845] Added locking capability to allow multiple writers

2021-02-02 Thread GitBox
n3nash commented on a change in pull request #2374: URL: https://github.com/apache/hudi/pull/2374#discussion_r569198671 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java ## @@ -98,6 +112,7 @@ private transient H

[jira] [Updated] (HUDI-1577) Document that multi-writer cannot be used within the same write client

2021-02-02 Thread Nishith Agarwal (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishith Agarwal updated HUDI-1577: -- Parent: HUDI-1456 Issue Type: Sub-task (was: Task) > Document that multi-writer cannot

[jira] [Assigned] (HUDI-1577) Document that multi-writer cannot be used within the same write client

2021-02-02 Thread Nishith Agarwal (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishith Agarwal reassigned HUDI-1577: - Assignee: Nishith Agarwal > Document that multi-writer cannot be used within the same wri

[jira] [Created] (HUDI-1577) Document that multi-writer cannot be used within the same write client

2021-02-02 Thread Nishith Agarwal (Jira)
Nishith Agarwal created HUDI-1577: - Summary: Document that multi-writer cannot be used within the same write client Key: HUDI-1577 URL: https://issues.apache.org/jira/browse/HUDI-1577 Project: Apache

[GitHub] [hudi] n3nash commented on a change in pull request #2374: [HUDI-845] Added locking capability to allow multiple writers

2021-02-02 Thread GitBox
n3nash commented on a change in pull request #2374: URL: https://github.com/apache/hudi/pull/2374#discussion_r569196481 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java ## @@ -403,30 +439,32 @@ protected void post

[GitHub] [hudi] n3nash commented on a change in pull request #2374: [HUDI-845] Added locking capability to allow multiple writers

2021-02-02 Thread GitBox
n3nash commented on a change in pull request #2374: URL: https://github.com/apache/hudi/pull/2374#discussion_r569191295 ## File path: hudi-common/src/main/java/org/apache/hudi/common/model/TableService.java ## @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software Foundation

[jira] [Created] (HUDI-1576) Add ability to perform archival synchronously

2021-02-02 Thread Nishith Agarwal (Jira)
Nishith Agarwal created HUDI-1576: - Summary: Add ability to perform archival synchronously Key: HUDI-1576 URL: https://issues.apache.org/jira/browse/HUDI-1576 Project: Apache Hudi Issue Type:

[GitHub] [hudi] n3nash commented on a change in pull request #2374: [HUDI-845] Added locking capability to allow multiple writers

2021-02-02 Thread GitBox
n3nash commented on a change in pull request #2374: URL: https://github.com/apache/hudi/pull/2374#discussion_r569194229 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java ## @@ -599,6 +637,7 @@ public HoodieRestoreM

[GitHub] [hudi] n3nash commented on a change in pull request #2374: [HUDI-845] Added locking capability to allow multiple writers

2021-02-02 Thread GitBox
n3nash commented on a change in pull request #2374: URL: https://github.com/apache/hudi/pull/2374#discussion_r569193815 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java ## @@ -403,30 +439,32 @@ protected void post

[GitHub] [hudi] n3nash commented on a change in pull request #2374: [HUDI-845] Added locking capability to allow multiple writers

2021-02-02 Thread GitBox
n3nash commented on a change in pull request #2374: URL: https://github.com/apache/hudi/pull/2374#discussion_r569192781 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java ## @@ -220,7 +253,7 @@ void emitCommitMetric

[GitHub] [hudi] n3nash commented on a change in pull request #2374: [HUDI-845] Added locking capability to allow multiple writers

2021-02-02 Thread GitBox
n3nash commented on a change in pull request #2374: URL: https://github.com/apache/hudi/pull/2374#discussion_r569191813 ## File path: hudi-common/src/main/java/org/apache/hudi/common/model/TableService.java ## @@ -0,0 +1,62 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [hudi] n3nash commented on a change in pull request #2374: [HUDI-845] Added locking capability to allow multiple writers

2021-02-02 Thread GitBox
n3nash commented on a change in pull request #2374: URL: https://github.com/apache/hudi/pull/2374#discussion_r569189393 ## File path: hudi-common/src/main/java/org/apache/hudi/common/model/WriteConcurrencyMode.java ## @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software Fo

[GitHub] [hudi] n3nash edited a comment on pull request #2359: [HUDI-1486] Remove inflight rollback in hoodie writer

2021-02-02 Thread GitBox
n3nash edited a comment on pull request #2359: URL: https://github.com/apache/hudi/pull/2359#issuecomment-772301128 > The concern I had was the part 2 where, a committed write could have been archived and we may end up skipping it. Can you please clarify again how we guard that? By ensurin

[GitHub] [hudi] n3nash commented on pull request #2359: [HUDI-1486] Remove inflight rollback in hoodie writer

2021-02-02 Thread GitBox
n3nash commented on pull request #2359: URL: https://github.com/apache/hudi/pull/2359#issuecomment-772301128 > The concern I had was the part 2 where, a committed write could have been archived and we may end up skipping it. Can you please clarify again how we guard that? By ensuring the a

[GitHub] [hudi] vinothchandar commented on a change in pull request #2359: [HUDI-1486] Remove inflight rollback in hoodie writer

2021-02-02 Thread GitBox
vinothchandar commented on a change in pull request #2359: URL: https://github.com/apache/hudi/pull/2359#discussion_r569166415 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java ## @@ -128,11 +133,26 @@ public Abstr

[GitHub] [hudi] MyLanPangzi opened a new pull request #2526: FIX HUDI-1420

2021-02-02 Thread GitBox
MyLanPangzi opened a new pull request #2526: URL: https://github.com/apache/hudi/pull/2526 FIX https://issues.apache.org/jira/browse/HUDI-1420 This is an automated message from the Apache Git Service. To respond to the messag

[GitHub] [hudi] MyLanPangzi commented on pull request #2525: Fix HUDI-1420

2021-02-02 Thread GitBox
MyLanPangzi commented on pull request #2525: URL: https://github.com/apache/hudi/pull/2525#issuecomment-772268874 find some bug

[GitHub] [hudi] MyLanPangzi closed pull request #2525: Fix HUDI-1420

2021-02-02 Thread GitBox
MyLanPangzi closed pull request #2525: URL: https://github.com/apache/hudi/pull/2525

[jira] [Updated] (HUDI-1420) HoodieTableMetaClient.getMarkerFolderPath works incorrectly on windows client with hdfs server for wrong file seperator

2021-02-02 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1420: - Labels: pull-request-available (was: ) > HoodieTableMetaClient.getMarkerFolderPath works incorrec

[GitHub] [hudi] MyLanPangzi opened a new pull request #2525: Fix HUDI-1420

2021-02-02 Thread GitBox
MyLanPangzi opened a new pull request #2525: URL: https://github.com/apache/hudi/pull/2525 Fix [HUDI-1420](https://issues.apache.org/jira/browse/HUDI-1420)

[GitHub] [hudi] n3nash commented on a change in pull request #2359: [HUDI-1486] Remove inflight rollback in hoodie writer

2021-02-02 Thread GitBox
n3nash commented on a change in pull request #2359: URL: https://github.com/apache/hudi/pull/2359#discussion_r569156786 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/heartbeat/HoodieHeartbeatClient.java ## @@ -0,0 +1,265 @@ +/* + * Licensed

[GitHub] [hudi] n3nash commented on a change in pull request #2359: [HUDI-1486] Remove inflight rollback in hoodie writer

2021-02-02 Thread GitBox
n3nash commented on a change in pull request #2359: URL: https://github.com/apache/hudi/pull/2359#discussion_r569155127 ## File path: hudi-common/src/main/java/org/apache/hudi/common/util/HeartbeatUtils.java ## @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software Foundatio

[GitHub] [hudi] n3nash commented on a change in pull request #2359: [HUDI-1486] Remove inflight rollback in hoodie writer

2021-02-02 Thread GitBox
n3nash commented on a change in pull request #2359: URL: https://github.com/apache/hudi/pull/2359#discussion_r569154776 ## File path: hudi-common/src/main/java/org/apache/hudi/common/util/HeartbeatUtils.java ## @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software Foundatio

[GitHub] [hudi] n3nash commented on a change in pull request #2359: [HUDI-1486] Remove inflight rollback in hoodie writer

2021-02-02 Thread GitBox
n3nash commented on a change in pull request #2359: URL: https://github.com/apache/hudi/pull/2359#discussion_r569154530 ## File path: hudi-client/hudi-client-common/src/test/java/org/apache/hudi/client/heartbeat/TestHoodieHeartbeatClient.java ## @@ -0,0 +1,145 @@ +/* + * Licen

[GitHub] [hudi] n3nash commented on a change in pull request #2359: [HUDI-1486] Remove inflight rollback in hoodie writer

2021-02-02 Thread GitBox
n3nash commented on a change in pull request #2359: URL: https://github.com/apache/hudi/pull/2359#discussion_r569150307 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java ## @@ -750,24 +767,49 @@ private HoodieTimel

[GitHub] [hudi] n3nash commented on a change in pull request #2359: [HUDI-1486] Remove inflight rollback in hoodie writer

2021-02-02 Thread GitBox
n3nash commented on a change in pull request #2359: URL: https://github.com/apache/hudi/pull/2359#discussion_r569149254 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java ## @@ -128,11 +133,26 @@ public AbstractHood

[GitHub] [hudi] zuyanton commented on issue #2509: [SUPPORT]Hudi saves TimestampType as bigInt

2021-02-02 Thread GitBox
zuyanton commented on issue #2509: URL: https://github.com/apache/hudi/issues/2509#issuecomment-772225257 @satishkotha I added that parameter to my example, now after writing data into s3 , when I run ```spark.sql("describe testTable3").show``` I get ``` ++--

[GitHub] [hudi] ZhangChaoming opened a new pull request #2524: Fix bug with logic error for checking state condition.

2021-02-02 Thread GitBox
ZhangChaoming opened a new pull request #2524: URL: https://github.com/apache/hudi/pull/2524 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of t

[GitHub] [hudi] garyli1019 commented on a change in pull request #2506: [HUDI-1557] Make Flink write pipeline write task scalable

2021-02-02 Thread GitBox
garyli1019 commented on a change in pull request #2506: URL: https://github.com/apache/hudi/pull/2506#discussion_r569099335 ## File path: hudi-flink/src/main/java/org/apache/hudi/operator/partitioner/BucketAssignerFunction.java ## @@ -0,0 +1,149 @@ +/* + * Licensed to the Apac

[GitHub] [hudi] garyli1019 commented on a change in pull request #2521: [HUDI-1547] CI intermittent failure: TestJsonStringToHoodieRecordMapF…

2021-02-02 Thread GitBox
garyli1019 commented on a change in pull request #2521: URL: https://github.com/apache/hudi/pull/2521#discussion_r569094206 ## File path: hudi-flink/src/test/java/org/apache/hudi/source/TestJsonStringToHoodieRecordMapFunction.java ## @@ -72,7 +72,7 @@ public void testMapFuncti

[jira] [Commented] (HUDI-1505) Allow pluggable option to write error records to side table, queue

2021-02-02 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277639#comment-17277639 ] Raymond Xu commented on HUDI-1505: -- [~vinoth] sorry I'm not actively on that feature. >

[jira] [Updated] (HUDI-1527) Automatically infer the data directory, users only need to specify the table directory

2021-02-02 Thread Trevorzhang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trevorzhang updated HUDI-1527: -- Description: To read the hudi table, you need to specify the path, but the path is not only the tablePa

[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #2485: [HUDI-1109] Support Spark Structured Streaming read from Hudi table

2021-02-02 Thread GitBox
pengzhiwei2018 commented on a change in pull request #2485: URL: https://github.com/apache/hudi/pull/2485#discussion_r569077912 ## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/streaming/HoodieSourceOffset.scala ## @@ -0,0 +1,69 @@ +/* +

[jira] [Commented] (HUDI-492) show env all CLI command can not work in hudi-cli

2021-02-02 Thread vinoyang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277623#comment-17277623 ] vinoyang commented on HUDI-492: --- [~shivnarayan] It has been fixed, IMO, we can close it now.

[GitHub] [hudi] xushiyan opened a new pull request #2523: add latency info

2021-02-02 Thread GitBox
xushiyan opened a new pull request #2523: URL: https://github.com/apache/hudi/pull/2523 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of the pu

[jira] [Updated] (HUDI-492) show env all CLI command can not work in hudi-cli

2021-02-02 Thread vinoyang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang updated HUDI-492: -- Status: Open (was: New) > show env all CLI command can not work in hudi-cli > --

[jira] [Assigned] (HUDI-492) show env all CLI command can not work in hudi-cli

2021-02-02 Thread vinoyang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang reassigned HUDI-492: - Assignee: hong dongdong > show env all CLI command can not work in hudi-cli >

[GitHub] [hudi] xushiyan closed pull request #2523: [DISCUSS] Measure latency by storing event time in WriteStatus

2021-02-02 Thread GitBox
xushiyan closed pull request #2523: URL: https://github.com/apache/hudi/pull/2523

[GitHub] [hudi] vinothchandar merged pull request #2458: [MINOR] Rename FileSystemViewHandler to RequestHandler and corrected the class comment

2021-02-02 Thread GitBox
vinothchandar merged pull request #2458: URL: https://github.com/apache/hudi/pull/2458

[GitHub] [hudi] zhedoubushishi commented on a change in pull request #2485: [HUDI-1109] Support Spark Structured Streaming read from Hudi table

2021-02-02 Thread GitBox
zhedoubushishi commented on a change in pull request #2485: URL: https://github.com/apache/hudi/pull/2485#discussion_r568854277 ## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/streaming/HoodieSourceOffset.scala ## @@ -0,0 +1,69 @@ +/* +

[GitHub] [hudi] nsivabalan commented on pull request #1929: [HUDI-1160] Support update partial fields for CoW table

2021-02-02 Thread GitBox
nsivabalan commented on pull request #1929: URL: https://github.com/apache/hudi/pull/1929#issuecomment-771700273 @liujinhui1994 : please let me know once you have addressed all comments and rebased w/ latest master.

[GitHub] [hudi] garyli1019 commented on a change in pull request #2296: [HUDI-1425] Performance loss with the additional hoodieRecords.isEmpt…

2021-02-02 Thread GitBox
garyli1019 commented on a change in pull request #2296: URL: https://github.com/apache/hudi/pull/2296#discussion_r568693017 ## File path: hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/action/compact/TestHoodieCompactor.java ## @@ -186,7 +191,7 @@ public voi

[GitHub] [hudi] rubenssoto commented on issue #2509: [SUPPORT]Hudi saves TimestampType as bigInt

2021-02-02 Thread GitBox
rubenssoto commented on issue #2509: URL: https://github.com/apache/hudi/issues/2509#issuecomment-772108742 Great to know, I will test this feature in Athena and Redshift Spectrum, if someone already made this test, please let me know.

[GitHub] [hudi] vinothchandar commented on a change in pull request #2374: [HUDI-845] Added locking capability to allow multiple writers

2021-02-02 Thread GitBox
vinothchandar commented on a change in pull request #2374: URL: https://github.com/apache/hudi/pull/2374#discussion_r568841211 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java ## @@ -188,6 +203,8 @@ public boolean

[GitHub] [hudi] codecov-io commented on pull request #2519: [HUDI-1573] Spark Sql Writer support Multi preCmp Field

2021-02-02 Thread GitBox
codecov-io commented on pull request #2519: URL: https://github.com/apache/hudi/pull/2519#issuecomment-771782258 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2519?src=pr&el=h1) Report > Merging [#2519](https://codecov.io/gh/apache/hudi/pull/2519?src=pr&el=desc) (f0e16a9) into [ma

[GitHub] [hudi] yanghua commented on a change in pull request #2325: [HUDI-699]Fix CompactionCommand and add unit test for CompactionCommand

2021-02-02 Thread GitBox
yanghua commented on a change in pull request #2325: URL: https://github.com/apache/hudi/pull/2325#discussion_r568534549 ## File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/CompactionCommand.java ## @@ -444,9 +455,12 @@ public String validateCompaction( })

[GitHub] [hudi] rubenssoto commented on issue #2463: [SUPPORT] Tuning Hudi Upsert Job

2021-02-02 Thread GitBox
rubenssoto commented on issue #2463: URL: https://github.com/apache/hudi/issues/2463#issuecomment-772111789 No, I think the difference in performance that I see on upsert is because rdd conversion, for example in bulk insert with hoodie.datasource.write.row.writer.enable ON is much faster

[GitHub] [hudi] yanghua merged pull request #2518: [MINOR] Fix method comment typo

2021-02-02 Thread GitBox
yanghua merged pull request #2518: URL: https://github.com/apache/hudi/pull/2518

[GitHub] [hudi] yanghua commented on pull request #2458: [MINOR] Rename FileSystemViewHandler to Router and corrected the class comment

2021-02-02 Thread GitBox
yanghua commented on pull request #2458: URL: https://github.com/apache/hudi/pull/2458#issuecomment-771541383 > I will let you make the RequestHandler vs Router call. and land. `RequestHandler` sounds better.

[GitHub] [hudi] yanghua commented on a change in pull request #2443: [HUDI-1269] Make whether the failure of connect hive affects hudi ingest process configurable

2021-02-02 Thread GitBox
yanghua commented on a change in pull request #2443: URL: https://github.com/apache/hudi/pull/2443#discussion_r568523091 ## File path: hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/DataSourceUtils.java ## @@ -293,6 +293,8 @@ public static HiveSyncConfig

[GitHub] [hudi] sleapfish opened a new issue #2522: [SUPPORT] Avoid UPSERT unchanged records from source

2021-02-02 Thread GitBox
sleapfish opened a new issue #2522: URL: https://github.com/apache/hudi/issues/2522 **Problem** When the source data set has unchanged rows, Hudi will upsert the target table rows and include those records in the new commit. If you have a CDC/incremental logic where you might have i

[GitHub] [hudi] teeyog commented on a change in pull request #2431: [HUDI-1526] Translate the api partitionBy to hoodie.datasource.write.partitionpath.field

2021-02-02 Thread GitBox
teeyog commented on a change in pull request #2431: URL: https://github.com/apache/hudi/pull/2431#discussion_r568434336 ## File path: hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestCOWDataSource.scala ## @@ -348,4 +351,65 @@ class TestCOWDataSou

[GitHub] [hudi] rubenssoto commented on issue #2515: [SUPPORT] ERROR HoodieTimelineArchiveLog: Failed to archive commits

2021-02-02 Thread GitBox
rubenssoto commented on issue #2515: URL: https://github.com/apache/hudi/issues/2515#issuecomment-771603309

[GitHub] [hudi] wangxianghu commented on a change in pull request #2506: [HUDI-1557] Make Flink write pipeline write task scalable

2021-02-02 Thread GitBox
wangxianghu commented on a change in pull request #2506: URL: https://github.com/apache/hudi/pull/2506#discussion_r568607212 ## File path: hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/index/state/FlinkInMemoryStateIndex.java ## @@ -62,47 +61,14 @@ public FlinkIn

[GitHub] [hudi] satishkotha edited a comment on issue #2509: [SUPPORT]Hudi saves TimestampType as bigInt

2021-02-02 Thread GitBox
satishkotha edited a comment on issue #2509: URL: https://github.com/apache/hudi/issues/2509#issuecomment-772020119

[GitHub] [hudi] nsivabalan commented on pull request #2310: [HUDI-1444] fix rollback for emtpy partition table

2021-02-02 Thread GitBox
nsivabalan commented on pull request #2310: URL: https://github.com/apache/hudi/pull/2310#issuecomment-771690219 @Xoln : When you get time, can you please respond to comments and address them if required.

[GitHub] [hudi] satishkotha commented on issue #2509: [SUPPORT]Hudi saves TimestampType as bigInt

2021-02-02 Thread GitBox
satishkotha commented on issue #2509: URL: https://github.com/apache/hudi/issues/2509#issuecomment-772020119 Hi If you set support_timestamp property mentioned [here](https://hudi.apache.org/docs/configurations.html#HIVE_SUPPORT_TIMESTAMP), hudi will convert the field to timestamp t

[GitHub] [hudi] sleapfish commented on issue #2284: [SUPPORT] : Is there a option to achieve SCD 2 in Hudi?

2021-02-02 Thread GitBox
sleapfish commented on issue #2284: URL: https://github.com/apache/hudi/issues/2284#issuecomment-771905029 I would also appreciate such feature, I believe it's pretty common use case and having this would make a lot of difference. @bvaradar do you have any examples or if you can point me

[GitHub] [hudi] yanghua merged pull request #2271: [HUDI-1335] Introduce FlinkHoodieSimpleIndex to hudi-flink-client

2021-02-02 Thread GitBox
yanghua merged pull request #2271: URL: https://github.com/apache/hudi/pull/2271

[GitHub] [hudi] yanghua commented on pull request #2445: [MINOR] Callback message add partitionPath Field

2021-02-02 Thread GitBox
yanghua commented on pull request #2445: URL: https://github.com/apache/hudi/pull/2445#issuecomment-771571794 @liujinhui1994 Let's fix the Travis firstly?

[GitHub] [hudi] vinothchandar commented on a change in pull request #2359: [HUDI-1486] Remove inflight rollback in hoodie writer

2021-02-02 Thread GitBox
vinothchandar commented on a change in pull request #2359: URL: https://github.com/apache/hudi/pull/2359#discussion_r568815079 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java ## @@ -160,7 +180,8 @@ public boolean

[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #2296: [HUDI-1425] Performance loss with the additional hoodieRecords.isEmpt…

2021-02-02 Thread GitBox
pengzhiwei2018 commented on a change in pull request #2296: URL: https://github.com/apache/hudi/pull/2296#discussion_r568677443 ## File path: hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestCOWDataSource.scala ## @@ -320,4 +320,21 @@ class TestCO

[GitHub] [hudi] yanghua commented on pull request #2438: [HUDI-1447] DeltaStreamer kafka source supports consuming from specified timestamp

2021-02-02 Thread GitBox
yanghua commented on pull request #2438: URL: https://github.com/apache/hudi/pull/2438#issuecomment-771572476 CI still failed, check it again.

[GitHub] [hudi] vinothchandar commented on issue #2470: [SUPPORT] Heavy skew in ListingBasedRollbackHelper

2021-02-02 Thread GitBox
vinothchandar commented on issue #2470: URL: https://github.com/apache/hudi/issues/2470#issuecomment-771803301 if you are already on 0.6.0, the upgrade step would have run anyway from 0.5.x. No additional steps needed to migrate to 0.7.0.

[GitHub] [hudi] jtmzheng commented on issue #2470: [SUPPORT] Heavy skew in ListingBasedRollbackHelper

2021-02-02 Thread GitBox
jtmzheng commented on issue #2470: URL: https://github.com/apache/hudi/issues/2470#issuecomment-771421761 @n3nash I have not had a chance to look at 0.7.0 migration yet, what EMR versions is 0.7.0 compatible with? If my dataset is on 0.6.0 already do I just need to update the Hudi ja

[GitHub] [hudi] n3nash commented on issue #2515: [SUPPORT] ERROR HoodieTimelineArchiveLog: Failed to archive commits

2021-02-02 Thread GitBox
n3nash commented on issue #2515: URL: https://github.com/apache/hudi/issues/2515#issuecomment-771293068

[GitHub] [hudi] yanghua commented on a change in pull request #2506: [HUDI-1557] Make Flink write pipeline write task scalable

2021-02-02 Thread GitBox
yanghua commented on a change in pull request #2506: URL: https://github.com/apache/hudi/pull/2506#discussion_r568370892 ## File path: hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/HoodieFlinkWriteClient.java ## @@ -249,7 +250,17 @@ public String getLastCo

[GitHub] [hudi] n3nash commented on issue #2439: [SUPPORT] Unable to sync with external hive metastore via metastore uris in the thrift protocol

2021-02-02 Thread GitBox
n3nash commented on issue #2439: URL: https://github.com/apache/hudi/issues/2439#issuecomment-771402960 @rakeshramakrishnan Could you try the above patch from @Trevor-zhang and see if that fixes your issue ?

[GitHub] [hudi] n3nash commented on issue #2507: [SUPPORT] Error when Hudi metadata enabled for non partitioned tables

2021-02-02 Thread GitBox
n3nash commented on issue #2507: URL: https://github.com/apache/hudi/issues/2507#issuecomment-771412820 @prashantwason Can you take a look at this ?

[GitHub] [hudi] n3nash commented on issue #2470: [SUPPORT] Heavy skew in ListingBasedRollbackHelper

2021-02-02 Thread GitBox
n3nash commented on issue #2470: URL: https://github.com/apache/hudi/issues/2470#issuecomment-771409047 @jtmzheng Does using 0.7.0 and `hoodie.metadata.enable=true` solve the issue ?

[GitHub] [hudi] n3nash commented on issue #2463: [SUPPORT] Tuning Hudi Upsert Job

2021-02-02 Thread GitBox
n3nash commented on issue #2463: URL: https://github.com/apache/hudi/issues/2463#issuecomment-771408348 @rubenssoto Did increasing the num executors help ?

[GitHub] [hudi] GintokiYs commented on issue #2513: [SUPPORT]Hive-Cli set hive.input.format=org.apache.hudi.hadoop.HoodieParquetInputFormat and query error

2021-02-02 Thread GitBox
GintokiYs commented on issue #2513: URL: https://github.com/apache/hudi/issues/2513#issuecomment-771453564 @n3nash Thank you for your reply. When I update the data in the Hudi table, the Hive-Cli query will get two records (the two records have the same primary key), while the Spark-SQL

[GitHub] [hudi] prashantwason commented on a change in pull request #2496: [HUDI-1554] Introduced buffering for streams in HUDI.

2021-02-02 Thread GitBox
prashantwason commented on a change in pull request #2496: URL: https://github.com/apache/hudi/pull/2496#discussion_r568228504 ## File path: hudi-common/src/main/java/org/apache/hudi/common/fs/HoodieWrapperFileSystem.java ## @@ -118,12 +156,31 @@ private static Registry getMet

[GitHub] [hudi] n3nash commented on issue #2513: [SUPPORT]Hive-Cli set hive.input.format=org.apache.hudi.hadoop.HoodieParquetInputFormat and query error

2021-02-02 Thread GitBox
n3nash commented on issue #2513: URL: https://github.com/apache/hudi/issues/2513#issuecomment-771416134 @GintokiYs You should not set the hive input format that way. You can set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat. As long as your table is registered as a

[GitHub] [hudi] n3nash commented on issue #2508: [SUPPORT] Error upserting bucketType UPDATE for partition

2021-02-02 Thread GitBox
n3nash commented on issue #2508: URL: https://github.com/apache/hudi/issues/2508#issuecomment-771413944 @nsivabalan Do you think this may have something to do with the Encoders needed in the row write path ?

[GitHub] [hudi] n3nash commented on issue #2437: deltastreamer fails due to "Error upserting bucketType UPDATE for partition" and ArrayIndexOutOfBoundsException

2021-02-02 Thread GitBox
n3nash commented on issue #2437: URL: https://github.com/apache/hudi/issues/2437#issuecomment-771401741 @jiangok2006 Were you able to run with the setting hoodie.avro.schema.validate=true ? My feeling is that this is related to the schema and to decoding records using the provided schema

[GitHub] [hudi] n3nash commented on issue #2409: [SUPPORT] Spark structured Streaming writes to Hudi and synchronizes Hive to create only read-optimized tables without creating real-time tables

2021-02-02 Thread GitBox
n3nash commented on issue #2409: URL: https://github.com/apache/hudi/issues/2409#issuecomment-771397330 @wosow Were you able to resolve your issue ?

[GitHub] [hudi] n3nash commented on issue #2406: [SUPPORT] Deltastreamer - Property hoodie.datasource.write.partitionpath.field not found

2021-02-02 Thread GitBox
n3nash commented on issue #2406: URL: https://github.com/apache/hudi/issues/2406#issuecomment-771394795 @SureshK-T2S Is there anything else related to this issue that needs to be discussed further ?

[jira] [Closed] (HUDI-1335) Introduce FlinkHoodieSimpleIndex to hudi-flink-client

2021-02-02 Thread vinoyang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang closed HUDI-1335. Assignee: XiangHu Wang (was: wangxianghu#1) Resolution: Done Done via master branch: d74d8e208439df8cb2eb

[jira] [Updated] (HUDI-1335) Introduce FlinkHoodieSimpleIndex to hudi-flink-client

2021-02-02 Thread vinoyang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang updated HUDI-1335: Reporter: XiangHu Wang (was: wangxianghu#1) > Introduce FlinkHoodieSimpleIndex to hudi-flink-client

[hudi] branch master updated: [HUDI-1335] Introduce FlinkHoodieSimpleIndex to hudi-flink-client (#2271)

2021-02-02 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository. vinoyang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new d74d8e2 [HUDI-1335] Introduce FlinkHoodieSimple

[GitHub] [hudi] n3nash commented on issue #2448: [SUPPORT] deltacommit for client 172.16.116.102 already exists

2021-02-02 Thread GitBox
n3nash commented on issue #2448: URL: https://github.com/apache/hudi/issues/2448#issuecomment-771406877 @peng-xin Are you able to proceed with `hoodie.compact.inline -> true` and `hoodie.auto.commit -> false` ?

[GitHub] [hudi] n3nash commented on issue #2461: All records are present in athena query result on glue crawled Hudi tables

2021-02-02 Thread GitBox
n3nash commented on issue #2461: URL: https://github.com/apache/hudi/issues/2461#issuecomment-771407719 @vrtrepp @noobarcitect Are you able to use the hive-sync tool to resolve your issue ?

[GitHub] [hudi] n3nash commented on issue #2509: [SUPPORT]Hudi saves TimestampType as bigInt

2021-02-02 Thread GitBox
n3nash commented on issue #2509: URL: https://github.com/apache/hudi/issues/2509#issuecomment-771414366 @satishkotha Could you take a look at this one ?

[GitHub] [hudi] codecov-io edited a comment on pull request #2296: [HUDI-1425] Performance loss with the additional hoodieRecords.isEmpt…

2021-02-02 Thread GitBox
codecov-io edited a comment on pull request #2296: URL: https://github.com/apache/hudi/pull/2296#issuecomment-738779135

[GitHub] [hudi] codecov-io edited a comment on pull request #2496: [HUDI-1554] Introduced buffering for streams in HUDI.

2021-02-02 Thread GitBox
codecov-io edited a comment on pull request #2496: URL: https://github.com/apache/hudi/pull/2496#issuecomment-768170324

[GitHub] [hudi] n3nash commented on issue #2489: [SUPPORT]

2021-02-02 Thread GitBox
n3nash commented on issue #2489: URL: https://github.com/apache/hudi/issues/2489#issuecomment-771411421 @Ishg Do you have any update ?

[GitHub] [hudi] n3nash commented on issue #2490: spark read hudi data from hive

2021-02-02 Thread GitBox
n3nash commented on issue #2490: URL: https://github.com/apache/hudi/issues/2490#issuecomment-771411738 @Ishg Any update ?

[GitHub] [hudi] n3nash commented on issue #2482: [SUPPORT]

2021-02-02 Thread GitBox
n3nash commented on issue #2482: URL: https://github.com/apache/hudi/issues/2482#issuecomment-771410764 @duanyongvictory Were you able to use the latest release 0.7.0 and see if it resolves your issue ?

[GitHub] [hudi] codecov-io commented on pull request #2516: [MINOR] Fixing the default value for source ordering field in payload config

2021-02-02 Thread GitBox
codecov-io commented on pull request #2516: URL: https://github.com/apache/hudi/pull/2516#issuecomment-771264229 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2516?src=pr&el=h1) Report > Merging [#2516](https://codecov.io/gh/apache/hudi/pull/2516?src=pr&el=desc) (7c1f105) into [ma

[GitHub] [hudi] rubenssoto commented on issue #2463: [SUPPORT] Tuning Hudi Upsert Job

2021-02-02 Thread GitBox
rubenssoto commented on issue #2463: URL: https://github.com/apache/hudi/issues/2463#issuecomment-772111789 No, I think the difference in performance that I see on upsert is because of the RDD conversion, for example in bulk insert with hoodie.datasource.write.row.writer.enable ON is much faster

[GitHub] [hudi] rubenssoto commented on issue #2509: [SUPPORT]Hudi saves TimestampType as bigInt

2021-02-02 Thread GitBox
rubenssoto commented on issue #2509: URL: https://github.com/apache/hudi/issues/2509#issuecomment-772108742 Great to know. I will test this feature in Athena and Redshift Spectrum; if someone has already run this test, please let me know.

[jira] [Created] (HUDI-1575) Early detection, last written commit, also check if there are more commits, try to do resolution, and abort.

2021-02-02 Thread Nishith Agarwal (Jira)
Nishith Agarwal created HUDI-1575: Summary: Early detection, last written commit, also check if there are more commits, try to do resolution, and abort. Key: HUDI-1575 URL: https://issues.apache.org/jira/browse/

[GitHub] [hudi] codecov-io edited a comment on pull request #2296: [HUDI-1425] Performance loss with the additional hoodieRecords.isEmpt…

2021-02-02 Thread GitBox
codecov-io edited a comment on pull request #2296: URL: https://github.com/apache/hudi/pull/2296#issuecomment-738779135 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2296?src=pr&el=h1) Report > Merging [#2296](https://codecov.io/gh/apache/hudi/pull/2296?src=pr&el=desc) (4384cf6) in

[jira] [Commented] (HUDI-1574) Trim existing unit tests to finish in much shorter amount of time

2021-02-02 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277494#comment-17277494 ] Vinoth Chandar commented on HUDI-1574: -- these are the ones that take up most time. if

[jira] [Created] (HUDI-1574) Trim existing unit tests to finish in much shorter amount of time

2021-02-02 Thread Vinoth Chandar (Jira)
Vinoth Chandar created HUDI-1574: Summary: Trim existing unit tests to finish in much shorter amount of time Key: HUDI-1574 URL: https://issues.apache.org/jira/browse/HUDI-1574 Project: Apache Hudi
