[GitHub] [hudi] Karl-WangSK commented on pull request #2096: [HUDI-1284] preCombine all HoodieRecords and update all fields according to orderingVal

2020-09-22 Thread GitBox
Karl-WangSK commented on pull request #2096: URL: https://github.com/apache/hudi/pull/2096#issuecomment-697155627 @leesf This is an automated message from the Apache Git Service. To respond to the message, please log on to G

[GitHub] [hudi] ivorzhou edited a comment on pull request #2091: HUDI-1283 Fill missing columns with default value when spark dataframe save to hudi table

2020-09-22 Thread GitBox
ivorzhou edited a comment on pull request #2091: URL: https://github.com/apache/hudi/pull/2091#issuecomment-697058418 > Thank you for creating this PR. At this point, I am not fully convinced if we really need this logic. A missing column in the DataFrame could also mean that column has be

[GitHub] [hudi] ivorzhou commented on pull request #2091: HUDI-1283 Fill missing columns with default value when spark dataframe save to hudi table

2020-09-22 Thread GitBox
ivorzhou commented on pull request #2091: URL: https://github.com/apache/hudi/pull/2091#issuecomment-697058418 > Thank you for creating this PR. At this point, I am not fully convinced if we really need this logic. A missing column in the DataFrame could also mean that column has been drop

[GitHub] [hudi] ShortFinger commented on issue #2098: [SUPPORT] File does not exisit(parquet) while reading Hudi Table from Spark

2020-09-22 Thread GitBox
ShortFinger commented on issue #2098: URL: https://github.com/apache/hudi/issues/2098#issuecomment-697061454 I have the same situation too. Maybe there is some API or config that can prevent Hudi from merging log files into the base parquet file. A COW table cannot do this; I am trying to use MOR table t

[GitHub] [hudi] bvaradar commented on a change in pull request #2048: [HUDI-1072][WIP] Introduce REPLACE top level action

2020-09-22 Thread GitBox
bvaradar commented on a change in pull request #2048: URL: https://github.com/apache/hudi/pull/2048#discussion_r493002158 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/view/IncrementalTimelineSyncFileSystemView.java ## @@ -251,6 +262,28 @@ private void a

[GitHub] [hudi] leesf commented on pull request #2099: [HUDI-1268] fix UpgradeDowngrade fs Rename issue for hdfs and aliyun oss

2020-09-22 Thread GitBox
leesf commented on pull request #2099: URL: https://github.com/apache/hudi/pull/2099#issuecomment-696461170 @nsivabalan would you please review this PR?

[GitHub] [hudi] umehrot2 merged pull request #2102: [MINOR] Remove useless config for bootstrap integ testing

2020-09-22 Thread GitBox
umehrot2 merged pull request #2102: URL: https://github.com/apache/hudi/pull/2102

[GitHub] [hudi] vinothchandar commented on pull request #2064: WIP - [HUDI-842] Implementation of HUDI RFC-15.

2020-09-22 Thread GitBox
vinothchandar commented on pull request #2064: URL: https://github.com/apache/hudi/pull/2064#issuecomment-696429882

[GitHub] [hudi] wangxianghu commented on a change in pull request #1827: [HUDI-1089] Refactor hudi-client to support multi-engine

2020-09-22 Thread GitBox
wangxianghu commented on a change in pull request #1827: URL: https://github.com/apache/hudi/pull/1827#discussion_r492434405 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/common/HoodieEngineContext.java ## @@ -0,0 +1,66 @@ +/* + * Licensed to the A

[GitHub] [hudi] abhijeetkushe commented on issue #1737: [SUPPORT]spark streaming create small parquet files

2020-09-22 Thread GitBox
abhijeetkushe commented on issue #1737: URL: https://github.com/apache/hudi/issues/1737#issuecomment-696926015

[GitHub] [hudi] vinothchandar merged pull request #2099: [HUDI-1268] fix UpgradeDowngrade fs Rename issue for hdfs and aliyun oss

2020-09-22 Thread GitBox
vinothchandar merged pull request #2099: URL: https://github.com/apache/hudi/pull/2099

[GitHub] [hudi] yanghua commented on pull request #2074: [HUDI-1233] Deltastreamer Kafka consumption delay reporting indicators

2020-09-22 Thread GitBox
yanghua commented on pull request #2074: URL: https://github.com/apache/hudi/pull/2074#issuecomment-697053862 @liujinhui1994 You should fix the conflicts and let the Travis pass.

[GitHub] [hudi] vinothchandar commented on a change in pull request #2093: [HUDI-1200]: fixed NPE in CustomKeyGenerator

2020-09-22 Thread GitBox
vinothchandar commented on a change in pull request #2093: URL: https://github.com/apache/hudi/pull/2093#discussion_r492941322 ## File path: hudi-spark/src/main/java/org/apache/hudi/keygen/CustomKeyGenerator.java ## @@ -58,6 +59,7 @@ public CustomKeyGenerator(TypedPropert

[GitHub] [hudi] pratyakshsharma commented on a change in pull request #2093: [HUDI-1200]: fixed NPE in CustomKeyGenerator

2020-09-22 Thread GitBox
pratyakshsharma commented on a change in pull request #2093: URL: https://github.com/apache/hudi/pull/2093#discussion_r492971849 ## File path: hudi-spark/src/main/java/org/apache/hudi/keygen/CustomKeyGenerator.java ## @@ -58,6 +59,7 @@ public CustomKeyGenerator(TypedPrope

[GitHub] [hudi] vinothchandar commented on a change in pull request #2064: WIP - [HUDI-842] Implementation of HUDI RFC-15.

2020-09-22 Thread GitBox
vinothchandar commented on a change in pull request #2064: URL: https://github.com/apache/hudi/pull/2064#discussion_r492405639 ## File path: hudi-client/src/main/java/org/apache/hudi/metadata/HoodieMetadata.java ## @@ -0,0 +1,272 @@ +/* + * Licensed to the Apache Software Foun

[GitHub] [hudi] hj2016 commented on issue #2100: [SUPPORT] 0.6.0 - using keytab authentication gives issues

2020-09-22 Thread GitBox
hj2016 commented on issue #2100: URL: https://github.com/apache/hudi/issues/2100#issuecomment-696637689

[GitHub] [hudi] eigakow commented on issue #2100: [SUPPORT] 0.6.0 - using keytab authentication gives issues

2020-09-22 Thread GitBox
eigakow commented on issue #2100: URL: https://github.com/apache/hudi/issues/2100#issuecomment-696730033 Is HBase used by the Hudi process? The HBase service was added to the cluster only recently, for use by another project, and should not have anything to do with this stream.

[GitHub] [hudi] wangxianghu commented on pull request #1827: [HUDI-1089] Refactor hudi-client to support multi-engine

2020-09-22 Thread GitBox
wangxianghu commented on pull request #1827: URL: https://github.com/apache/hudi/pull/1827#issuecomment-696467101 > My primary motive of suggesting parallelDo model, is to avoid splitting the classes and still reap benefits of parallel execution, provided by each engine. I don't think we a

[GitHub] [hudi] vinothchandar commented on a change in pull request #1827: [HUDI-1089] Refactor hudi-client to support multi-engine

2020-09-22 Thread GitBox
vinothchandar commented on a change in pull request #1827: URL: https://github.com/apache/hudi/pull/1827#discussion_r492449543 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/common/HoodieEngineContext.java ## @@ -0,0 +1,66 @@ +/* + * Licensed to the

[GitHub] [hudi] umehrot2 merged pull request #2087: [HUDI-1213] Set Default for the bootstrap config : hoodie.bootstrap.f…

2020-09-22 Thread GitBox
umehrot2 merged pull request #2087: URL: https://github.com/apache/hudi/pull/2087

[GitHub] [hudi] prashantwason edited a comment on pull request #2064: WIP - [HUDI-842] Implementation of HUDI RFC-15.

2020-09-22 Thread GitBox
prashantwason edited a comment on pull request #2064: URL: https://github.com/apache/hudi/pull/2064#issuecomment-686688968

[GitHub] [hudi] bvaradar merged pull request #2097: [MINOR] Add description to remind users that Hudis docker images have mounted the projects workspace

2020-09-22 Thread GitBox
bvaradar merged pull request #2097: URL: https://github.com/apache/hudi/pull/2097

[GitHub] [hudi] liujinhui1994 commented on a change in pull request #1968: [HUDI-1192] Make create hive database automatically configurable

2020-09-22 Thread GitBox
liujinhui1994 commented on a change in pull request #1968: URL: https://github.com/apache/hudi/pull/1968#discussion_r492448849 ## File path: hudi-spark/src/main/scala/org/apache/hudi/DataSourceOptions.scala ## @@ -290,6 +290,7 @@ object DataSourceWriteOptions { val HIVE_ASSU

[jira] [Updated] (HUDI-1138) Re-implement marker files via timeline server

2020-09-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1138: - Fix Version/s: 0.7.0 > Re-implement marker files via timeline server > ---

[jira] [Assigned] (HUDI-818) Optimize the default value of hoodie.memory.merge.max.size option

2020-09-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar reassigned HUDI-818: --- Assignee: (was: lamber-ken) > Optimize the default value of hoodie.memory.merge.max.size o

[jira] [Updated] (HUDI-651) Incremental Query on Hive via Spark SQL does not return expected results

2020-09-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-651: Fix Version/s: (was: 0.6.1) 0.7.0 > Incremental Query on Hive via Spark SQL do

[jira] [Updated] (HUDI-920) Incremental view on MOR table using Spark Datasource

2020-09-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-920: Fix Version/s: (was: 0.6.1) 0.7.0 > Incremental view on MOR table using Spark

[jira] [Created] (HUDI-1297) [Umbrella] Revamp Spark Datasource support using Spark 3 APIs

2020-09-22 Thread Vinoth Chandar (Jira)
Vinoth Chandar created HUDI-1297: Summary: [Umbrella] Revamp Spark Datasource support using Spark 3 APIs Key: HUDI-1297 URL: https://issues.apache.org/jira/browse/HUDI-1297 Project: Apache Hudi

[jira] [Updated] (HUDI-53) Implement Record level Index to map a record key to a pair #90

2020-09-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-53?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-53: --- Fix Version/s: 0.7.0 > Implement Record level Index to map a record key to a FileID> pair #90 > ---

[jira] [Created] (HUDI-1296) Implement Spark DataSource using range metadata for file/partition pruning

2020-09-22 Thread Vinoth Chandar (Jira)
Vinoth Chandar created HUDI-1296: Summary: Implement Spark DataSource using range metadata for file/partition pruning Key: HUDI-1296 URL: https://issues.apache.org/jira/browse/HUDI-1296 Project: Apach

[GitHub] [hudi] hj2016 commented on issue #2100: [SUPPORT] 0.6.0 - using keytab authentication gives issues

2020-09-22 Thread GitBox
hj2016 commented on issue #2100: URL: https://github.com/apache/hudi/issues/2100#issuecomment-697096489 Do you use hbase index of hudi? Can you show the configuration of hudi-default.properties?

[jira] [Created] (HUDI-1295) RFC-15: Track bloom filters as a part of metadata table

2020-09-22 Thread Vinoth Chandar (Jira)
Vinoth Chandar created HUDI-1295: Summary: RFC-15: Track bloom filters as a part of metadata table Key: HUDI-1295 URL: https://issues.apache.org/jira/browse/HUDI-1295 Project: Apache Hudi Iss

[jira] [Created] (HUDI-1294) Implement inlining of HFile Data Blocks in metadata table log

2020-09-22 Thread Vinoth Chandar (Jira)
Vinoth Chandar created HUDI-1294: Summary: Implement inlining of HFile Data Blocks in metadata table log Key: HUDI-1294 URL: https://issues.apache.org/jira/browse/HUDI-1294 Project: Apache Hudi

[jira] [Updated] (HUDI-1256) Follow on improvements to HFile tables

2020-09-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1256: - Parent: HUDI-1292 Issue Type: Sub-task (was: Improvement) > Follow on improvements to HFi

[jira] [Updated] (HUDI-1256) Follow on improvements to HFile tables

2020-09-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1256: - Status: Open (was: New) > Follow on improvements to HFile tables > -

[jira] [Assigned] (HUDI-842) RFC-15 : Implementation of File Listing elimination

2020-09-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar reassigned HUDI-842: --- Assignee: Prashant Wason (was: Vinoth Chandar) > RFC-15 : Implementation of File Listing elim

[jira] [Assigned] (HUDI-842) RFC-15 : Implementation of File Listing elimination

2020-09-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar reassigned HUDI-842: --- Assignee: Vinoth Chandar (was: Prashant Wason) > RFC-15 : Implementation of File Listing elim

[jira] [Created] (HUDI-1293) RFC-15: Track range metadata as a part of metadata table

2020-09-22 Thread Vinoth Chandar (Jira)
Vinoth Chandar created HUDI-1293: Summary: RFC-15: Track range metadata as a part of metadata table Key: HUDI-1293 URL: https://issues.apache.org/jira/browse/HUDI-1293 Project: Apache Hudi Is

[jira] [Created] (HUDI-1292) [Umbrella] RFC-15 : File Listing and Query Planning Optimizations

2020-09-22 Thread Vinoth Chandar (Jira)
Vinoth Chandar created HUDI-1292: Summary: [Umbrella] RFC-15 : File Listing and Query Planning Optimizations Key: HUDI-1292 URL: https://issues.apache.org/jira/browse/HUDI-1292 Project: Apache Hudi

[jira] [Updated] (HUDI-1292) [Umbrella] RFC-15 : File Listing and Query Planning Optimizations

2020-09-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1292: - Status: Open (was: New) > [Umbrella] RFC-15 : File Listing and Query Planning Optimizations > --

[jira] [Resolved] (HUDI-957) Umbrella ticket for sequencing common tasks required to progress/unblock RFC-08, RFC-15 & RFC-19

2020-09-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar resolved HUDI-957. - Resolution: Invalid > Umbrella ticket for sequencing common tasks required to progress/unblock > R

[jira] [Updated] (HUDI-957) Umbrella ticket for sequencing common tasks required to progress/unblock RFC-08, RFC-15 & RFC-19

2020-09-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-957: Status: Open (was: New) > Umbrella ticket for sequencing common tasks required to progress/unblock

[jira] [Commented] (HUDI-957) Umbrella ticket for sequencing common tasks required to progress/unblock RFC-08, RFC-15 & RFC-19

2020-09-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200475#comment-17200475 ] Vinoth Chandar commented on HUDI-957: - Most of these common abstractions have already l

[jira] [Updated] (HUDI-842) RFC-15 : Implementation of File Listing elimination

2020-09-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-842: Status: Open (was: New) > RFC-15 : Implementation of File Listing elimination >

[jira] [Updated] (HUDI-842) RFC-15 : Implementation of File Listing elimination

2020-09-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-842: Status: Patch Available (was: In Progress) > RFC-15 : Implementation of File Listing elimination > -

[jira] [Updated] (HUDI-842) [UMBRELLA] Implementation plan for RFC 15 (File Listing and Query Planning Improvements))

2020-09-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-842: Summary: [UMBRELLA] Implementation plan for RFC 15 (File Listing and Query Planning Improvements)) (

[jira] [Updated] (HUDI-842) RFC-15 : Implementation of File Listing elimination

2020-09-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-842: Status: In Progress (was: Open) > RFC-15 : Implementation of File Listing elimination >

[jira] [Updated] (HUDI-842) RFC-15 : Implementation of File Listing elimination

2020-09-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-842: Summary: RFC-15 : Implementation of File Listing elimination (was: [UMBRELLA] Implementation plan fo

[jira] [Commented] (HUDI-83) Map Timestamp type in spark to corresponding Timestamp type in Hive during Hive sync

2020-09-22 Thread Udit Mehrotra (Jira)
[ https://issues.apache.org/jira/browse/HUDI-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200473#comment-17200473 ] Udit Mehrotra commented on HUDI-83: --- [~FelixKJose] I did a quick test. To be able to sync tim

[jira] [Commented] (HUDI-83) Map Timestamp type in spark to corresponding Timestamp type in Hive during Hive sync

2020-09-22 Thread Udit Mehrotra (Jira)
[ https://issues.apache.org/jira/browse/HUDI-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200466#comment-17200466 ] Udit Mehrotra commented on HUDI-83: --- [~FelixKJose] I have not really tried this on EMR 6.

[GitHub] [hudi] wangxianghu commented on a change in pull request #1827: [HUDI-1089] Refactor hudi-client to support multi-engine

2020-09-22 Thread GitBox
wangxianghu commented on a change in pull request #1827: URL: https://github.com/apache/hudi/pull/1827#discussion_r493120922 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/common/HoodieEngineContext.java ## @@ -0,0 +1,66 @@ +/* + * Licensed to the A

[jira] [Resolved] (HUDI-435) Make async compaction/cleaning extensible to new usages

2020-09-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar resolved HUDI-435. - Resolution: Won't Fix > Make async compaction/cleaning extensible to new usages > -

[jira] [Updated] (HUDI-637) Investigate slower hudi queries in S3 vs HDFS

2020-09-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-637: Fix Version/s: (was: 0.6.1) 0.7.0 > Investigate slower hudi queries in S3 vs H

[jira] [Updated] (HUDI-901) Bug Bash 0.6.0 Tracking Ticket

2020-09-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-901: Status: Open (was: New) > Bug Bash 0.6.0 Tracking Ticket > -- > >

[jira] [Resolved] (HUDI-901) Bug Bash 0.6.0 Tracking Ticket

2020-09-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar resolved HUDI-901. - Resolution: Fixed > Bug Bash 0.6.0 Tracking Ticket > -- > >

[GitHub] [hudi] prashantwason edited a comment on pull request #2064: WIP - [HUDI-842] Implementation of HUDI RFC-15.

2020-09-22 Thread GitBox
prashantwason edited a comment on pull request #2064: URL: https://github.com/apache/hudi/pull/2064#issuecomment-686688968 Remaining work items: - [x] 1. Support for rollbacks in MOR Table - [ ] 2. Rollback of metadata if commit eventually fails on dataset - [x] 3. HUDI-CLI ext

[jira] [Updated] (HUDI-845) Allow parallel writing and move the pending rollback work into cleaner

2020-09-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-845: Fix Version/s: (was: 0.6.1) > Allow parallel writing and move the pending rollback work into clea

[jira] [Updated] (HUDI-86) Add indexing support to the log file format

2020-09-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-86?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-86: --- Fix Version/s: (was: 0.6.1) > Add indexing support to the log file format >

[jira] [Updated] (HUDI-845) Allow parallel writing and move the pending rollback work into cleaner

2020-09-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-845: Fix Version/s: 0.7.0 > Allow parallel writing and move the pending rollback work into cleaner > -

[jira] [Updated] (HUDI-83) Map Timestamp type in spark to corresponding Timestamp type in Hive during Hive sync

2020-09-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-83?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-83: --- Fix Version/s: (was: 0.6.1) 0.7.0 > Map Timestamp type in spark to corresponding

[jira] [Updated] (HUDI-84) Benchmark write/read paths on Hudi vs non-Hudi datasets

2020-09-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-84?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-84: --- Fix Version/s: (was: 0.6.1) 0.7.0 > Benchmark write/read paths on Hudi vs non-Hud

[jira] [Updated] (HUDI-29) Patch to Hive-sync to enable stats on Hive tables #393

2020-09-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-29?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-29: --- Fix Version/s: (was: 0.6.1) > Patch to Hive-sync to enable stats on Hive tables #393 > -

[jira] [Updated] (HUDI-818) Optimize the default value of hoodie.memory.merge.max.size option

2020-09-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-818: Fix Version/s: (was: 0.6.1) 0.7.0 > Optimize the default value of hoodie.memor

[jira] [Updated] (HUDI-635) MergeHandle's DiskBasedMap entries can be thinner

2020-09-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-635: Fix Version/s: (was: 0.6.1) > MergeHandle's DiskBasedMap entries can be thinner > ---

[jira] [Updated] (HUDI-1289) Using hbase index in spark hangs in Hudi 0.6.0

2020-09-22 Thread Ryan Pifer (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Pifer updated HUDI-1289: - Description: In Hudi 0.6.0 I can see that there was a change to shade the hbase dependencies in hudi-spar

[hudi] branch master updated (fcc497e -> d37977b)

2020-09-22 Thread uditme
This is an automated email from the ASF dual-hosted git repository. uditme pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from fcc497e [HUDI-1268] fix UpgradeDowngrade fs Rename issue for hdfs and aliyun oss (#2099) add d37977b [MINOR] Re

[GitHub] [hudi] bvaradar commented on a change in pull request #2048: [HUDI-1072][WIP] Introduce REPLACE top level action

2020-09-22 Thread GitBox
bvaradar commented on a change in pull request #2048: URL: https://github.com/apache/hudi/pull/2048#discussion_r493003513 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java ## @@ -738,7 +799,9 @@ private String formatPart

[GitHub] [hudi] abhijeetkushe commented on issue #1737: [SUPPORT]spark streaming create small parquet files

2020-09-22 Thread GitBox
abhijeetkushe commented on issue #1737: URL: https://github.com/apache/hudi/issues/1737#issuecomment-696934366 S3 parquet files ![S3_ParquetFiles](https://user-images.githubusercontent.com/2093096/93928593-8c9e0580-fce8-11ea-9af0-16c5a179a647.jpg) .hoodie files https://user-ima

[GitHub] [hudi] abhijeetkushe commented on issue #1737: [SUPPORT]spark streaming create small parquet files

2020-09-22 Thread GitBox
abhijeetkushe commented on issue #1737: URL: https://github.com/apache/hudi/issues/1737#issuecomment-696926015 @n3nash Apologies for the delayed response. I tried a bunch of heuristics from the available config options for both COW and MOR and I think I got an idea of how the file creation h

[GitHub] [hudi] zhedoubushishi opened a new pull request #2102: [MINOR] Remove useless config for bootstrap integ testing

2020-09-22 Thread GitBox
zhedoubushishi opened a new pull request #2102: URL: https://github.com/apache/hudi/pull/2102 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of

[GitHub] [hudi] prashantwason edited a comment on pull request #2064: WIP - [HUDI-842] Implementation of HUDI RFC-15.

2020-09-22 Thread GitBox
prashantwason edited a comment on pull request #2064: URL: https://github.com/apache/hudi/pull/2064#issuecomment-686688968 Remaining work items: - [ ] 1. Support for rollbacks in MOR Table - [x] 2. Rollback of metadata if commit eventually fails on dataset - [x] 3. HUDI-CLI ext

[hudi] branch master updated (8087016 -> fcc497e)

2020-09-22 Thread vinoth
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from 8087016 [HUDI-1213] Set Default for the bootstrap config : hoodie.bootstrap.full.input.provider (#2087) add fcc4

[GitHub] [hudi] vinothchandar merged pull request #2099: [HUDI-1268] fix UpgradeDowngrade fs Rename issue for hdfs and aliyun oss

2020-09-22 Thread GitBox
vinothchandar merged pull request #2099: URL: https://github.com/apache/hudi/pull/2099

[GitHub] [hudi] vinothchandar commented on a change in pull request #1827: [HUDI-1089] Refactor hudi-client to support multi-engine

2020-09-22 Thread GitBox
vinothchandar commented on a change in pull request #1827: URL: https://github.com/apache/hudi/pull/1827#discussion_r492872504 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/common/HoodieEngineContext.java ## @@ -0,0 +1,66 @@ +/* + * Licensed to the

[GitHub] [hudi] eigakow commented on issue #2100: [SUPPORT] 0.6.0 - using keytab authentication gives issues

2020-09-22 Thread GitBox
eigakow commented on issue #2100: URL: https://github.com/apache/hudi/issues/2100#issuecomment-696730033 Is hbase used for hudi process? hbase service was added to the cluster only lately for the use of other project, and should not have anything to do with this stream. -

[jira] [Comment Edited] (HUDI-1257) Insert only write operations should preserve duplicate records

2020-09-22 Thread Nicholas Jiang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200050#comment-17200050 ] Nicholas Jiang edited comment on HUDI-1257 at 9/22/20, 12:48 PM: ---

[jira] [Comment Edited] (HUDI-1257) Insert only write operations should preserve duplicate records

2020-09-22 Thread Nicholas Jiang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200050#comment-17200050 ] Nicholas Jiang edited comment on HUDI-1257 at 9/22/20, 12:47 PM: ---

[jira] [Commented] (HUDI-1257) Insert only write operations should preserve duplicate records

2020-09-22 Thread Nicholas Jiang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200050#comment-17200050 ] Nicholas Jiang commented on HUDI-1257: -- [~vbalaji]Do this issue is duplicated with [I

[GitHub] [hudi] hj2016 commented on issue #2100: [SUPPORT] 0.6.0 - using keytab authentication gives issues

2020-09-22 Thread GitBox
hj2016 commented on issue #2100: URL: https://github.com/apache/hudi/issues/2100#issuecomment-696637689 Seeing the error message is that the hbase verification failed, you can try to put the hbase-site.xml file in the resource and package it for execution. Currently, hbase connection does

[hudi] branch master updated (c8e19e2 -> 8087016)

2020-09-22 Thread uditme
This is an automated email from the ASF dual-hosted git repository. uditme pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from c8e19e2 [HUDI-801] Adding a way to post process schema after it is fetched (#1524) add 8087016 [HUDI-1213] Set

[GitHub] [hudi] umehrot2 merged pull request #2087: [HUDI-1213] Set Default for the bootstrap config : hoodie.bootstrap.f…

2020-09-22 Thread GitBox
umehrot2 merged pull request #2087: URL: https://github.com/apache/hudi/pull/2087

[GitHub] [hudi] getniz opened a new issue #2101: [SUPPORT]Unable to interpret Child JSON value as a recordkey in Hudi. Any way to interpret that.

2020-09-22 Thread GitBox
getniz opened a new issue #2101: URL: https://github.com/apache/hudi/issues/2101 Issue details: With in a nested JSON data schema with below format is there a way to consume the child object alone ignoring the parent field. It is not recognizing the Child fields of data field for any

[GitHub] [hudi] eigakow opened a new issue #2100: 0.6.0 - using keytab authentication gives issues

2020-09-22 Thread GitBox
eigakow opened a new issue #2100: URL: https://github.com/apache/hudi/issues/2100 **Describe the problem you faced** While uplifting deltastreamer from 0.5.3 to 0.6.0, I can no longer use --principal and --keytab start parameters as it results in error _java.util.ServiceConfigurat

[jira] [Comment Edited] (HUDI-1288) DeltaSync:writeToSink fails with Unknown datum type org.apache.avro.JsonProperties$Null

2020-09-22 Thread Michal Swiatowy (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17199882#comment-17199882 ] Michal Swiatowy edited comment on HUDI-1288 at 9/22/20, 7:06 AM: ---

[jira] [Commented] (HUDI-1288) DeltaSync:writeToSink fails with Unknown datum type org.apache.avro.JsonProperties$Null

2020-09-22 Thread Michal Swiatowy (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17199882#comment-17199882 ] Michal Swiatowy commented on HUDI-1288: --- I'm not 100% sure but I think this org.apa