[jira] [Assigned] (HUDI-1073) Implement skeleton to support multiple clustering strategies

2020-10-05 Thread satish (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish reassigned HUDI-1073: Assignee: satish > Implement skeleton to support multiple clustering strategies >

[jira] [Updated] (HUDI-1072) Reader changes to support clustering and insert overwrite

2020-10-05 Thread satish (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-1072: - Status: In Progress (was: Open) > Reader changes to support clustering and insert overwrite >

[jira] [Updated] (HUDI-1072) Reader changes to support clustering and insert overwrite

2020-10-05 Thread satish (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish updated HUDI-1072: - Status: Open (was: New) > Reader changes to support clustering and insert overwrite >

[jira] [Resolved] (HUDI-1072) Reader changes to support clustering and insert overwrite

2020-10-05 Thread satish (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] satish resolved HUDI-1072. -- Resolution: Fixed > Reader changes to support clustering and insert overwrite >

[jira] [Created] (HUDI-1314) Hudi Test Suite : Root node has no dependencies. DagUtils should consider that

2020-10-05 Thread Basanth Roy (Jira)
Basanth Roy created HUDI-1314: - Summary: Hudi Test Suite : Root node has no dependencies. DagUtils should consider that Key: HUDI-1314 URL: https://issues.apache.org/jira/browse/HUDI-1314 Project: Apache

[jira] [Updated] (HUDI-1314) Hudi Test Suite : Root node has no dependencies. DagUtils should consider that

2020-10-05 Thread Basanth Roy (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Basanth Roy updated HUDI-1314: -- Status: Open (was: New) > Hudi Test Suite : Root node has no dependencies. DagUtils should consider

[hudi] branch master updated: [MINOR] Update spark master default to yarn (#2148)

2020-10-05 Thread vinoth
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new fed01cd [MINOR] Update spark master default to

[GitHub] [hudi] vinothchandar merged pull request #2148: [MINOR] Update spark master default to yarn

2020-10-05 Thread GitBox
vinothchandar merged pull request #2148: URL: https://github.com/apache/hudi/pull/2148 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [hudi] rmpifer opened a new pull request #2148: Update spark master default to yarn

2020-10-05 Thread GitBox
rmpifer opened a new pull request #2148: URL: https://github.com/apache/hudi/pull/2148 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of the

[GitHub] [hudi] prashantwason commented on a change in pull request #2128: [HUDI-1303] Some improvements for the HUDI Test Suite.

2020-10-05 Thread GitBox
prashantwason commented on a change in pull request #2128: URL: https://github.com/apache/hudi/pull/2128#discussion_r499827933 ## File path: hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/generator/GenericRecordFullPayloadGenerator.java ## @@ -333,23 +312,37 @@

[GitHub] [hudi] prashantwason commented on a change in pull request #2128: [HUDI-1303] Some improvements for the HUDI Test Suite.

2020-10-05 Thread GitBox
prashantwason commented on a change in pull request #2128: URL: https://github.com/apache/hudi/pull/2128#discussion_r499825998 ## File path: hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/generator/GenericRecordFullPayloadGenerator.java ## @@ -205,45 +193,36 @@

[hudi] branch master updated: [HUDI-1203] add port configuration for EmbeddedTimelineService (#2142)

2020-10-05 Thread vinoth
vinoth pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new fdae388 [HUDI-1203] add port configuration for

[GitHub] [hudi] vinothchandar merged pull request #2142: [HUDI-1203] add port configuration for EmbeddedTimelineService

2020-10-05 Thread GitBox
vinothchandar merged pull request #2142: URL: https://github.com/apache/hudi/pull/2142

[GitHub] [hudi] bradleyhurley commented on issue #2146: Bulk Insert - java.io.NotSerializableException: org.apache.hudi.common.util.RocksDBDAO

2020-10-05 Thread GitBox
bradleyhurley commented on issue #2146: URL: https://github.com/apache/hudi/issues/2146#issuecomment-703784334 Let me see if I can test 0.5.3. I agree it's pretty old, but it's the latest bundled version provided by AWS.

[GitHub] [hudi] bvaradar commented on issue #2146: Bulk Insert - java.io.NotSerializableException: org.apache.hudi.common.util.RocksDBDAO

2020-10-05 Thread GitBox
bvaradar commented on issue #2146: URL: https://github.com/apache/hudi/issues/2146#issuecomment-703783191 @bradleyhurley: 0.5.2 is pretty old. Can you use 0.5.3? I checked the code in 0.5.3 and you should not be seeing this issue in 0.5.3

[jira] [Updated] (HUDI-1289) Using hbase index in spark hangs in Hudi 0.6.0

2020-10-05 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1289: - Labels: pull-request-available (was: ) > Using hbase index in spark hangs in Hudi 0.6.0 >

[GitHub] [hudi] rmpifer opened a new pull request #2147: [HUDI-1289] Remove shading pattern for hbase dependencies in hudi-spark-bundle

2020-10-05 Thread GitBox
rmpifer opened a new pull request #2147: URL: https://github.com/apache/hudi/pull/2147

[GitHub] [hudi] bradleyhurley opened a new issue #2146: Bulk Insert - java.io.NotSerializableException: org.apache.hudi.common.util.RocksDBDAO

2020-10-05 Thread GitBox
bradleyhurley opened a new issue #2146: URL: https://github.com/apache/hudi/issues/2146 **Describe the problem you faced** When attempting to run the DeltaStreamer in BULK_INSERT mode we are experiencing a ` java.io.NotSerializableException: org.apache.hudi.common.util.RocksDBDAO`

[GitHub] [hudi] satishkotha commented on a change in pull request #2129: [HUDI-1302] Add support for timestamp field in HiveSync

2020-10-05 Thread GitBox
satishkotha commented on a change in pull request #2129: URL: https://github.com/apache/hudi/pull/2129#discussion_r499747610 ## File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java ## @@ -167,7 +173,10 @@ private static String

[GitHub] [hudi] lw309637554 commented on pull request #2125: [HUDI-1301] use spark INCREMENTAL mode query hudi dataset support sch…

2020-10-05 Thread GitBox
lw309637554 commented on pull request #2125: URL: https://github.com/apache/hudi/pull/2125#issuecomment-703763353 > > both schema and data should with that commit. > > This may not always be desired. the user may wish to get the incremental results using the latest schema, as well,

[GitHub] [hudi] bvaradar commented on issue #2100: [SUPPORT] 0.6.0 - using keytab authentication gives issues

2020-10-05 Thread GitBox
bvaradar commented on issue #2100: URL: https://github.com/apache/hudi/issues/2100#issuecomment-703762726 @eigakow: The 0.6.0 hudi bundle comes with shaded hbase jars, whereas 0.5.3 does not. The version of hbase is 1.2.3. I am not sure if this is causing the issue. Can you try

[GitHub] [hudi] lw309637554 commented on a change in pull request #2125: [HUDI-1301] use spark INCREMENTAL mode query hudi dataset support sch…

2020-10-05 Thread GitBox
lw309637554 commented on a change in pull request #2125: URL: https://github.com/apache/hudi/pull/2125#discussion_r499744583 ## File path: hudi-spark/src/main/scala/org/apache/hudi/IncrementalRelation.scala ## @@ -82,11 +81,12 @@ class IncrementalRelation(val sqlContext:

[GitHub] [hudi] bvaradar commented on issue #2145: [SUPPORT] IOException when querying Hudi data with Hive using LIMIT clause

2020-10-05 Thread GitBox
bvaradar commented on issue #2145: URL: https://github.com/apache/hudi/issues/2145#issuecomment-703736061 This could be similar to https://github.com/apache/hudi/issues/1962 The location setting in the hive table needs to be checked.

[GitHub] [hudi] sassai opened a new issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column

2020-10-05 Thread GitBox
sassai opened a new issue #1962: URL: https://github.com/apache/hudi/issues/1962 **Describe the problem you faced** I'm running a spark structured streaming application that reads data from kafka and saves it to a partitioned Hudi MERGE_ON_READ table. Hive sync is enabled and I'm

[GitHub] [hudi] bvaradar commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column

2020-10-05 Thread GitBox
bvaradar commented on issue #1962: URL: https://github.com/apache/hudi/issues/1962#issuecomment-703735047 @sassai : Location is set wrongly. | Location: | abfs://x...@xxx.dfs.core.windows.net/data/hudi/batch/tables/nyc_taxi/address/year=2020/month=10/day=1

[GitHub] [hudi] lw309637554 commented on pull request #2127: [HUDI-284] add more test for UpdateSchemaEvolution

2020-10-05 Thread GitBox
lw309637554 commented on pull request #2127: URL: https://github.com/apache/hudi/pull/2127#issuecomment-703727376 @vinothchandar all the comments have been resolved. 1. pulling the common structure 2. merge these two catch blocks, and simple log

[GitHub] [hudi] lw309637554 commented on a change in pull request #2127: [HUDI-284] add more test for UpdateSchemaEvolution

2020-10-05 Thread GitBox
lw309637554 commented on a change in pull request #2127: URL: https://github.com/apache/hudi/pull/2127#discussion_r499648617 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java ## @@ -254,6 +253,10 @@ public void

[GitHub] [hudi] lw309637554 commented on pull request #2142: [HUDI-1203] add port configuration for EmbeddedTimelineService

2020-10-05 Thread GitBox
lw309637554 commented on pull request #2142: URL: https://github.com/apache/hudi/pull/2142#issuecomment-703720646 > Minor comments. LGTM otherwise. All the comments have been resolved.

[GitHub] [hudi] sassai commented on issue #1962: [SUPPORT] Unable to filter hudi table in hive on partition column

2020-10-05 Thread GitBox
sassai commented on issue #1962: URL: https://github.com/apache/hudi/issues/1962#issuecomment-703652716 Update: Using `set hive.fetch.task.conversion=none;` within the hive session fixed the issue.

[GitHub] [hudi] sassai opened a new issue #2145: [SUPPORT] IOException when querying Hudi data with Hive using LIMIT clause

2020-10-05 Thread GitBox
sassai opened a new issue #2145: URL: https://github.com/apache/hudi/issues/2145 **Describe the problem you faced** Running a query in Hive on Hudi data using LIMIT clause results in IOException. ```console java.io.IOException: Input path does not exist:

[GitHub] [hudi] lw309637554 commented on a change in pull request #2142: [HUDI-1203] add port configuration for EmbeddedTimelineService

2020-10-05 Thread GitBox
lw309637554 commented on a change in pull request #2142: URL: https://github.com/apache/hudi/pull/2142#discussion_r499550041 ## File path: hudi-timeline-service/src/main/java/org/apache/hudi/timeline/service/TimelineService.java ## @@ -98,16 +98,42 @@ public

[GitHub] [hudi] lw309637554 commented on a change in pull request #2142: [HUDI-1203] add port configuration for EmbeddedTimelineService

2020-10-05 Thread GitBox
lw309637554 commented on a change in pull request #2142: URL: https://github.com/apache/hudi/pull/2142#discussion_r499549836 ## File path: hudi-timeline-service/src/main/java/org/apache/hudi/timeline/service/TimelineService.java ## @@ -98,16 +98,42 @@ public

[GitHub] [hudi] lw309637554 commented on a change in pull request #2142: [HUDI-1203] add port configuration for EmbeddedTimelineService

2020-10-05 Thread GitBox
lw309637554 commented on a change in pull request #2142: URL: https://github.com/apache/hudi/pull/2142#discussion_r499548663 ## File path: hudi-timeline-service/src/main/java/org/apache/hudi/timeline/service/TimelineService.java ## @@ -98,16 +98,42 @@ public

[GitHub] [hudi] lw309637554 commented on a change in pull request #2142: [HUDI-1203] add port configuration for EmbeddedTimelineService

2020-10-05 Thread GitBox
lw309637554 commented on a change in pull request #2142: URL: https://github.com/apache/hudi/pull/2142#discussion_r499545082 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/embedded/EmbeddedTimelineService.java ## @@ -39,17 +39,19 @@

[GitHub] [hudi] lw309637554 commented on a change in pull request #2142: [HUDI-1203] add port configuration for EmbeddedTimelineService

2020-10-05 Thread GitBox
lw309637554 commented on a change in pull request #2142: URL: https://github.com/apache/hudi/pull/2142#discussion_r499544639 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractHoodieClient.java ## @@ -102,7 +102,8 @@ private synchronized

[GitHub] [hudi] Karl-WangSK commented on pull request #2106: [HUDI-1284] preCombine all HoodieRecords and update all fields according to orderingVal

2020-10-05 Thread GitBox
Karl-WangSK commented on pull request #2106: URL: https://github.com/apache/hudi/pull/2106#issuecomment-703564330 > @Karl-WangSK Should this kind of merging belong in a separate payload class? I am not sure overloading the existing payload is the right way to go. It has a specific

[GitHub] [hudi] KarthickAN closed issue #2144: [SUPPORT] HoodieException: timestamp(Part -timestamp) field not found in record

2020-10-05 Thread GitBox
KarthickAN closed issue #2144: URL: https://github.com/apache/hudi/issues/2144

[GitHub] [hudi] KarthickAN commented on issue #2144: [SUPPORT] HoodieException: timestamp(Part -timestamp) field not found in record

2020-10-05 Thread GitBox
KarthickAN commented on issue #2144: URL: https://github.com/apache/hudi/issues/2144#issuecomment-703473193 Although the error was misleading, this is not an issue with Hudi. The actual data had null values for the specified column, and that caused this issue.
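
The root cause reported in this comment (null values in the configured column) can be checked for before a write. The following is only a minimal sketch in plain Python, not Hudi code: the helper `rows_missing_field`, the sample `batch`, and the field name `timestamp` are all hypothetical and stand in for whatever column the job is configured to use.

```python
from typing import Any, Dict, Iterable, List


def rows_missing_field(records: Iterable[Dict[str, Any]], field: str) -> List[int]:
    """Return the positions of records whose `field` is absent or None."""
    return [i for i, rec in enumerate(records) if rec.get(field) is None]


# Hypothetical sample batch; "timestamp" stands in for the configured column.
batch = [
    {"id": 1, "timestamp": 1601856000},
    {"id": 2, "timestamp": None},  # explicit null
    {"id": 3},                     # field missing entirely
]

bad_rows = rows_missing_field(batch, "timestamp")
print(bad_rows)
```

Running a check like this on a sample of the input makes it easier to tell a data problem apart from a configuration problem when the reported error is a "field not found in record" message.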

[GitHub] [hudi] n3nash commented on a change in pull request #2129: [HUDI-1302] Add support for timestamp field in HiveSync

2020-10-05 Thread GitBox
n3nash commented on a change in pull request #2129: URL: https://github.com/apache/hudi/pull/2129#discussion_r499383876 ## File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java ## @@ -167,7 +173,10 @@ private static String

[GitHub] [hudi] pratyakshsharma commented on pull request #2093: [HUDI-1200]: fixed NPE in CustomKeyGenerator

2020-10-05 Thread GitBox
pratyakshsharma commented on pull request #2093: URL: https://github.com/apache/hudi/pull/2093#issuecomment-703438659 @bhasudha @vinothchandar please take a pass.

[jira] [Comment Edited] (HUDI-83) Map Timestamp type in spark to corresponding Timestamp type in Hive during Hive sync

2020-10-05 Thread cdmikechen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17207838#comment-17207838 ] cdmikechen edited comment on HUDI-83 at 10/5/20, 6:42 AM: -- Some codes may have

[jira] [Comment Edited] (HUDI-83) Map Timestamp type in spark to corresponding Timestamp type in Hive during Hive sync

2020-10-05 Thread cdmikechen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17207838#comment-17207838 ] cdmikechen edited comment on HUDI-83 at 10/5/20, 6:41 AM: -- Some codes may have

[jira] [Commented] (HUDI-83) Map Timestamp type in spark to corresponding Timestamp type in Hive during Hive sync

2020-10-05 Thread cdmikechen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17207838#comment-17207838 ] cdmikechen commented on HUDI-83: Some code may duplicate parts of HUDI-1302. I will submit after

[jira] [Commented] (HUDI-1302) Add support for timestamp field in HiveSync

2020-10-05 Thread cdmikechen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17207837#comment-17207837 ] cdmikechen commented on HUDI-1302: -- Some codes may have duplicate parts with