[jira] [Commented] (HUDI-1204) NoClassDefFoundError with AbstractSyncTool while running HoodieTestSuiteJob

2020-08-19 Thread Nishith Agarwal (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17180979#comment-17180979 ] Nishith Agarwal commented on HUDI-1204: --- After some minor changes due to code refact

[jira] [Comment Edited] (HUDI-1204) NoClassDefFoundError with AbstractSyncTool while running HoodieTestSuiteJob

2020-08-19 Thread Nishith Agarwal (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17180979#comment-17180979 ] Nishith Agarwal edited comment on HUDI-1204 at 8/20/20, 6:12 AM: ---

[GitHub] [hudi] leesf commented on pull request #1970: [HUDI-1193] Upgrade http dependent version

2020-08-19 Thread GitBox
leesf commented on pull request #1970: URL: https://github.com/apache/hudi/pull/1970#issuecomment-677241299 @liujinhui1994 Hi, I have used Hudi to write to OSS successfully, so I am curious whether the http version is really what causes the exception.

[GitHub] [hudi] wangxianghu commented on pull request #1994: [HUDI-1206]Remove unused variable in Compactor

2020-08-19 Thread GitBox
wangxianghu commented on pull request #1994: URL: https://github.com/apache/hudi/pull/1994#issuecomment-677238670 @yanghua please take a look when free This is an automated message from the Apache Git Service. To respond to t

[jira] [Updated] (HUDI-1206) Remove unused variable in Compactor

2020-08-19 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1206: - Labels: pull-request-available (was: ) > Remove unused variable in Compactor > --

[GitHub] [hudi] wangxianghu opened a new pull request #1994: [HUDI-1206]Remove unused variable in Compactor

2020-08-19 Thread GitBox
wangxianghu opened a new pull request #1994: URL: https://github.com/apache/hudi/pull/1994 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of the

[jira] [Assigned] (HUDI-1206) Remove unused variable in Compactor

2020-08-19 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangxianghu reassigned HUDI-1206: - Assignee: wangxianghu > Remove unused variable in Compactor > ---

[jira] [Created] (HUDI-1206) Remove unused variable in Compactor

2020-08-19 Thread wangxianghu (Jira)
wangxianghu created HUDI-1206: - Summary: Remove unused variable in Compactor Key: HUDI-1206 URL: https://issues.apache.org/jira/browse/HUDI-1206 Project: Apache Hudi Issue Type: Task

[GitHub] [hudi] tooptoop4 commented on issue #1981: [SUPPORT] Huge performance Difference Between Hudi and Regular Parquet in Athena

2020-08-19 Thread GitBox
tooptoop4 commented on issue #1981: URL: https://github.com/apache/hudi/issues/1981#issuecomment-677200382 I use open source Presto on EC2 and find the native Parquet table much faster than the hoodie table.

[GitHub] [hudi] wangxianghu commented on pull request #1993: [MINOR]Move HoodieUpgradeDowngradeException to exception package

2020-08-19 Thread GitBox
wangxianghu commented on pull request #1993: URL: https://github.com/apache/hudi/pull/1993#issuecomment-677172386 @yanghua @nsivabalan please take a look when free

[GitHub] [hudi] bvaradar commented on issue #1823: [SUPPORT] MOR trigger compaction from Hudi CLI

2020-08-19 Thread GitBox
bvaradar commented on issue #1823: URL: https://github.com/apache/hudi/issues/1823#issuecomment-677080013 Thanks Gary.

[GitHub] [hudi] bvaradar closed issue #1823: [SUPPORT] MOR trigger compaction from Hudi CLI

2020-08-19 Thread GitBox
bvaradar closed issue #1823: URL: https://github.com/apache/hudi/issues/1823

[jira] [Commented] (HUDI-1204) NoClassDefFoundError with AbstractSyncTool while running HoodieTestSuiteJob

2020-08-19 Thread Nishith Agarwal (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17180952#comment-17180952 ] Nishith Agarwal commented on HUDI-1204: --- Able to run one of the tests through Junit

[GitHub] [hudi] wangxianghu opened a new pull request #1993: [MINOR]Move HoodieUpgradeDowngradeException to exception package

2020-08-19 Thread GitBox
wangxianghu opened a new pull request #1993: URL: https://github.com/apache/hudi/pull/1993

[jira] [Updated] (HUDI-1205) Serialization fail when log file is larger than 2GB

2020-08-19 Thread Yanjia Gary Li (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanjia Gary Li updated HUDI-1205: - Status: Open (was: New) > Serialization fail when log file is larger than 2GB > -

[jira] [Updated] (HUDI-1205) Serialization fail when log file is larger than 2GB

2020-08-19 Thread Yanjia Gary Li (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanjia Gary Li updated HUDI-1205: - Description: When scanning the log file, if the log file (or log file group) is larger than 2GB, s

[GitHub] [hudi] garyli1019 edited a comment on issue #1823: [SUPPORT] MOR trigger compaction from Hudi CLI

2020-08-19 Thread GitBox
garyli1019 edited a comment on issue #1823: URL: https://github.com/apache/hudi/issues/1823#issuecomment-677001249 @bvaradar I created a ticket to track this. I think we can close this issue and #1890 https://issues.apache.org/jira/browse/HUDI-1205

[GitHub] [hudi] garyli1019 commented on issue #1823: [SUPPORT] MOR trigger compaction from Hudi CLI

2020-08-19 Thread GitBox
garyli1019 commented on issue #1823: URL: https://github.com/apache/hudi/issues/1823#issuecomment-677001249 @bvaradar I created a ticket to track this. I think we can close this issue and #1890

[jira] [Created] (HUDI-1205) Serialization fail when log file is larger than 2GB

2020-08-19 Thread Yanjia Gary Li (Jira)
Yanjia Gary Li created HUDI-1205: Summary: Serialization fail when log file is larger than 2GB Key: HUDI-1205 URL: https://issues.apache.org/jira/browse/HUDI-1205 Project: Apache Hudi Issue T

svn commit: r41041 - in /dev/hudi/hudi-0.6.0-rc1: ./ hudi-0.6.0-rc1.src.tgz hudi-0.6.0-rc1.src.tgz.asc hudi-0.6.0-rc1.src.tgz.sha512

2020-08-19 Thread bhavanisudha
Author: bhavanisudha Date: Thu Aug 20 03:42:20 2020 New Revision: 41041 Log: Staging source releases for release-0.6.0-rc1 Added: dev/hudi/hudi-0.6.0-rc1/ dev/hudi/hudi-0.6.0-rc1/hudi-0.6.0-rc1.src.tgz (with props) dev/hudi/hudi-0.6.0-rc1/hudi-0.6.0-rc1.src.tgz.asc dev/hudi/hudi

Build failed in Jenkins: hudi-snapshot-deployment-0.5 #375

2020-08-19 Thread Apache Jenkins Server
See Changes: -- [...truncated 2.60 KB...] cdi-api-1.0.jar cdi-api.license commons-cli-1.4.jar commons-cli.license commons-io-2.5.jar commons-io.license commons-lang3-3.5.jar

[hudi] annotated tag release-0.6.0-rc1 updated (62c297e -> 26951d9)

2020-08-19 Thread bhavanisudha
This is an automated email from the ASF dual-hosted git repository. bhavanisudha pushed a change to annotated tag release-0.6.0-rc1 in repository https://gitbox.apache.org/repos/asf/hudi.git. *** WARNING: tag release-0.6.0-rc1 was modified! *** from 62c297e (commit) to 26951d9 (tag)

[GitHub] [hudi] umehrot2 commented on issue #1981: [SUPPORT] Huge performance Difference Between Hudi and Regular Parquet in Athena

2020-08-19 Thread GitBox
umehrot2 commented on issue #1981: URL: https://github.com/apache/hudi/issues/1981#issuecomment-676855516 @vinothchandar @rubenssoto I am thinking this could just be the difference between presto's performance over regular parquet where it completely uses its native parquet readers, vs pre

[jira] [Resolved] (HUDI-1177) fix TimestampBasedKeyGenerator Task not serializableException

2020-08-19 Thread Bhavani Sudha (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bhavani Sudha resolved HUDI-1177. - Resolution: Fixed > fix TimestampBasedKeyGenerator Task not serializableException > -

[GitHub] [hudi] umehrot2 commented on issue #1977: Error running hudi on aws glue

2020-08-19 Thread GitBox
umehrot2 commented on issue #1977: URL: https://github.com/apache/hudi/issues/1977#issuecomment-676848957 @KarthickAN AWS Glue does not have official support for Hudi, so you may hit runtime issues that you would have to work around yourself. As for this particular issue,

[jira] [Commented] (HUDI-1200) CustomKeyGenerator does not work,java.lang.NullPointerException

2020-08-19 Thread liujinhui (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17180915#comment-17180915 ] liujinhui commented on HUDI-1200: - Okay, I agree with your suggestion, it really only affe

[jira] [Commented] (HUDI-1204) NoClassDefFoundError with AbstractSyncTool while running HoodieTestSuiteJob

2020-08-19 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17180903#comment-17180903 ] Vinoth Chandar commented on HUDI-1204: -- {{This is not correct.}} {{--packages com.da

[GitHub] [hudi] yanghua commented on pull request #1992: [BLOG] Incremental processing on data lakes by vinoyang

2020-08-19 Thread GitBox
yanghua commented on pull request #1992: URL: https://github.com/apache/hudi/pull/1992#issuecomment-676841472 > @yanghua please take a pass and merge when ready. We can then share on social media OK, will review soon.

[hudi] branch release-0.6.0 updated: [HUDI-1177]: fixed TaskNotSerializableException in TimestampBasedKeyGenerator (#1987)

2020-08-19 Thread vinoth
vinoth pushed a commit to branch release-0.6.0 in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/release-0.6.0 by this push: new 62c297e [HUDI-1177]: fixed TaskNotS

[jira] [Commented] (HUDI-1204) NoClassDefFoundError with AbstractSyncTool while running HoodieTestSuiteJob

2020-08-19 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17180894#comment-17180894 ] Vinoth Chandar commented on HUDI-1204: -- {code} 17:44:15 [incubator-hudi]$ jar tf pac
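
The `jar tf` listing quoted in the comment above is the usual way to check whether a class was actually packaged into a bundle before chasing a NoClassDefFoundError. A minimal Python sketch of the same check follows; it relies only on the fact that a jar is a zip archive. The function name `jar_contains_class` is hypothetical, and the jar path and class name in the usage comment are placeholders, not values taken from the ticket.

```python
import zipfile


def jar_contains_class(jar_path: str, class_name: str) -> bool:
    """Return True if the jar at jar_path has an entry for class_name.

    class_name is a dotted name such as "com.example.Foo". A jar is a zip
    archive, so that class is stored as the entry "com/example/Foo.class".
    """
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


# Usage (placeholder path and class name):
#   jar_contains_class("some-bundle.jar", "com.example.Foo")
```

If the function returns False for the bundle that is on the classpath, the class was never packaged, which would point at the build or shading configuration rather than at the job itself.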

[GitHub] [hudi] vinothchandar merged pull request #1987: [HUDI-1177]: fixed TaskNotSerializableException in TimestampBasedKeyGenerator

2020-08-19 Thread GitBox
vinothchandar merged pull request #1987: URL: https://github.com/apache/hudi/pull/1987

[hudi] branch master updated: [HUDI-1177]: fixed TaskNotSerializableException in TimestampBasedKeyGenerator (#1987)

2020-08-19 Thread vinoth
vinoth pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new a2312fa [HUDI-1177]: fixed TaskNotSerializableExc

[jira] [Commented] (HUDI-1204) NoClassDefFoundError with AbstractSyncTool while running HoodieTestSuiteJob

2020-08-19 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17180892#comment-17180892 ] Vinoth Chandar commented on HUDI-1204: -- [~shivnarayan] I think this is because `hudi-

[jira] [Updated] (HUDI-1204) NoClassDefFoundError with AbstractSyncTool while running HoodieTestSuiteJob

2020-08-19 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1204: - Description: I was trying to run HoodieTestSuiteJob in my local docker set up and ran into dep is

[GitHub] [hudi] vinothchandar commented on pull request #1992: [BLOG] Incremental processing on data lakes by vinoyang

2020-08-19 Thread GitBox
vinothchandar commented on pull request #1992: URL: https://github.com/apache/hudi/pull/1992#issuecomment-676827513 @yanghua please take a pass and merge when ready. We can then share on social media

[GitHub] [hudi] vinothchandar opened a new pull request #1992: [Blog] Incremental processing on data lakes by vinoyang

2020-08-19 Thread GitBox
vinothchandar opened a new pull request #1992: URL: https://github.com/apache/hudi/pull/1992

[jira] [Updated] (HUDI-1196) Record being placed in incorrect partition during upsert on COW/MOR global indexed tables

2020-08-19 Thread Ryan Pifer (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Pifer updated HUDI-1196: - Description: When upserting a record in a global index table (global and hbase) where a single batch has

[GitHub] [hudi] stackfun commented on issue #1240: future support for multi-client concurrent write?

2020-08-19 Thread GitBox
stackfun commented on issue #1240: URL: https://github.com/apache/hudi/issues/1240#issuecomment-676793348 Is this feature on the roadmap? If so, can you give us an estimated time frame? Thanks!

[jira] [Comment Edited] (HUDI-1204) NoClassDefFoundError with AbstractSyncTool while running HoodieTestSuiteJob

2020-08-19 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17180850#comment-17180850 ] sivabalan narayanan edited comment on HUDI-1204 at 8/19/20, 10:04 PM: --

[jira] [Commented] (HUDI-1204) NoClassDefFoundError with AbstractSyncTool while running HoodieTestSuiteJob

2020-08-19 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17180850#comment-17180850 ] sivabalan narayanan commented on HUDI-1204: --- Steps I followed.  Did some change

[GitHub] [hudi] n3nash merged pull request #1963: [HUDI-1188] Hbase index MOR tables records not being deduplicated

2020-08-19 Thread GitBox
n3nash merged pull request #1963: URL: https://github.com/apache/hudi/pull/1963

[GitHub] [hudi] n3nash commented on a change in pull request #1963: [HUDI-1188] Hbase index MOR tables records not being deduplicated

2020-08-19 Thread GitBox
n3nash commented on a change in pull request #1963: URL: https://github.com/apache/hudi/pull/1963#discussion_r473362342 ## File path: hudi-client/src/main/java/org/apache/hudi/index/hbase/HBaseIndex.java ## @@ -177,13 +176,11 @@ private Get generateStatement(String key) throws

[hudi] branch master updated: Fix HBASE index MOR tables not considering record index valid

2020-08-19 Thread nagarwal
nagarwal pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 1137b0b Fix HBASE index MOR tables not consider

[jira] [Updated] (HUDI-1204) NoClassDefFoundError with AbstractSyncTool while running HoodieTestSuiteJob

2020-08-19 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1204: -- Description: I was trying to run HoodieTestSuiteJob in my local docker set up and ran in

[jira] [Created] (HUDI-1204) NoClassDefFoundError with AbstractSyncTool while running HoodieTestSuiteJob

2020-08-19 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-1204: - Summary: NoClassDefFoundError with AbstractSyncTool while running HoodieTestSuiteJob Key: HUDI-1204 URL: https://issues.apache.org/jira/browse/HUDI-1204 Pro

[jira] [Assigned] (HUDI-1204) NoClassDefFoundError with AbstractSyncTool while running HoodieTestSuiteJob

2020-08-19 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-1204: - Assignee: Nishith Agarwal > NoClassDefFoundError with AbstractSyncTool while runn

[GitHub] [hudi] pratyakshsharma closed pull request #1989: [HUDI-1177]: Fixed TaskNotSerializableException in TimestampBasedKeyGenerator

2020-08-19 Thread GitBox
pratyakshsharma closed pull request #1989: URL: https://github.com/apache/hudi/pull/1989

[GitHub] [hudi] pratyakshsharma commented on pull request #1989: [HUDI-1177]: Fixed TaskNotSerializableException in TimestampBasedKeyGenerator

2020-08-19 Thread GitBox
pratyakshsharma commented on pull request #1989: URL: https://github.com/apache/hudi/pull/1989#issuecomment-676725359 moved it to https://github.com/apache/hudi/pull/1991

[GitHub] [hudi] pratyakshsharma opened a new pull request #1991: [HUDI-1177]: fixed TaskNotSerializableException in TimestampBasedKeyGenerator

2020-08-19 Thread GitBox
pratyakshsharma opened a new pull request #1991: URL: https://github.com/apache/hudi/pull/1991

[GitHub] [hudi] bhasudha commented on pull request #1987: [HUDI-1177]: fixed TaskNotSerializableException in TimestampBasedKeyGenerator

2020-08-19 Thread GitBox
bhasudha commented on pull request #1987: URL: https://github.com/apache/hudi/pull/1987#issuecomment-676698639 @pratyakshsharma I was able to verify your patch quickly using quickstart commands. Will wait for the build to pass. ---

[GitHub] [hudi] jiegzhan commented on issue #1980: [SUPPORT] Small files (423KB) generated after running delete query

2020-08-19 Thread GitBox
jiegzhan commented on issue #1980: URL: https://github.com/apache/hudi/issues/1980#issuecomment-676688579 @bvaradar that makes sense, thanks. After running many delete queries, I got a lot of small files in S3. Is there a way to merge these small files? Basically I am trying to clean up S3 folder

[GitHub] [hudi] jiegzhan edited a comment on issue #1980: [SUPPORT] Small files (423KB) generated after running delete query

2020-08-19 Thread GitBox
jiegzhan edited a comment on issue #1980: URL: https://github.com/apache/hudi/issues/1980#issuecomment-676688579 @bvaradar that makes sense, thanks. After running many delete queries, I got a lot of small files in S3. Is there a way to merge these small files? Basically I am trying to clean u

[jira] [Updated] (HUDI-1199) Shade jetty to enable hudi deltastreamer working on databricks runtime

2020-08-19 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1199: - Labels: pull-request-available (was: ) > Shade jetty to enable hudi deltastreamer working on data

[GitHub] [hudi] pratyakshsharma commented on pull request #1781: [MINOR] Relocate jetty during shading/packaging for Databricks runtime

2020-08-19 Thread GitBox
pratyakshsharma commented on pull request #1781: URL: https://github.com/apache/hudi/pull/1781#issuecomment-676687228 https://github.com/apache/hudi/pull/1990 raised for this.

[GitHub] [hudi] pratyakshsharma opened a new pull request #1990: [HUDI-1199]: relocated jetty in hudi-utilities-bundle pom

2020-08-19 Thread GitBox
pratyakshsharma opened a new pull request #1990: URL: https://github.com/apache/hudi/pull/1990

[GitHub] [hudi] pratyakshsharma commented on pull request #1988: [HUDI-1177]: fixed TaskNotSerializableException in TimestampBasedKeyGenerator

2020-08-19 Thread GitBox
pratyakshsharma commented on pull request #1988: URL: https://github.com/apache/hudi/pull/1988#issuecomment-676680290 moved it to https://github.com/apache/hudi/pull/1989

[GitHub] [hudi] pratyakshsharma opened a new pull request #1989: [HUDI-1177]: Fixed TaskNotSerializableException in TimestampBasedKeyGenerator

2020-08-19 Thread GitBox
pratyakshsharma opened a new pull request #1989: URL: https://github.com/apache/hudi/pull/1989

[GitHub] [hudi] pratyakshsharma commented on a change in pull request #1987: [HUDI-1177]: fixed TaskNotSerializableException in TimestampBasedKeyGenerator

2020-08-19 Thread GitBox
pratyakshsharma commented on a change in pull request #1987: URL: https://github.com/apache/hudi/pull/1987#discussion_r473293742 ## File path: hudi-spark/src/main/java/org/apache/hudi/keygen/TimestampBasedKeyGenerator.java ## @@ -153,7 +147,8 @@ public String getPartitionPath(

[jira] [Commented] (HUDI-1200) CustomKeyGenerator does not work,java.lang.NullPointerException

2020-08-19 Thread Bhavani Sudha (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17180793#comment-17180793 ] Bhavani Sudha commented on HUDI-1200: - [~liujinhui] since this is not affecting other

[jira] [Updated] (HUDI-1200) CustomKeyGenerator does not work,java.lang.NullPointerException

2020-08-19 Thread Bhavani Sudha (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bhavani Sudha updated HUDI-1200: Priority: Minor (was: Blocker) > CustomKeyGenerator does not work,java.lang.NullPointerException >

[jira] [Updated] (HUDI-1200) CustomKeyGenerator does not work,java.lang.NullPointerException

2020-08-19 Thread Bhavani Sudha (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bhavani Sudha updated HUDI-1200: Fix Version/s: (was: 0.6.0) 0.6.1 > CustomKeyGenerator does not work,java.lan

[GitHub] [hudi] bhasudha commented on a change in pull request #1987: [HUDI-1177]: fixed TaskNotSerializableException in TimestampBasedKeyGenerator

2020-08-19 Thread GitBox
bhasudha commented on a change in pull request #1987: URL: https://github.com/apache/hudi/pull/1987#discussion_r473289742 ## File path: hudi-spark/src/main/java/org/apache/hudi/keygen/TimestampBasedKeyGenerator.java ## @@ -153,7 +147,8 @@ public String getPartitionPath(Generic

[jira] [Updated] (HUDI-1203) Allow port configuration for EmbeddedTimelineService

2020-08-19 Thread Brian Lindblom (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Lindblom updated HUDI-1203: - Priority: Minor (was: Major) > Allow port configuration for EmbeddedTimelineService > ---

[jira] [Created] (HUDI-1203) Allow port configuration for EmbeddedTimelineService

2020-08-19 Thread Brian Lindblom (Jira)
Brian Lindblom created HUDI-1203: Summary: Allow port configuration for EmbeddedTimelineService Key: HUDI-1203 URL: https://issues.apache.org/jira/browse/HUDI-1203 Project: Apache Hudi Issue

[GitHub] [hudi] pratyakshsharma commented on a change in pull request #1987: [HUDI-1177]: fixed TaskNotSerializableException in TimestampBasedKeyGenerator

2020-08-19 Thread GitBox
pratyakshsharma commented on a change in pull request #1987: URL: https://github.com/apache/hudi/pull/1987#discussion_r473282621 ## File path: hudi-spark/src/main/java/org/apache/hudi/keygen/TimestampBasedKeyGenerator.java ## @@ -153,7 +147,8 @@ public String getPartitionPath(

[GitHub] [hudi] pratyakshsharma commented on a change in pull request #1987: [HUDI-1177]: fixed TaskNotSerializableException in TimestampBasedKeyGenerator

2020-08-19 Thread GitBox
pratyakshsharma commented on a change in pull request #1987: URL: https://github.com/apache/hudi/pull/1987#discussion_r473282800 ## File path: hudi-spark/src/main/java/org/apache/hudi/keygen/parser/HoodieDateTimeParserImpl.java ## @@ -95,7 +86,15 @@ public String getOutputDate

[GitHub] [hudi] bvaradar commented on issue #1980: [SUPPORT] Small files (423KB) generated after running delete query

2020-08-19 Thread GitBox
bvaradar commented on issue #1980: URL: https://github.com/apache/hudi/issues/1980#issuecomment-676626002 @jiegzhan : This can happen if you are deleting all or most of the records in your dataset. Even if all the records in a file are deleted, Hudi creates a new version of the file - an empty par
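
The behaviour described in the comment above, where a delete still produces a new version of the file, can be illustrated with a toy copy-on-write model. This is only a sketch of the general idea, not Hudi's implementation: the `FileGroup` class, its methods, and the in-memory dict standing in for a data file are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FileGroup:
    """Toy copy-on-write file group: every write appends a new immutable version."""

    versions: list = field(default_factory=list)

    def write(self, records: dict) -> None:
        # Store a copy so that earlier versions are never mutated.
        self.versions.append(dict(records))

    def delete(self, keys: set) -> None:
        latest = self.versions[-1] if self.versions else {}
        remaining = {k: v for k, v in latest.items() if k not in keys}
        # A new version is written even when no records remain, so the
        # newest "file" can be empty while older versions still exist.
        self.versions.append(remaining)


group = FileGroup()
group.write({"id-1": "row one", "id-2": "row two"})
group.delete({"id-1", "id-2"})
print(len(group.versions))  # 2: the original version plus an empty one
print(group.versions[-1])   # {}
```

In this model the small, near-empty version is a direct consequence of never rewriting a file in place; older versions stay around until something removes them.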

[GitHub] [hudi] pratyakshsharma closed pull request #1988: [HUDI-1177]: fixed TaskNotSerializableException in TimestampBasedKeyGenerator

2020-08-19 Thread GitBox
pratyakshsharma closed pull request #1988: URL: https://github.com/apache/hudi/pull/1988

[GitHub] [hudi] vinothchandar commented on a change in pull request #1987: [HUDI-1177]: fixed TaskNotSerializableException in TimestampBasedKeyGenerator

2020-08-19 Thread GitBox
vinothchandar commented on a change in pull request #1987: URL: https://github.com/apache/hudi/pull/1987#discussion_r473257409 ## File path: hudi-spark/src/main/java/org/apache/hudi/keygen/parser/HoodieDateTimeParserImpl.java ## @@ -95,7 +86,15 @@ public String getOutputDateFo

[GitHub] [hudi] bhasudha commented on a change in pull request #1984: [HUDI-1200] Fix NullPointerException, CustomKeyGenerator does not work

2020-08-19 Thread GitBox
bhasudha commented on a change in pull request #1984: URL: https://github.com/apache/hudi/pull/1984#discussion_r473255513 ## File path: hudi-spark/src/main/java/org/apache/hudi/keygen/KeyGenerator.java ## @@ -41,7 +41,7 @@ private static final String STRUCT_NAME = "hoodieRow

[GitHub] [hudi] bhasudha commented on issue #1943: [SUPPORT] Gradle fails with dependency on org.apache.hudi:hudi-spark_2.12:0.5.3

2020-08-19 Thread GitBox
bhasudha commented on issue #1943: URL: https://github.com/apache/hudi/issues/1943#issuecomment-676600839 @wfhartford I created a jira ticket here to track this - https://issues.apache.org/jira/browse/HUDI-1202. If you have cycles to fix this please let me know :) --

[jira] [Created] (HUDI-1202) Fix Gradle dependency issue when pulling in hudi-spark_2.12

2020-08-19 Thread Bhavani Sudha (Jira)
Bhavani Sudha created HUDI-1202: --- Summary: Fix Gradle dependency issue when pulling in hudi-spark_2.12 Key: HUDI-1202 URL: https://issues.apache.org/jira/browse/HUDI-1202 Project: Apache Hudi

[GitHub] [hudi] pratyakshsharma commented on pull request #1988: [HUDI-1177]: fixed TaskNotSerializableException in TimestampBasedKeyGenerator

2020-08-19 Thread GitBox
pratyakshsharma commented on pull request #1988: URL: https://github.com/apache/hudi/pull/1988#issuecomment-676600402 Doing that. @bhasudha

[GitHub] [hudi] rmpifer commented on a change in pull request #1963: [HUDI-1188] Hbase index MOR tables records not being deduplicated

2020-08-19 Thread GitBox
rmpifer commented on a change in pull request #1963: URL: https://github.com/apache/hudi/pull/1963#discussion_r473247387 ## File path: hudi-client/src/main/java/org/apache/hudi/index/hbase/HBaseIndex.java ## @@ -177,13 +176,11 @@ private Get generateStatement(String key) throws

[GitHub] [hudi] bhasudha commented on pull request #1988: [HUDI-1177]: fixed TaskNotSerializableException in TimestampBasedKeyGenerator

2020-08-19 Thread GitBox
bhasudha commented on pull request #1988: URL: https://github.com/apache/hudi/pull/1988#issuecomment-676595646 @pratyakshsharma could you rebase your change on the release branch - `release-0.6.0` ?

[GitHub] [hudi] pratyakshsharma opened a new pull request #1988: Hudi 1177

2020-08-19 Thread GitBox
pratyakshsharma opened a new pull request #1988: URL: https://github.com/apache/hudi/pull/1988

[jira] [Updated] (HUDI-1177) fix TimestampBasedKeyGenerator Task not serializableException

2020-08-19 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1177: - Labels: pull-request-available (was: ) > fix TimestampBasedKeyGenerator Task not serializableExc

[GitHub] [hudi] pratyakshsharma opened a new pull request #1987: [HUDI-1177]: fixed TaskNotSerializableException in TimestampBasedKeyGenerator

2020-08-19 Thread GitBox
pratyakshsharma opened a new pull request #1987: URL: https://github.com/apache/hudi/pull/1987

[GitHub] [hudi] jiegzhan commented on issue #1980: [SUPPORT] Small files (423KB) generated after running delete query

2020-08-19 Thread GitBox
jiegzhan commented on issue #1980: URL: https://github.com/apache/hudi/issues/1980#issuecomment-676544068 @bvaradar What is the size of new version of the same files after running delete query? For me, they are 423KB. Step 1: ran bulk_insert query: ``` df. write.format("o

[jira] [Updated] (HUDI-1177) fix TimestampBasedKeyGenerator Task not serializableException

2020-08-19 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1177: - Priority: Blocker (was: Major) > fix TimestampBasedKeyGenerator Task not serializableException >

[jira] [Updated] (HUDI-1177) fix TimestampBasedKeyGenerator Task not serializableException

2020-08-19 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1177: - Fix Version/s: (was: 0.6.1) 0.6.0 > fix TimestampBasedKeyGenerator Task no

[jira] [Comment Edited] (HUDI-1154) Hive Sync Partition Extractor not handling decimal types properly

2020-08-19 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17180641#comment-17180641 ] Balaji Varadarajan edited comment on HUDI-1154 at 8/19/20, 3:59 PM:

[jira] [Closed] (HUDI-1154) Hive Sync Partition Extractor not handling decimal types properly

2020-08-19 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan closed HUDI-1154. > Hive Sync Partition Extractor not handling decimal types properly > --

[jira] [Resolved] (HUDI-1154) Hive Sync Partition Extractor not handling decimal types properly

2020-08-19 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan resolved HUDI-1154. -- Resolution: Duplicate > Hive Sync Partition Extractor not handling decimal types properl

[jira] [Commented] (HUDI-1154) Hive Sync Partition Extractor not handling decimal types properly

2020-08-19 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17180641#comment-17180641 ] Balaji Varadarajan commented on HUDI-1154: -- Thanks [~linshan]. I will close this t

[GitHub] [hudi] ankur1603 edited a comment on issue #1986: [SUPPORT]: Possiblity to disable preCombine logic

2020-08-19 Thread GitBox
ankur1603 edited a comment on issue #1986: URL: https://github.com/apache/hudi/issues/1986#issuecomment-676506320 Thanks @bvaradar . That was helpful. I was trying to add the column and alter schema in payload class.

[GitHub] [hudi] ankur1603 commented on issue #1986: [SUPPORT]: Possiblity to disable preCombine logic

2020-08-19 Thread GitBox
ankur1603 commented on issue #1986: URL: https://github.com/apache/hudi/issues/1986#issuecomment-676506320 Thanks Balaji. That was helpful. I was trying to add the column and alter schema in payload class.

[GitHub] [hudi] bvaradar closed issue #1961: [SUPPORT] Jetty Not able to find method java.lang.NoSuchMethodError: org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V on Databricks cluster

2020-08-19 Thread GitBox
bvaradar closed issue #1961: URL: https://github.com/apache/hudi/issues/1961

[jira] [Assigned] (HUDI-1177) fix TimestampBasedKeyGenerator Task not serializableException

2020-08-19 Thread Pratyaksh Sharma (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pratyaksh Sharma reassigned HUDI-1177: -- Assignee: Pratyaksh Sharma (was: liujinhui) > fix TimestampBasedKeyGenerator Task not

[GitHub] [hudi] bvaradar commented on issue #1961: [SUPPORT] Jetty Not able to find method java.lang.NoSuchMethodError: org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V on Databricks c

2020-08-19 Thread GitBox
bvaradar commented on issue #1961: URL: https://github.com/apache/hudi/issues/1961#issuecomment-676504468 Thanks a lot @nagacse for answering the query. @saumyasuhagiya : Hope this is clarified. Please reopen if that is not the case. --

[GitHub] [hudi] bvaradar commented on issue #1982: [SUPPORT] Not able to write to ADLS Gen2 in Azure Databricks, with error has invalid authority.

2020-08-19 Thread GitBox
bvaradar commented on issue #1982: URL: https://github.com/apache/hudi/issues/1982#issuecomment-676498672 @Ac-Rush : Hudi lets Hadoop FileSystem framework (FileSystem.get()) instantiate specific file-system objects. Is there anything special we need to do to support ADLS Gen2 ?

[GitHub] [hudi] bvaradar commented on issue #1979: [SUPPORT]: Is it possible to incrementally read only upserted rows where a material change has occurred?

2020-08-19 Thread GitBox
bvaradar commented on issue #1979: URL: https://github.com/apache/hudi/issues/1979#issuecomment-676490788 One option to make this work currently is to also add the columns that get updated to the composite record key. We can use the key uniqueness constraint of Hudi to achieve the res

[GitHub] [hudi] bvaradar commented on issue #1980: [SUPPORT] Small files (423KB) generated after running delete query

2020-08-19 Thread GitBox
bvaradar commented on issue #1980: URL: https://github.com/apache/hudi/issues/1980#issuecomment-676454679

[GitHub] [hudi] bvaradar commented on issue #1986: [SUPPORT]: Possiblity to disable preCombine logic

2020-08-19 Thread GitBox
bvaradar commented on issue #1986: URL: https://github.com/apache/hudi/issues/1986#issuecomment-676441083 You can disable precombine using the configs in https://hudi.apache.org/docs/configurations.html#combineInput Regarding the merge logic, are you using spark.write.format(xxx) to

[GitHub] [hudi] nagacse edited a comment on issue #1961: [SUPPORT] Jetty Not able to find method java.lang.NoSuchMethodError: org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V on Databr

2020-08-19 Thread GitBox
nagacse edited a comment on issue #1961: URL: https://github.com/apache/hudi/issues/1961#issuecomment-676405616 @saumyasuhagiya , The issue has to do with the dependencies. Databricks has a version of jetty server in its runtime libraries https://docs.databricks.com/release-notes/runtime

[GitHub] [hudi] nagacse commented on issue #1961: [SUPPORT] Jetty Not able to find method java.lang.NoSuchMethodError: org.eclipse.jetty.server.session.SessionHandler.setHttpOnly(Z)V on Databricks cl

2020-08-19 Thread GitBox
nagacse commented on issue #1961: URL: https://github.com/apache/hudi/issues/1961#issuecomment-676405616 @saumyasuhagiya , The issue has to do with the dependencies. Databricks has a version of jetty server in its runtime libraries https://docs.databricks.com/release-notes/runtime/6.6.ht

[jira] [Created] (HUDI-1201) HoodieDeltaStreamer: Allow user overrides to read from earliest kafka offset when commit files do not have checkpoint

2020-08-19 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-1201: Summary: HoodieDeltaStreamer: Allow user overrides to read from earliest kafka offset when commit files do not have checkpoint Key: HUDI-1201 URL: https://issues.apache.or

[jira] [Updated] (HUDI-1201) HoodieDeltaStreamer: Allow user overrides to read from earliest kafka offset when commit files do not have checkpoint

2020-08-19 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-1201: - Status: Open (was: New) > HoodieDeltaStreamer: Allow user overrides to read from earliest

[GitHub] [hudi] bvaradar commented on issue #1985: [SUPPORT]Error while running deltastreamer on top of backfilled data using Hudi

2020-08-19 Thread GitBox
bvaradar commented on issue #1985: URL: https://github.com/apache/hudi/issues/1985#issuecomment-676387943 For switching from (unmanaged)spark.write() to deltastreamer, you need to provide the checkpoint explicitly to deltastreamer either by passing --initial-checkpoint-provider or --checkp

[GitHub] [hudi] ankur1603 opened a new issue #1986: [SUPPORT]: Possiblity to disable precombine logic

2020-08-19 Thread GitBox
ankur1603 opened a new issue #1986: URL: https://github.com/apache/hudi/issues/1986 **Describe the problem you faced** I have a specific scenario for which I am trying to use Apache Hudi. Here is an example to explain the requirement: **Input:** ``` col1,col2,col3,time
