[jira] [Updated] (HUDI-1016) [Minor] Code optimization

2020-06-09 Thread Hong Shen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Shen updated HUDI-1016: Status: Open (was: New) > [Minor] Code optimization > - > > Key: HU

[GitHub] [hudi] shenh062326 commented on pull request #1690: [HUDI-908] Add decimals to HoodieTestDataGenerator

2020-06-09 Thread GitBox
shenh062326 commented on pull request #1690: URL: https://github.com/apache/hudi/pull/1690#issuecomment-641671278 > @shenh062326 : It makes sense to cover other data-types in a single PR. Can you also add them to this PR. Also, Can you let us know what the missing data types are ? T

[GitHub] [hudi] shenh062326 commented on pull request #1714: [HUDI-1005] fix NPE in HoodieWriteClient.clean

2020-06-09 Thread GitBox
shenh062326 commented on pull request #1714: URL: https://github.com/apache/hudi/pull/1714#issuecomment-641677856 > I was wondering if there was a way to just throw an exception or make it an Option.. merged.. let's punt on this for now When I try to run HoodieDeltaStreamer with metr

[jira] [Updated] (HUDI-1005) NPE in HoodieWriteClient.clean

2020-06-09 Thread Hong Shen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Shen updated HUDI-1005: Status: Open (was: New) > NPE in HoodieWriteClient.clean > --- > >

[jira] [Resolved] (HUDI-1005) NPE in HoodieWriteClient.clean

2020-06-09 Thread Hong Shen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Shen resolved HUDI-1005. - Resolution: Fixed > NPE in HoodieWriteClient.clean > --- > >

Build failed in Jenkins: hudi-snapshot-deployment-0.5 #304

2020-06-09 Thread Apache Jenkins Server
See Changes: -- [...truncated 2.42 KB...] settings.xml toolchains.xml /home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging: simplelogger.properties /home/jenkins/tool

[GitHub] [hudi] vinothchandar commented on a change in pull request #1687: [WIP] [HUDI-684] Introduced abstraction for writing and reading different types of base file formats.

2020-06-09 Thread GitBox
vinothchandar commented on a change in pull request #1687: URL: https://github.com/apache/hudi/pull/1687#discussion_r437846366 ## File path: hudi-client/src/main/java/org/apache/hudi/io/storage/HoodieStorageWriterFactory.java ## @@ -66,4 +67,21 @@ return new HoodieParqu

[GitHub] [hudi] vinothchandar commented on pull request #1722: [HUDI-69] Support Spark Datasource for MOR table

2020-06-09 Thread GitBox
vinothchandar commented on pull request #1722: URL: https://github.com/apache/hudi/pull/1722#issuecomment-641708414 @umehrot2 take a look as well? This is an automated message from the Apache Git Service. To respond to the m

[jira] [Updated] (HUDI-954) Test COW : Presto Read Optimized Query with metadata bootstrap

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-954: Priority: Blocker (was: Major) > Test COW : Presto Read Optimized Query with metadata bootst

[jira] [Updated] (HUDI-956) Test COW : Presto Realtime Query with metadata bootstrap

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-956: Priority: Blocker (was: Major) > Test COW : Presto Realtime Query with metadata bootstrap >

[jira] [Updated] (HUDI-971) Fix HFileBootstrapIndexReader.getIndexedPartitions() returns unclean partition name

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-971: Priority: Blocker (was: Major) > Fix HFileBootstrapIndexReader.getIndexedPartitions() return

[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-992: Priority: Blocker (was: Major) > For hive-style partitioned source data, partition columns s

[jira] [Updated] (HUDI-807) Spark DS Support for incremental queries for bootstrapped tables

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-807: Priority: Blocker (was: Major) > Spark DS Support for incremental queries for bootstrapped t

[jira] [Updated] (HUDI-999) Parallelize listing of Source dataset partitions

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-999: Priority: Blocker (was: Major) > Parallelize listing of Source dataset partitions > ---

[jira] [Updated] (HUDI-806) Implement support for bootstrapping via Spark datasource API

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-806: Priority: Blocker (was: Major) > Implement support for bootstrapping via Spark datasource AP

[jira] [Updated] (HUDI-971) Fix HFileBootstrapIndexReader.getIndexedPartitions() returns unclean partition name

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-971: Fix Version/s: 0.6.0 > Fix HFileBootstrapIndexReader.getIndexedPartitions() returns unclean

[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-992: Fix Version/s: 0.6.0 > For hive-style partitioned source data, partition columns synced with

[jira] [Updated] (HUDI-806) Implement support for bootstrapping via Spark datasource API

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-806: Fix Version/s: 0.6.0 > Implement support for bootstrapping via Spark datasource API > ---

[jira] [Updated] (HUDI-956) Test COW : Presto Realtime Query with metadata bootstrap

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-956: Fix Version/s: 0.6.0 > Test COW : Presto Realtime Query with metadata bootstrap > ---

[jira] [Updated] (HUDI-955) Test MOR : Presto Read Optimized Query with metadata bootstrap

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-955: Fix Version/s: 0.6.0 > Test MOR : Presto Read Optimized Query with metadata bootstrap > -

[jira] [Updated] (HUDI-807) Spark DS Support for incremental queries for bootstrapped tables

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-807: Fix Version/s: 0.6.0 > Spark DS Support for incremental queries for bootstrapped tables > ---

[jira] [Updated] (HUDI-619) Investigate and implement mechanism to have hive/presto/sparksql queries avoid stitching and return null values for hoodie columns

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-619: Fix Version/s: 0.6.0 > Investigate and implement mechanism to have hive/presto/sparksql queri

[jira] [Commented] (HUDI-781) Re-design test utilities

2020-06-09 Thread Nishith Agarwal (Jira)
[ https://issues.apache.org/jira/browse/HUDI-781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17130049#comment-17130049 ] Nishith Agarwal commented on HUDI-781: -- [~pwason] Can you help with #2 ? Like we talke

[jira] [Updated] (HUDI-954) Test COW : Presto Read Optimized Query with metadata bootstrap

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-954: Fix Version/s: 0.6.0 > Test COW : Presto Read Optimized Query with metadata bootstrap > -

[jira] [Updated] (HUDI-999) Parallelize listing of Source dataset partitions

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-999: Fix Version/s: 0.6.0 > Parallelize listing of Source dataset partitions > --

[jira] [Assigned] (HUDI-994) Identify functional tests that are convertible to unit tests with mocks

2020-06-09 Thread Nishith Agarwal (Jira)
[ https://issues.apache.org/jira/browse/HUDI-994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishith Agarwal reassigned HUDI-994: Assignee: Prashant Wason > Identify functional tests that are convertible to unit tests with

[jira] [Assigned] (HUDI-1010) Fix the memory leak for hudi-client unit tests

2020-06-09 Thread Nishith Agarwal (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishith Agarwal reassigned HUDI-1010: - Assignee: Nishith Agarwal > Fix the memory leak for hudi-client unit tests >

[jira] [Created] (HUDI-1018) Handle empty checkpoint better in delta streamer

2020-06-09 Thread Yanjia Gary Li (Jira)
Yanjia Gary Li created HUDI-1018: Summary: Handle empty checkpoint better in delta streamer Key: HUDI-1018 URL: https://issues.apache.org/jira/browse/HUDI-1018 Project: Apache Hudi Issue Type

[jira] [Updated] (HUDI-1018) Handle empty checkpoint better in delta streamer

2020-06-09 Thread Yanjia Gary Li (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanjia Gary Li updated HUDI-1018: - Component/s: DeltaStreamer > Handle empty checkpoint better in delta streamer > --

[jira] [Updated] (HUDI-1018) Handle empty checkpoint better in delta streamer

2020-06-09 Thread Yanjia Gary Li (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanjia Gary Li updated HUDI-1018: - Status: Open (was: New) > Handle empty checkpoint better in delta streamer >

[GitHub] [hudi] garyli1019 commented on a change in pull request #1719: [HUDI-1006]deltastreamer use kafkaSource with offset reset strategy:latest can't consume data

2020-06-09 Thread GitBox
garyli1019 commented on a change in pull request #1719: URL: https://github.com/apache/hudi/pull/1719#discussion_r437841744 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/AvroKafkaSource.java ## @@ -57,10 +57,10 @@ public AvroKafkaSource(TypedProp

[GitHub] [hudi] bobgalvao opened a new issue #1723: [SUPPORT] - trouble using Apache Hudi with S3.

2020-06-09 Thread GitBox
bobgalvao opened a new issue #1723: URL: https://github.com/apache/hudi/issues/1723 Hi, I'm having a trouble using Apache Hudi with S3. **Steps to reproduce the behavior:** 1. Produce messages to topic Kafka. (2000 records per window on average) 2. Start streaming (sa
