[GitHub] [hudi] garyli1019 edited a comment on issue #1890: [SUPPORT] Failed to get record from HoodieMergedLogRecordScanner

2020-07-31 Thread GitBox
garyli1019 edited a comment on issue #1890: URL: https://github.com/apache/hudi/issues/1890#issuecomment-667402304 ~~Solved. I guess it's because I didn't drop the hudi meta columns when reading from a hudi COW source and write to another folder as MOR. The existing hudi meta columns

[GitHub] [hudi] garyli1019 edited a comment on issue #1890: [SUPPORT] Failed to get record from HoodieMergedLogRecordScanner

2020-07-31 Thread GitBox
garyli1019 edited a comment on issue #1890: URL: https://github.com/apache/hudi/issues/1890#issuecomment-667465216 This issue happened to me again, and the cause can now be narrowed down. When the log file was larger than `HoodieStorageConfig.LOGFILE_SIZE_MAX_BYTES` (1GB by default), the

[jira] [Updated] (HUDI-1141) Serialization fail when loading two log files

2020-07-31 Thread Yanjia Gary Li (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanjia Gary Li updated HUDI-1141: - Summary: Serialization fail when loading two log files (was: Serialization fail when loading

[GitHub] [hudi] garyli1019 commented on pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-07-31 Thread GitBox
garyli1019 commented on pull request #1848: URL: https://github.com/apache/hudi/pull/1848#issuecomment-667466538 Tested on a 100GB MOR table. A few partitions have a 100% duplicate upsert log file; the others have parquet files only. For the parquet-only partitions, the `SNAPSHOT` query is

[GitHub] [hudi] zherenyu831 commented on issue #1895: HUDI Dataset backed by Hive Metastore fails on Presto with Unknown converted type TIMESTAMP_MICROS

2020-07-31 Thread GitBox
zherenyu831 commented on issue #1895: URL: https://github.com/apache/hudi/issues/1895#issuecomment-667466216 @FelixKJose We faced the same problem before. Our solution is to convert TIMESTAMP to DOUBLE before upsert/insert, which will look like `1596255542.123`; the milliseconds will be the `.123`
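
A minimal sketch of the conversion described in this comment, assuming the timestamp is held as a Python `datetime` and that naive values are in UTC (both are assumptions, not stated in the thread). The helper name `timestamp_to_epoch_double` is hypothetical. It returns epoch seconds as a float, so the milliseconds end up in the fractional part.

```python
from datetime import datetime, timezone


def timestamp_to_epoch_double(ts: datetime) -> float:
    """Return seconds since the Unix epoch as a float, with milliseconds as the fraction."""
    if ts.tzinfo is None:
        # Assumption: naive timestamps are interpreted as UTC.
        ts = ts.replace(tzinfo=timezone.utc)
    # Round to millisecond precision so that sub-millisecond digits are dropped.
    return round(ts.timestamp(), 3)


# 2020-08-01 04:19:02.123 UTC corresponds to the value quoted in the comment.
ts = datetime(2020, 8, 1, 4, 19, 2, 123000, tzinfo=timezone.utc)
print(timestamp_to_epoch_double(ts))
```

In a Spark job the analogous step would presumably be a cast of the timestamp column to a double before the write; the pure-Python version above only illustrates the resulting value.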

[GitHub] [hudi] garyli1019 commented on issue #1890: [SUPPORT] Failed to get record from HoodieMergedLogRecordScanner

2020-07-31 Thread GitBox
garyli1019 commented on issue #1890: URL: https://github.com/apache/hudi/issues/1890#issuecomment-667465216 This issue happened to me again, and the cause can now be narrowed down. When the log file was larger than `HoodieStorageConfig.LOGFILE_SIZE_MAX_BYTES` (1GB by default), the log file

[jira] [Created] (HUDI-1141) Serialization fail when loading large log files

2020-07-31 Thread Yanjia Gary Li (Jira)
Yanjia Gary Li created HUDI-1141: Summary: Serialization fail when loading large log files Key: HUDI-1141 URL: https://issues.apache.org/jira/browse/HUDI-1141 Project: Apache Hudi Issue

Build failed in Jenkins: hudi-snapshot-deployment-0.5 #356

2020-07-31 Thread Apache Jenkins Server
See Changes: -- [...truncated 2.44 KB...] /home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging: simplelogger.properties

[GitHub] [hudi] xushiyan opened a new pull request #1896: [DOC] Add more details to IDE setup

2020-07-31 Thread GitBox
xushiyan opened a new pull request #1896: URL: https://github.com/apache/hudi/pull/1896 ![Screen Shot 2020-07-31 at 8 19 25 PM](https://user-images.githubusercontent.com/2701446/89092975-2db4b380-d36b-11ea-9d83-cc2e9ee31889.png) ## Committer checklist - [ ] Has a

[jira] [Assigned] (HUDI-1140) JCommander not passing command line arguments with comma separated values.

2020-07-31 Thread Sreeram Ramji (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sreeram Ramji reassigned HUDI-1140: --- Assignee: Sreeram Ramji > JCommander not passing command line arguments with comma separated

[GitHub] [hudi] vinothchandar merged pull request #1768: [HUDI-1054][Peformance] Several performance fixes during finalizing writes

2020-07-31 Thread GitBox
vinothchandar merged pull request #1768: URL: https://github.com/apache/hudi/pull/1768 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[hudi] branch master updated: [HUDI-1054] Several performance fixes during finalizing writes (#1768)

2020-07-31 Thread vinoth
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new e79fbc0 [HUDI-1054] Several performance fixes

[GitHub] [hudi] xushiyan commented on pull request #1884: [HUDI-995] Use Transformations, Assertions and SchemaTestUtil

2020-07-31 Thread GitBox
xushiyan commented on pull request #1884: URL: https://github.com/apache/hudi/pull/1884#issuecomment-667458092 @yanghua rebased and resolved conflicts. thanks

[GitHub] [hudi] vinothchandar commented on a change in pull request #1678: [HUDI-242] Metadata Bootstrap changes

2020-07-31 Thread GitBox
vinothchandar commented on a change in pull request #1678: URL: https://github.com/apache/hudi/pull/1678#discussion_r463907314 ## File path: hudi-client/src/main/java/org/apache/hudi/keygen/KeyGenerator.java ## @@ -0,0 +1,93 @@ +/* + * Licensed to the Apache Software

[hudi] branch master updated (ccd70a7 -> 727f1df)

2020-07-31 Thread vinoth
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from ccd70a7 [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert (#1149) add 727f1df

[GitHub] [hudi] vinothchandar merged pull request #1894: [MINOR] Suppressing spark logs for hudi-integ and hudi-utilities

2020-07-31 Thread GitBox
vinothchandar merged pull request #1894: URL: https://github.com/apache/hudi/pull/1894

[GitHub] [hudi] umehrot2 commented on a change in pull request #1876: [HUDI-242] Support for RFC-12/Bootstrapping of external datasets

2020-07-31 Thread GitBox
umehrot2 commented on a change in pull request #1876: URL: https://github.com/apache/hudi/pull/1876#discussion_r463904220 ## File path: hudi-common/src/main/java/org/apache/hudi/common/fs/FSUtils.java ## @@ -516,4 +528,73 @@ public static Configuration registerFileSystem(Path

[GitHub] [hudi] vinothchandar commented on a change in pull request #1894: [WIP] Suppressing spark logs for hudi-integ and hudi-utilities

2020-07-31 Thread GitBox
vinothchandar commented on a change in pull request #1894: URL: https://github.com/apache/hudi/pull/1894#discussion_r463903236 ## File path: docker/demo/setup_demo_container.sh ## @@ -17,6 +17,7 @@ echo "Copying spark default config and setting up configs" cp

[jira] [Comment Edited] (HUDI-1117) Add tdunning json library to spark and utilities bundle

2020-07-31 Thread Abhishek Modi (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17169214#comment-17169214 ] Abhishek Modi edited comment on HUDI-1117 at 8/1/20, 1:17 AM: -- We've run into

[jira] [Comment Edited] (HUDI-1117) Add tdunning json library to spark and utilities bundle

2020-07-31 Thread Abhishek Modi (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17169214#comment-17169214 ] Abhishek Modi edited comment on HUDI-1117 at 8/1/20, 1:16 AM: -- {{We've run

[jira] [Commented] (HUDI-1117) Add tdunning json library to spark and utilities bundle

2020-07-31 Thread Abhishek Modi (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17169214#comment-17169214 ] Abhishek Modi commented on HUDI-1117: - We've run into this error with `hive-exec` multiple times at

[GitHub] [hudi] FelixKJose commented on issue #1875: EMR + Spark Batch job + HUDI + Hive external Metastore (MySQL RDS Instance) failed with No Suitable Driver

2020-07-31 Thread GitBox
FelixKJose commented on issue #1875: URL: https://github.com/apache/hudi/issues/1875#issuecomment-667433589 @bvaradar How can I mark this issue as resolved?

[GitHub] [hudi] FelixKJose commented on issue #1875: EMR + Spark Batch job + HUDI + Hive external Metastore (MySQL RDS Instance) failed with No Suitable Driver

2020-07-31 Thread GitBox
FelixKJose commented on issue #1875: URL: https://github.com/apache/hudi/issues/1875#issuecomment-667433436 I have resolved this issue. **Solution:** The issue was that the JDBC connector/driver jar was missing from the Spark classpath of the EMR master node. Even though EMR

[GitHub] [hudi] FelixKJose opened a new issue #1895: HUDI Dataset backed by Hive Metastore fails on Presto with Unknown converted type TIMESTAMP_MICROS

2020-07-31 Thread GitBox
FelixKJose opened a new issue #1895: URL: https://github.com/apache/hudi/issues/1895 I am getting an exception Unknown converted type TIMESTAMP_MICROS while querying HUDI Dataset backed by Hive metastore using Presto. **My DF Schema:** ``` schema = StructType().add("_id",

[GitHub] [hudi] leesf commented on pull request #1095: [HUDI-210] Implement prometheus metrics reporter

2020-07-31 Thread GitBox
leesf commented on pull request #1095: URL: https://github.com/apache/hudi/pull/1095#issuecomment-667430846 closing this one in favor of https://github.com/apache/hudi/pull/1726

[GitHub] [hudi] leesf closed pull request #1095: [HUDI-210] Implement prometheus metrics reporter

2020-07-31 Thread GitBox
leesf closed pull request #1095: URL: https://github.com/apache/hudi/pull/1095

[jira] [Updated] (HUDI-210) Implement prometheus metrics reporter

2020-07-31 Thread leesf (Jira)
[ https://issues.apache.org/jira/browse/HUDI-210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-210: --- Priority: Blocker (was: Major) > Implement prometheus metrics reporter > - > >

[jira] [Updated] (HUDI-210) Implement prometheus metrics reporter

2020-07-31 Thread leesf (Jira)
[ https://issues.apache.org/jira/browse/HUDI-210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-210: --- Fix Version/s: 0.6.0 > Implement prometheus metrics reporter > - > >

[GitHub] [hudi] leesf commented on pull request #1726: [HUDI-210]Hudi support prometheus

2020-07-31 Thread GitBox
leesf commented on pull request #1726: URL: https://github.com/apache/hudi/pull/1726#issuecomment-667430557 @UZi5136225 would you please rebase to fix the conflicts and upload the metrics result.

[GitHub] [hudi] umehrot2 commented on a change in pull request #1876: [HUDI-242] Support for RFC-12/Bootstrapping of external datasets

2020-07-31 Thread GitBox
umehrot2 commented on a change in pull request #1876: URL: https://github.com/apache/hudi/pull/1876#discussion_r463889895 ## File path: hudi-spark/src/test/java/org/apache/hudi/client/TestBootstrap.java ## @@ -0,0 +1,586 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [hudi] umehrot2 commented on a change in pull request #1876: [HUDI-242] Support for RFC-12/Bootstrapping of external datasets

2020-07-31 Thread GitBox
umehrot2 commented on a change in pull request #1876: URL: https://github.com/apache/hudi/pull/1876#discussion_r463889083 ## File path: hudi-client/src/main/java/org/apache/hudi/client/bootstrap/BootstrapSourceSchemaProvider.java ## @@ -0,0 +1,77 @@ +/* + * Licensed to the

[GitHub] [hudi] zhedoubushishi commented on issue #1586: [SUPPORT] DMS with 2 key example

2020-07-31 Thread GitBox
zhedoubushishi commented on issue #1586: URL: https://github.com/apache/hudi/issues/1586#issuecomment-667423657 > @zhedoubushishi Interesting! Were you able to deep dive into how can we ensure JCommander does not split by comma or any work around we could do? Sure. Will take a look

[GitHub] [hudi] garyli1019 commented on issue #1890: [SUPPORT] Failed to get record from HoodieMergedLogRecordScanner

2020-07-31 Thread GitBox
garyli1019 commented on issue #1890: URL: https://github.com/apache/hudi/issues/1890#issuecomment-667402304 Solved. I guess it's because I didn't drop the hudi meta columns when reading from a hudi COW source and write to another folder as MOR. The existing hudi meta columns might cause
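
A minimal sketch of the step this comment describes, dropping the Hudi meta columns before rewriting data to another table. The thread later reports the issue recurring, so this only illustrates the step and is not a confirmed fix. The helper name `non_meta_columns` is hypothetical; the five column names are Hudi's standard meta columns.

```python
# Hudi's standard per-record meta columns.
HOODIE_META_COLUMNS = (
    "_hoodie_commit_time",
    "_hoodie_commit_seqno",
    "_hoodie_record_key",
    "_hoodie_partition_path",
    "_hoodie_file_name",
)


def non_meta_columns(columns):
    """Return the given column names with the Hudi meta columns removed, order preserved."""
    return [c for c in columns if c not in HOODIE_META_COLUMNS]


# Hypothetical column list of a table read back from a Hudi source.
source_columns = ["_hoodie_commit_time", "_hoodie_record_key", "id", "ts", "value"]
print(non_meta_columns(source_columns))
```

With a PySpark DataFrame `df`, the same list could be applied as `df.select(*non_meta_columns(df.columns))` before writing.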

[GitHub] [hudi] garyli1019 closed issue #1890: [SUPPORT] Failed to get record from HoodieMergedLogRecordScanner

2020-07-31 Thread GitBox
garyli1019 closed issue #1890: URL: https://github.com/apache/hudi/issues/1890

[GitHub] [hudi] yihua edited a comment on pull request #1149: [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2020-07-31 Thread GitBox
yihua edited a comment on pull request #1149: URL: https://github.com/apache/hudi/pull/1149#issuecomment-667345505 @nsivabalan @n3nash I just recall that Uber internally implements `UserDefinedBulkInsertPartitioner` interface to have custom partitioners so the name change of the

[GitHub] [hudi] yihua commented on pull request #1149: [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2020-07-31 Thread GitBox
yihua commented on pull request #1149: URL: https://github.com/apache/hudi/pull/1149#issuecomment-667345505 @nsivabalan @n3nash I just recall that Uber internally extends `UserDefinedBulkInsertPartitioner` to have custom partitioners so the name change of the

[GitHub] [hudi] vinothchandar commented on a change in pull request #1894: [WIP] Suppressing spark logs for hudi-integ and hudi-utilities

2020-07-31 Thread GitBox
vinothchandar commented on a change in pull request #1894: URL: https://github.com/apache/hudi/pull/1894#discussion_r463782211 ## File path: hudi-integ-test/src/test/java/org/apache/hudi/integ/ITTestBase.java ## @@ -31,6 +31,7 @@ import

[GitHub] [hudi] garyli1019 commented on issue #1890: [SUPPORT] Failed to get record from HoodieMergedLogRecordScanner

2020-07-31 Thread GitBox
garyli1019 commented on issue #1890: URL: https://github.com/apache/hudi/issues/1890#issuecomment-667254308 @zherenyu831 Thanks for sharing your experience! In my `hoodie.properties` file the payload was `org.apache.hudi.common.model.OverwriteWithLatestAvroPayload`. The strange part is

[GitHub] [hudi] afeldman1 commented on issue #143: Tracking ticket for folks to be added to slack group

2020-07-31 Thread GitBox
afeldman1 commented on issue #143: URL: https://github.com/apache/hudi/issues/143#issuecomment-667249720 afeldm...@gmail.com

[GitHub] [hudi] rubenssoto closed issue #1893: [SUPPORT] Hudi is creating a lot of small files

2020-07-31 Thread GitBox
rubenssoto closed issue #1893: URL: https://github.com/apache/hudi/issues/1893

[GitHub] [hudi] rubenssoto commented on issue #1893: [SUPPORT] Hudi is creating a lot of small files

2020-07-31 Thread GitBox
rubenssoto commented on issue #1893: URL: https://github.com/apache/hudi/issues/1893#issuecomment-667246163 The problem was solved using hoodie.copyonwrite.record.size.estimate config.
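
A rough arithmetic sketch of why the record size estimate can matter for file sizes. It assumes that new copy-on-write files are sized by dividing the target file size by the configured record size estimate; that is my reading of how the setting is used, not something the thread states. Function names and all numbers are hypothetical.

```python
def estimated_records_per_file(target_file_bytes: int, record_size_estimate: int) -> int:
    """Records assigned to one file if sizing is target size divided by estimated record size."""
    return target_file_bytes // record_size_estimate


def resulting_file_bytes(target_file_bytes: int, record_size_estimate: int,
                         actual_record_size: int) -> int:
    """Approximate size on disk when the actual record size differs from the estimate."""
    records = estimated_records_per_file(target_file_bytes, record_size_estimate)
    return records * actual_record_size


ONE_GIB = 1024 ** 3

# Estimate of 1024 bytes per record while actual records are about 100 bytes:
# each file ends up roughly ten times smaller than the target.
too_small = resulting_file_bytes(ONE_GIB, 1024, 100)

# Estimate close to the actual record size: files land near the target.
near_target = resulting_file_bytes(ONE_GIB, 100, 100)

print(too_small, near_target)
```

Under that assumption, lowering the estimate toward the true average record size moves file sizes toward the configured target.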

[jira] [Commented] (HUDI-1098) Marker file finalizing may block on a data file that was never written

2020-07-31 Thread Udit Mehrotra (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17169034#comment-17169034 ] Udit Mehrotra commented on HUDI-1098: - [~shivnarayan] [~vinoth] thanks for prioritizing this issue.

[GitHub] [hudi] umehrot2 commented on pull request #1768: [HUDI-1054][Peformance] Several performance fixes during finalizing writes

2020-07-31 Thread GitBox
umehrot2 commented on pull request #1768: URL: https://github.com/apache/hudi/pull/1768#issuecomment-667241386 > > Is it okay if I open a JIRA for this and pursue it separately ? > > Sounds good. let's lump this into the JIRA we have for marker file improvements more holistically ?

[jira] [Commented] (HUDI-1138) Re-implement marker files via timeline server

2020-07-31 Thread Udit Mehrotra (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17169028#comment-17169028 ] Udit Mehrotra commented on HUDI-1138: - Another potential performance improvement for listing/deletion

[GitHub] [hudi] vinothchandar commented on a change in pull request #1894: [WIP] Suppressing spark logs for hudi-integ and hudi-utilities

2020-07-31 Thread GitBox
vinothchandar commented on a change in pull request #1894: URL: https://github.com/apache/hudi/pull/1894#discussion_r463724384 ## File path: hudi-integ-test/src/test/java/org/apache/hudi/integ/ITTestBase.java ## @@ -132,6 +134,12 @@ public void init() { await().atMost(60,

[GitHub] [hudi] vinothchandar commented on pull request #1894: [WIP] Suppressing spark logs for hudi-integ and hudi-utilities

2020-07-31 Thread GitBox
vinothchandar commented on pull request #1894: URL: https://github.com/apache/hudi/pull/1894#issuecomment-667228852 if you see, https://api.travis-ci.org/v3/job/713740328/log.txt you will find logs for Spark deltastreamer (the actual code, not tests). I think we can pass the

[jira] [Commented] (HUDI-1098) Marker file finalizing may block on a data file that was never written

2020-07-31 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17168975#comment-17168975 ] sivabalan narayanan commented on HUDI-1098: --- We could ignore the data files not present after a

[GitHub] [hudi] n3nash opened a new pull request #1894: [WIP] Suppressing spark logs for hudi-integ and hudi-utilities

2020-07-31 Thread GitBox
n3nash opened a new pull request #1894: URL: https://github.com/apache/hudi/pull/1894 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of the

[GitHub] [hudi] rubenssoto opened a new issue #1893: [SUPPORT] Hudi is creating a lot of small files

2020-07-31 Thread GitBox
rubenssoto opened a new issue #1893: URL: https://github.com/apache/hudi/issues/1893 Hi, how are you? I'm trying to create a hudi dataset from a small dataset, 2gb. I ran an insert operation, and because the dataset is so small I don't want a partition; I want 2 files of 1gb each

[GitHub] [hudi] nsivabalan merged pull request #1149: [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2020-07-31 Thread GitBox
nsivabalan merged pull request #1149: URL: https://github.com/apache/hudi/pull/1149

[hudi] branch master updated: [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert (#1149)

2020-07-31 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository. sivabalan pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new ccd70a7 [HUDI-472] Introduce configurations

[GitHub] [hudi] thomasrynnehmrc commented on issue #1889: [SUPPORT]

2020-07-31 Thread GitBox
thomasrynnehmrc commented on issue #1889: URL: https://github.com/apache/hudi/issues/1889#issuecomment-667119148 Yes, this has fixed it, thank you. After this change the filtering took 2 minutes instead of 3 hours. I was not able to run against a full rebuild of Hudi, but I

[GitHub] [hudi] leesf commented on a change in pull request #1891: [HUDI-1124] Document the usage of Tencent COSN

2020-07-31 Thread GitBox
leesf commented on a change in pull request #1891: URL: https://github.com/apache/hudi/pull/1891#discussion_r463607513 ## File path: docs/_docs/0_7_cos_filesystem.cn.md ## @@ -0,0 +1,73 @@ +--- +title: COS Filesystem +keywords: hudi, hive, tencent, cos, spark, presto

[GitHub] [hudi] leesf commented on a change in pull request #1891: [HUDI-1124] Document the usage of Tencent COSN

2020-07-31 Thread GitBox
leesf commented on a change in pull request #1891: URL: https://github.com/apache/hudi/pull/1891#discussion_r463607286 ## File path: docs/_docs/0_7_cos_filesystem.cn.md ## @@ -0,0 +1,73 @@ +--- +title: COS Filesystem +keywords: hudi, hive, tencent, cos, spark, presto

[GitHub] [hudi] leesf commented on a change in pull request #1891: [HUDI-1124] Document the usage of Tencent COSN

2020-07-31 Thread GitBox
leesf commented on a change in pull request #1891: URL: https://github.com/apache/hudi/pull/1891#discussion_r463607092 ## File path: docs/_docs/0_7_cos_filesystem.cn.md ## @@ -0,0 +1,73 @@ +--- +title: COS Filesystem +keywords: hudi, hive, tencent, cos, spark, presto

[GitHub] [hudi] leesf commented on a change in pull request #1891: [HUDI-1124] Document the usage of Tencent COSN

2020-07-31 Thread GitBox
leesf commented on a change in pull request #1891: URL: https://github.com/apache/hudi/pull/1891#discussion_r463606577 ## File path: docs/_docs/0_7_cos_filesystem.cn.md ## @@ -0,0 +1,73 @@ +--- +title: COS Filesystem +keywords: hudi, hive, tencent, cos, spark, presto

[GitHub] [hudi] zherenyu831 opened a new issue #1892: [SUPPORT] Hudi compaction caused OOM problem

2020-07-31 Thread GitBox
zherenyu831 opened a new issue #1892: URL: https://github.com/apache/hudi/issues/1892 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://cwiki.apache.org/confluence/display/HUDI/FAQ)? - Join the mailing list to engage in conversations and get faster

[GitHub] [hudi] zherenyu831 edited a comment on issue #1890: [SUPPORT] Failed to get record from HoodieMergedLogRecordScanner

2020-07-31 Thread GitBox
zherenyu831 edited a comment on issue #1890: URL: https://github.com/apache/hudi/issues/1890#issuecomment-667103547 We are also hitting this now. The reason is that we are using a custom payload while upserting data, but when we bulk inserted at the very beginning, we used

[GitHub] [hudi] zherenyu831 commented on issue #1890: [SUPPORT] Failed to get record from HoodieMergedLogRecordScanner

2020-07-31 Thread GitBox
zherenyu831 commented on issue #1890: URL: https://github.com/apache/hudi/issues/1890#issuecomment-667103547 We are also hitting this now. The reason is that we used a custom payload while upserting data, but when we bulk inserted at the very beginning, we used

[GitHub] [hudi] nsivabalan commented on pull request #1149: [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2020-07-31 Thread GitBox
nsivabalan commented on pull request #1149: URL: https://github.com/apache/hudi/pull/1149#issuecomment-667082374 I noticed that the base interface doesn't need to be named "UserDefined" anymore since it's applicable to our own partitioners as well. So, I have stripped off "UserDefined" from

[GitHub] [hudi] DeyinZhong commented on pull request #1855: [HUDI-871] Add support for Tencent Cloud Object Storage(COS)

2020-07-31 Thread GitBox
DeyinZhong commented on pull request #1855: URL: https://github.com/apache/hudi/pull/1855#issuecomment-667076680 > > > @DeyinZhong Thanks for your contribution, LGTM. Would you please also update the docs (http://hudi.apache.org/docs/cloud.html)? The docs branch is asf-site. Please ping me

[GitHub] [hudi] DeyinZhong opened a new pull request #1891: [HUDI-1124] Document the usage of Tencent COSN

2020-07-31 Thread GitBox
DeyinZhong opened a new pull request #1891: URL: https://github.com/apache/hudi/pull/1891 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of the

[jira] [Updated] (HUDI-1124) Document the usage of Tencent COSN

2020-07-31 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1124: - Labels: pull-request-available (was: ) > Document the usage of Tencent COSN >

[GitHub] [hudi] nsivabalan commented on a change in pull request #1868: [HUDI-1083] Optimization in determining insert bucket location for a given key

2020-07-31 Thread GitBox
nsivabalan commented on a change in pull request #1868: URL: https://github.com/apache/hudi/pull/1868#discussion_r463553042 ## File path: hudi-client/src/main/java/org/apache/hudi/table/action/commit/InsertBucket2CumulativeWeight.java ## @@ -0,0 +1,53 @@ +/* + * Licensed to

[GitHub] [hudi] nsivabalan commented on a change in pull request #1819: [HUDI-1058] Make delete marker configurable

2020-07-31 Thread GitBox
nsivabalan commented on a change in pull request #1819: URL: https://github.com/apache/hudi/pull/1819#discussion_r463548033 ## File path: hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestDataGenerator.java ## @@ -177,6 +178,18 @@ public RawTripTestPayload

[GitHub] [hudi] bvaradar commented on issue #1890: [SUPPORT] Failed to get record from HoodieMergedLogRecordScanner

2020-07-31 Thread GitBox
bvaradar commented on issue #1890: URL: https://github.com/apache/hudi/issues/1890#issuecomment-667062787 @garyli1019 : I am just guessing here -> Could this be due to type of orderingVal ? Is that (de)serializable with Kryo? Can you try to see by writing a quick test if you can serialize

[GitHub] [hudi] bvaradar commented on issue #1889: [SUPPORT]

2020-07-31 Thread GitBox
bvaradar commented on issue #1889: URL: https://github.com/apache/hudi/issues/1889#issuecomment-667059345 @thomasrynnehmrc : Thanks for opening the ticket. This is a known bottleneck with S3. We will be targeting listing in general with our next release after 0.6 release

[GitHub] [hudi] bvaradar commented on issue #1586: [SUPPORT] DMS with 2 key example

2020-07-31 Thread GitBox
bvaradar commented on issue #1586: URL: https://github.com/apache/hudi/issues/1586#issuecomment-667037723 Opened https://issues.apache.org/jira/browse/HUDI-1140

[jira] [Updated] (HUDI-1140) JCommander not passing command line arguments with comma separated values.

2020-07-31 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-1140: - Status: Open (was: New) > JCommander not passing command line arguments with comma

[jira] [Created] (HUDI-1140) JCommander not passing command line arguments with comma separated values.

2020-07-31 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-1140: Summary: JCommander not passing command line arguments with comma separated values. Key: HUDI-1140 URL: https://issues.apache.org/jira/browse/HUDI-1140

[GitHub] [hudi] bvaradar commented on issue #1835: [SUPPORT] HoodieDeltaStreamer with Json as Source

2020-07-31 Thread GitBox
bvaradar commented on issue #1835: URL: https://github.com/apache/hudi/issues/1835#issuecomment-667035480 The marking is based on my suspicions of the root cause. I have not seen this issue arise out of any other case. The integration tests that cover this code path work fine

[GitHub] [hudi] bvaradar commented on a change in pull request #1888: HUDI-1129: AvroConversionUtils unable to handle avro to row transformation when passing evolved schema

2020-07-31 Thread GitBox
bvaradar commented on a change in pull request #1888: URL: https://github.com/apache/hudi/pull/1888#discussion_r463504707 ## File path: hudi-spark/src/main/scala/org/apache/hudi/AvroConversionHelper.scala ## @@ -136,7 +136,7 @@ object AvroConversionHelper { case

[GitHub] [hudi] bvaradar commented on issue #1878: [SUPPORT] Spark Structured Streaming To Hudi Sink Datasource taking much longer

2020-07-31 Thread GitBox
bvaradar commented on issue #1878: URL: https://github.com/apache/hudi/issues/1878#issuecomment-667023700 For a monotonically increasing id, you can use bulk-insert instead of insert for the first-time loading of files. This would nicely order records by the id and your range-pruning during

[GitHub] [hudi] yihua commented on a change in pull request #1149: [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2020-07-31 Thread GitBox
yihua commented on a change in pull request #1149: URL: https://github.com/apache/hudi/pull/1149#discussion_r463442771 ## File path: hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestDataGenerator.java ## @@ -161,6 +161,17 @@ public static void

[GitHub] [hudi] yihua commented on a change in pull request #1149: [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2020-07-31 Thread GitBox
yihua commented on a change in pull request #1149: URL: https://github.com/apache/hudi/pull/1149#discussion_r463441657 ## File path: hudi-client/src/test/java/org/apache/hudi/execution/bulkinsert/TestBulkInsertInternalPartitioner.java ## @@ -0,0 +1,143 @@ +/* + * Licensed to

[GitHub] [hudi] vinothchandar commented on a change in pull request #1149: [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2020-07-31 Thread GitBox
vinothchandar commented on a change in pull request #1149: URL: https://github.com/apache/hudi/pull/1149#discussion_r463423301 ## File path: hudi-client/src/test/java/org/apache/hudi/execution/bulkinsert/TestBulkInsertInternalPartitioner.java ## @@ -0,0 +1,143 @@ +/* + *

[GitHub] [hudi] vinothchandar commented on a change in pull request #1149: [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2020-07-31 Thread GitBox
vinothchandar commented on a change in pull request #1149: URL: https://github.com/apache/hudi/pull/1149#discussion_r463422364 ## File path: hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestDataGenerator.java ## @@ -161,6 +161,17 @@ public static void