[GitHub] [hudi] prashantwason commented on a change in pull request #1687: [WIP] [HUDI-684] Introduced abstraction for writing and reading different types of base file formats.

2020-06-23 Thread GitBox
prashantwason commented on a change in pull request #1687: URL: https://github.com/apache/hudi/pull/1687#discussion_r444658761 ## File path: hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java ## @@ -146,21 +146,22 @@ private void syncSchema(String tableName, bo

[GitHub] [hudi] prashantwason commented on a change in pull request #1687: [WIP] [HUDI-684] Introduced abstraction for writing and reading different types of base file formats.

2020-06-23 Thread GitBox
prashantwason commented on a change in pull request #1687: URL: https://github.com/apache/hudi/pull/1687#discussion_r444657966 ## File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieParquetRealtimeInputFormat.java ## @@ -80,58 +77,6 @@ protected Hoodi

[GitHub] [hudi] prashantwason commented on a change in pull request #1687: [WIP] [HUDI-684] Introduced abstraction for writing and reading different types of base file formats.

2020-06-23 Thread GitBox
prashantwason commented on a change in pull request #1687: URL: https://github.com/apache/hudi/pull/1687#discussion_r444656608 ## File path: hudi-common/src/main/java/org/apache/hudi/common/util/ParquetReaderIterator.java ## @@ -16,7 +16,7 @@ * limitations under the License.

[GitHub] [hudi] prashantwason commented on a change in pull request #1687: [WIP] [HUDI-684] Introduced abstraction for writing and reading different types of base file formats.

2020-06-23 Thread GitBox
prashantwason commented on a change in pull request #1687: URL: https://github.com/apache/hudi/pull/1687#discussion_r444655765 ## File path: hudi-client/src/main/java/org/apache/hudi/io/storage/HoodieFileWriter.java ## @@ -33,4 +37,12 @@ void close() throws IOException;

[GitHub] [hudi] prashantwason commented on a change in pull request #1687: [WIP] [HUDI-684] Introduced abstraction for writing and reading different types of base file formats.

2020-06-23 Thread GitBox
prashantwason commented on a change in pull request #1687: URL: https://github.com/apache/hudi/pull/1687#discussion_r444654900 ## File path: hudi-client/src/main/java/org/apache/hudi/io/HoodieWriteHandle.java ## @@ -180,4 +183,10 @@ protected int getStageId() { protected lon

[GitHub] [hudi] prashantwason commented on a change in pull request #1687: [WIP] [HUDI-684] Introduced abstraction for writing and reading different types of base file formats.

2020-06-23 Thread GitBox
prashantwason commented on a change in pull request #1687: URL: https://github.com/apache/hudi/pull/1687#discussion_r444654644 ## File path: hudi-client/src/main/java/org/apache/hudi/io/HoodieReadHandle.java ## @@ -56,4 +61,9 @@ protected HoodieBaseFile getLatestDataFile() {
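PR #1687 introduces an abstraction so that base files can be written and read in formats other than Parquet, with interfaces such as `HoodieFileWriter` (whose `close()` appears in the diff above). The Java sketch below illustrates only the general pattern: a format-agnostic writer interface plus a factory that chooses an implementation from the file extension. All names (`BaseFileWriter`, `InMemoryParquetLikeWriter`, `BaseFileWriterFactory`) are hypothetical. Records are plain strings rather than Avro records, and nothing here is the PR's actual API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical format-agnostic writer; records are plain Strings for brevity.
interface BaseFileWriter extends AutoCloseable {
  // False once the writer is closed or its (hypothetical) row limit is reached.
  boolean canWrite();

  void write(String record);

  // Narrows AutoCloseable.close() so callers need not handle a checked exception.
  @Override
  void close();
}

// Stand-in for a Parquet-backed writer: it only buffers rows in memory.
class InMemoryParquetLikeWriter implements BaseFileWriter {
  private final List<String> rows = new ArrayList<>();
  private final int maxRows;
  private boolean closed = false;

  InMemoryParquetLikeWriter(int maxRows) {
    this.maxRows = maxRows;
  }

  @Override
  public boolean canWrite() {
    return !closed && rows.size() < maxRows;
  }

  @Override
  public void write(String record) {
    if (!canWrite()) {
      throw new IllegalStateException("writer is closed or full");
    }
    rows.add(record);
  }

  @Override
  public void close() {
    closed = true;
  }

  int rowCount() {
    return rows.size();
  }
}

// Hypothetical factory: callers get a writer for a path without naming a concrete format.
final class BaseFileWriterFactory {
  private BaseFileWriterFactory() {}

  static BaseFileWriter forPath(String path, int maxRows) {
    if (path.endsWith(".parquet")) {
      return new InMemoryParquetLikeWriter(maxRows);
    }
    throw new UnsupportedOperationException("No writer registered for base file: " + path);
  }
}
```

Because callers depend only on `BaseFileWriter`, supporting another format in this sketch means adding one implementation and one branch in the factory. The calling code stays unchanged.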

[GitHub] [hudi] zuyanton opened a new issue #1764: [SUPPORT] Commits stays INFLIGHT forever after S3 consistency check fails when Hudi tries to delete duplicate datafiles

2020-06-23 Thread GitBox
zuyanton opened a new issue #1764: URL: https://github.com/apache/hudi/issues/1764 **Describe the problem you faced** We are running a MoR table on EMR+Hudi+S3 with `hoodie.consistency.check.enabled` set to true and with compaction set to run inline. We update the table every ten mi
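For readers reproducing the setup in issue #1764, the two settings it mentions can be expressed as plain key/value write options. The sketch below only assembles them in a Java map. `hoodie.datasource.write.table.type` is included on the assumption that the MoR table is written through the Spark datasource. The class name is hypothetical, and how the map is handed to the writer is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that collects the write options described in the issue.
final class ConsistencyCheckOptions {
  private ConsistencyCheckOptions() {}

  static Map<String, String> build() {
    Map<String, String> opts = new LinkedHashMap<>();
    // Assumption: the table is written via the Spark datasource as MERGE_ON_READ.
    opts.put("hoodie.datasource.write.table.type", "MERGE_ON_READ");
    // From the issue: wait for written/deleted files to become (in)visible on S3 before proceeding.
    opts.put("hoodie.consistency.check.enabled", "true");
    // From the issue: run compaction inline, as part of the write.
    opts.put("hoodie.compact.inline", "true");
    return opts;
  }
}
```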

[GitHub] [hudi] garyli1019 commented on a change in pull request #1722: [HUDI-69] Support Spark Datasource for MOR table

2020-06-23 Thread GitBox
garyli1019 commented on a change in pull request #1722: URL: https://github.com/apache/hudi/pull/1722#discussion_r444623018 ## File path: hudi-spark/src/main/scala/org/apache/hudi/DataSourceOptions.scala ## @@ -48,7 +49,7 @@ object DataSourceReadOptions { val QUERY_TYPE_SNAP

Build failed in Jenkins: hudi-snapshot-deployment-0.5 #318

2020-06-23 Thread Apache Jenkins Server
See Changes: -- [...truncated 2.41 KB...] settings.xml toolchains.xml /home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging: simplelogger.properties /home/jenkins/tool

[GitHub] [hudi] venkee14 opened a new issue #1763: Hudi total upsert time is twice than the individual jobs time in Spark UI added together

2020-06-23 Thread GitBox
venkee14 opened a new issue #1763: URL: https://github.com/apache/hudi/issues/1763 I have noticed that the individual jobs runtime in Spark UI server does not add up to the total upsert time taken. I am trying to understand where the extra time is spent and reduce it and make the upsert ru

[GitHub] [hudi] nsivabalan opened a new pull request #1762: [WIP] Bulk insert Dataset

2020-06-23 Thread GitBox
nsivabalan opened a new pull request #1762: URL: https://github.com/apache/hudi/pull/1762 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of the

[GitHub] [hudi] lw309637554 commented on pull request #1756: [HUDI-839] Adding unit test for MarkerFiles,RollbackUtils, RollbackActionExecutor for markers and filelisting

2020-06-23 Thread GitBox
lw309637554 commented on pull request #1756: URL: https://github.com/apache/hudi/pull/1756#issuecomment-648541535 > Took a quick pass at the three test classes you have added.. LGTM. > Will do a detailed pass once you confirm PR is indeed ready.. Thanks, I will fix the failed unit

[jira] [Updated] (HUDI-1042) [Umbrella] Support clustering on filegroups

2020-06-23 Thread leesf (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1042: Summary: [Umbrella] Support clustering on filegroups (was: Support clustering on filegroups)

[jira] [Updated] (HUDI-1044) Support Clustering in MoR mode

2020-06-23 Thread leesf (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1044: Description: updates are not allowed during clustering

[jira] [Created] (HUDI-1048) Support synchronize clustering in MoR mode

2020-06-23 Thread leesf (Jira)
leesf created HUDI-1048: Summary: Support synchronize clustering in MoR mode Key: HUDI-1048 URL: https://issues.apache.org/jira/browse/HUDI-1048 Project: Apache Hudi Issue Type: Sub-task R

[jira] [Created] (HUDI-1047) Support synchronize clustering in CoW mode

2020-06-23 Thread leesf (Jira)
leesf created HUDI-1047: Summary: Support synchronize clustering in CoW mode Key: HUDI-1047 URL: https://issues.apache.org/jira/browse/HUDI-1047 Project: Apache Hudi Issue Type: Sub-task R

[jira] [Created] (HUDI-1046) Support updates during clustering in CoW mode

2020-06-23 Thread leesf (Jira)
leesf created HUDI-1046: Summary: Support updates during clustering in CoW mode Key: HUDI-1046 URL: https://issues.apache.org/jira/browse/HUDI-1046 Project: Apache Hudi Issue Type: Sub-task

[jira] [Created] (HUDI-1045) Support updates during clustering in MoR mode

2020-06-23 Thread leesf (Jira)
leesf created HUDI-1045: Summary: Support updates during clustering in MoR mode Key: HUDI-1045 URL: https://issues.apache.org/jira/browse/HUDI-1045 Project: Apache Hudi Issue Type: Sub-task

[jira] [Updated] (HUDI-1043) Support clustering in CoW mode

2020-06-23 Thread leesf (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1043: Description: updates are not allowed during clustering
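HUDI-1043 and HUDI-1044 both state that updates are not allowed during clustering. One way to picture that restriction is a guard consulted before an update is routed to a file group. The Java sketch below is hypothetical: `ClusteringUpdateGuard` and the use of plain strings as file-group IDs are illustrative assumptions, not how Hudi implements it. The follow-up tickets HUDI-1045 and HUDI-1046 track lifting the restriction.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical guard: reject updates that target a file group currently being clustered.
final class ClusteringUpdateGuard {
  private final Set<String> fileGroupsUnderClustering = new HashSet<>();

  void startClustering(String fileGroupId) {
    fileGroupsUnderClustering.add(fileGroupId);
  }

  void finishClustering(String fileGroupId) {
    fileGroupsUnderClustering.remove(fileGroupId);
  }

  // Throws if the update would touch a file group that clustering is rewriting.
  void checkUpdate(String fileGroupId) {
    if (fileGroupsUnderClustering.contains(fileGroupId)) {
      throw new IllegalStateException(
          "Update rejected: file group " + fileGroupId + " is pending clustering");
    }
  }
}
```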

[jira] [Assigned] (HUDI-1044) Support Clustering in MoR mode

2020-06-23 Thread leesf (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf reassigned HUDI-1044: Assignee: leesf

[jira] [Resolved] (HUDI-1039) Cleanup redundant timeline service when createHoodleClient

2020-06-23 Thread renyi.bao (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] renyi.bao resolved HUDI-1039. Resolution: Fixed. This issue was found in 0.5.3 and has already been fixed by a separate change to HoodieWriteClient

[jira] [Created] (HUDI-1044) Support Clustering in MoR mode

2020-06-23 Thread leesf (Jira)
leesf created HUDI-1044: Summary: Support Clustering in MoR mode Key: HUDI-1044 URL: https://issues.apache.org/jira/browse/HUDI-1044 Project: Apache Hudi Issue Type: Sub-task Reporter: lee

[jira] [Created] (HUDI-1043) Support clustering in CoW mode

2020-06-23 Thread leesf (Jira)
leesf created HUDI-1043: Summary: Support clustering in CoW mode Key: HUDI-1043 URL: https://issues.apache.org/jira/browse/HUDI-1043 Project: Apache Hudi Issue Type: Sub-task Reporter: lee

[jira] [Created] (HUDI-1042) Support clustering on filegroups

2020-06-23 Thread leesf (Jira)
leesf created HUDI-1042: Summary: Support clustering on filegroups Key: HUDI-1042 URL: https://issues.apache.org/jira/browse/HUDI-1042 Project: Apache Hudi Issue Type: Bug Reporter: leesf

[jira] [Updated] (HUDI-1039) Cleanup redundant timeline service when createHoodleClient

2020-06-23 Thread renyi.bao (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] renyi.bao updated HUDI-1039: Status: Open (was: New)

[jira] [Commented] (HUDI-945) Cleanup spillable map files eagerly as part of close

2020-06-23 Thread renyi.bao (Jira)
[ https://issues.apache.org/jira/browse/HUDI-945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17143445#comment-17143445 ] renyi.bao commented on HUDI-945: Hi [~vbalaji], I noticed that the current implementation is
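HUDI-945 asks for spillable map files to be cleaned up eagerly when the map is closed. The comment is truncated, so the existing mechanism is not shown here. Assuming the goal is simply to delete the on-disk spill file as soon as its owner is closed, rather than at JVM shutdown, a minimal Java sketch built on `AutoCloseable` could look like this. `SpillFile` is a hypothetical name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical owner of one on-disk spill file; deletes it eagerly in close().
final class SpillFile implements AutoCloseable {
  private final Path path;

  SpillFile(String prefix) {
    try {
      this.path = Files.createTempFile(prefix, ".spill");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  Path path() {
    return path;
  }

  // Delete immediately instead of deferring cleanup to JVM shutdown.
  @Override
  public void close() {
    try {
      Files.deleteIfExists(path);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Tying deletion to `close()` lets try-with-resources remove the file even when the owning task fails. Without this, spill files can accumulate in long-running processes until the JVM exits.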

[GitHub] [hudi] EdwinGuo commented on pull request #1721: [WIP] [HUDI-1041] Cache the explodeRecordRDDWithFileComparisons instead of commuting it…

2020-06-23 Thread GitBox
EdwinGuo commented on pull request #1721: URL: https://github.com/apache/hudi/pull/1721#issuecomment-648532688 > @EdwinGuo @nsivabalan let's hash this out.. it's an interesting one.. Although it may seem like we are computing the fully exploded RDD in both places.. if you look closely, we d

[jira] [Updated] (HUDI-1041) Cache the explodeRecordRDDWithFileComparisons

2020-06-23 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1041: Labels: pull-request-available (was: )

[GitHub] [hudi] EdwinGuo commented on pull request #1721: [WIP] [HUDI-1041] Cache the explodeRecordRDDWithFileComparisons instead of commuting it…

2020-06-23 Thread GitBox
EdwinGuo commented on pull request #1721: URL: https://github.com/apache/hudi/pull/1721#issuecomment-648529008 > Can you please include the jira number in the pr title Done. https://issues.apache.org/jira/browse/HUDI-1041

[jira] [Created] (HUDI-1041) Cache the explodeRecordRDDWithFileComparisons

2020-06-23 Thread edwinguo (Jira)
edwinguo created HUDI-1041: Summary: Cache the explodeRecordRDDWithFileComparisons Key: HUDI-1041 URL: https://issues.apache.org/jira/browse/HUDI-1041 Project: Apache Hudi Issue Type: Improvement

[jira] [Resolved] (HUDI-1035) Remove unused class KeyLookupResult

2020-06-23 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangxianghu resolved HUDI-1035. Fix Version/s: 0.6.0 Resolution: Fixed

[jira] [Updated] (HUDI-1035) Remove unused class KeyLookupResult

2020-06-23 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangxianghu updated HUDI-1035: Status: Open (was: New)

[GitHub] [hudi] davidsheard closed issue #1759: Hive Table Not showing

2020-06-23 Thread GitBox
davidsheard closed issue #1759: URL: https://github.com/apache/hudi/issues/1759 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [hudi] davidsheard commented on issue #1759: Hive Table Not showing

2020-06-23 Thread GitBox
davidsheard commented on issue #1759: URL: https://github.com/apache/hudi/issues/1759#issuecomment-648514644 We had issues getting JDBC and Kerberos to play nicely with the sync tool. We wrapped the tool in our own Java class and added the appropriate configs. Working like we bou

[GitHub] [hudi] vinothchandar commented on issue #1759: Hive Table Not showing

2020-06-23 Thread GitBox
vinothchandar commented on issue #1759: URL: https://github.com/apache/hudi/issues/1759#issuecomment-648512548 @davidsheard sorry to hear that. Do you get an error on the Spark driver? What is the Hive version on CDH? We sometimes get CDH issues with Hive 1.x ..

[GitHub] [hudi] vinothchandar commented on issue #1758: [SUPPORT] building InMemoryFileIndex slow with increase target table partitions

2020-06-23 Thread GitBox
vinothchandar commented on issue #1758: URL: https://github.com/apache/hudi/issues/1758#issuecomment-648509194 0.5.3 has the fix already IIUC.. So please let me know how things go on master..

[GitHub] [hudi] vinothchandar commented on a change in pull request #1760: [HUDI-1040] Update apis for spark3 compatibility

2020-06-23 Thread GitBox
vinothchandar commented on a change in pull request #1760: URL: https://github.com/apache/hudi/pull/1760#discussion_r444578021 ## File path: hudi-spark/src/main/scala/org/apache/hudi/AvroConversionUtils.scala ## @@ -78,4 +79,21 @@ object AvroConversionUtils { def convertAvro

[GitHub] [hudi] vinothchandar commented on a change in pull request #1756: [HUDI-839] Adding unit test for MarkerFiles,RollbackUtils, RollbackActionExecutor for markers and filelisting

2020-06-23 Thread GitBox
vinothchandar commented on a change in pull request #1756: URL: https://github.com/apache/hudi/pull/1756#discussion_r444577094 ## File path: hudi-client/src/test/java/org/apache/hudi/table/action/rollback/TestCopyOnWriteRollbackActionExecutor.java ## @@ -0,0 +1,246 @@ +/* + *

[GitHub] [hudi] vinothchandar commented on pull request #1756: [HUDI-839] Adding unit test for MarkerFiles,RollbackUtils, RollbackActionExecutor for markers and filelisting

2020-06-23 Thread GitBox
vinothchandar commented on pull request #1756: URL: https://github.com/apache/hudi/pull/1756#issuecomment-648490993 @lw309637554 is this ready for review? Seems like we have unit test failures?

[GitHub] [hudi] vinothchandar commented on pull request #1753: [HUDI-896] Report test coverage by modules

2020-06-23 Thread GitBox
vinothchandar commented on pull request #1753: URL: https://github.com/apache/hudi/pull/1753#issuecomment-648489647 @ramachandranms can you please take a quick pass :).. @xushiyan we can time out tomorrow and go ahead and merge..

[GitHub] [hudi] vinothchandar merged pull request #1754: [HUDI-1035]Remove unused class KeyLookupResult

2020-06-23 Thread GitBox
vinothchandar merged pull request #1754: URL: https://github.com/apache/hudi/pull/1754

[hudi] branch master updated (89e37d5 -> 5e47673)

2020-06-23 Thread vinoth
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from 89e37d5 [HUDI-908] Add some data types to HoodieTestDataGenerator and fix some some bugs. (#1690) add 5e47673 [

[GitHub] [hudi] vinothchandar commented on pull request #1702: Bootstrap datasource changes

2020-06-23 Thread GitBox
vinothchandar commented on pull request #1702: URL: https://github.com/apache/hudi/pull/1702#issuecomment-648488394 @umehrot2 does this PR include some of @bvaradar's changes?

[GitHub] [hudi] vinothchandar commented on a change in pull request #1722: [HUDI-69] Support Spark Datasource for MOR table

2020-06-23 Thread GitBox
vinothchandar commented on a change in pull request #1722: URL: https://github.com/apache/hudi/pull/1722#discussion_r444220031 ## File path: hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala ## @@ -57,8 +57,7 @@ class DefaultSource extends RelationProvider if (

[GitHub] [hudi] bschell commented on a change in pull request #1760: [HUDI-1040] Update apis for spark3 compatibility

2020-06-23 Thread GitBox
bschell commented on a change in pull request #1760: URL: https://github.com/apache/hudi/pull/1760#discussion_r444566173 ## File path: hudi-spark/src/main/scala/org/apache/hudi/AvroConversionUtils.scala ## @@ -78,4 +79,21 @@ object AvroConversionUtils { def convertAvroSchema

[GitHub] [hudi] vinothchandar commented on issue #1751: [SUPPORT] Hudi not working with Spark 3.0.0

2020-06-23 Thread GitBox
vinothchandar commented on issue #1751: URL: https://github.com/apache/hudi/issues/1751#issuecomment-648482064 Thanks @bschell ..

[GitHub] [hudi] vinothchandar commented on a change in pull request #1760: [HUDI-1040] Update apis for spark3 compatibility

2020-06-23 Thread GitBox
vinothchandar commented on a change in pull request #1760: URL: https://github.com/apache/hudi/pull/1760#discussion_r444564497 ## File path: hudi-spark/src/main/scala/org/apache/hudi/AvroConversionUtils.scala ## @@ -78,4 +79,21 @@ object AvroConversionUtils { def convertAvro

[GitHub] [hudi] bschell commented on issue #1751: [SUPPORT] Hudi not working with Spark 3.0.0

2020-06-23 Thread GitBox
bschell commented on issue #1751: URL: https://github.com/apache/hudi/issues/1751#issuecomment-648481221 I did not run into this when testing #1760 myself, I think it might be because we have internal changes for hive3. I just checked and it looks like we have calcite added and shade

[GitHub] [hudi] vinothchandar commented on a change in pull request #1760: [HUDI-1040] Update apis for spark3 compatibility

2020-06-23 Thread GitBox
vinothchandar commented on a change in pull request #1760: URL: https://github.com/apache/hudi/pull/1760#discussion_r444563924 ## File path: hudi-spark/src/main/scala/org/apache/hudi/AvroConversionUtils.scala ## @@ -78,4 +79,21 @@ object AvroConversionUtils { def convertAvro

[GitHub] [hudi] afeldman1 opened a new pull request #1761: [MINOR] Add documentation for using multi-column table keys and for n…

2020-06-23 Thread GitBox
afeldman1 opened a new pull request #1761: URL: https://github.com/apache/hudi/pull/1761 …ot partitioning tables

[GitHub] [hudi] vinothchandar edited a comment on issue #1757: Slow Bulk Insert Performance [SUPPORT]

2020-06-23 Thread GitBox
vinothchandar edited a comment on issue #1757: URL: https://github.com/apache/hudi/issues/1757#issuecomment-648478687 > " hoodie.[insert|upsert|bulkinsert].shuffle.parallelism such that it's at least input_data_size/500MB The reason for this was the 2GB limitation in Spark shuffle.. I

[GitHub] [hudi] vinothchandar commented on issue #1757: Slow Bulk Insert Performance [SUPPORT]

2020-06-23 Thread GitBox
vinothchandar commented on issue #1757: URL: https://github.com/apache/hudi/issues/1757#issuecomment-648478687 > " hoodie.[insert|upsert|bulkinsert].shuffle.parallelism such that it's at least input_data_size/500MB The reason for this was the 2GB limitation in Spark shuffle.. I see you are
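The tuning rule quoted in issue #1757 is simple arithmetic: choose a shuffle parallelism of at least input_data_size / 500MB, so that no single shuffle partition approaches Spark's 2GB limit. The small Java helper below makes the ceiling division explicit. Reading "MB" as MiB (1024 * 1024 bytes) is an assumption, and the class name is hypothetical.

```java
// Hypothetical helper for the "parallelism >= input_data_size / 500MB" rule of thumb.
final class ShuffleParallelism {
  private ShuffleParallelism() {}

  // Assumption: "500MB" is read as 500 MiB.
  static final long TARGET_BYTES_PER_PARTITION = 500L * 1024 * 1024;

  // Smallest parallelism such that inputBytes / parallelism <= 500 MiB (ceiling division).
  static int minParallelism(long inputBytes) {
    if (inputBytes <= 0) {
      return 1;
    }
    long partitions = (inputBytes + TARGET_BYTES_PER_PARTITION - 1) / TARGET_BYTES_PER_PARTITION;
    return (int) Math.min(partitions, Integer.MAX_VALUE);
  }
}
```

For example, 100 GiB of input gives 102400 / 500 = 204.8, which rounds up to 205. That value could then be used for `hoodie.bulkinsert.shuffle.parallelism` (or the insert/upsert variant).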

[GitHub] [hudi] vinothchandar commented on issue #1751: [SUPPORT] Hudi not working with Spark 3.0.0

2020-06-23 Thread GitBox
vinothchandar commented on issue #1751: URL: https://github.com/apache/hudi/issues/1751#issuecomment-648470434 @lyogev could you try including it explicitly in `hudi-spark-bundle` pom under `packaging` and give it a shot? (if not, I will make some time to try this later tonight or tomorrow m

[GitHub] [hudi] lyogev commented on issue #1751: [SUPPORT] Hudi not working with Spark 3.0.0

2020-06-23 Thread GitBox
lyogev commented on issue #1751: URL: https://github.com/apache/hudi/issues/1751#issuecomment-648460431 Looks like it is! now I'm stuck at hive sync: ``` Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/calcite/rel/type/RelDataTypeSystem at org.apache.had
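A `NoClassDefFoundError` like the one above usually means a class referenced at compile time (here a Calcite class reached from the Hive sync path) cannot be found at runtime. Before rerunning the sync, one quick way to confirm which classes a given classloader can see is to probe them with `Class.forName`. The helper name `ClasspathCheck` is hypothetical.

```java
// Hypothetical diagnostic: checks whether a class is loadable without initializing it.
final class ClasspathCheck {
  private ClasspathCheck() {}

  // True if the named class can be found by this class's loader; static initializers are not run.
  static boolean isPresent(String className) {
    try {
      Class.forName(className, false, ClasspathCheck.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }
}
```

Calling `ClasspathCheck.isPresent("org.apache.calcite.rel.type.RelDataTypeSystem")` from the same classpath as the Spark job would show whether the bundle actually ships (or shades) Calcite.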

[GitHub] [hudi] somebol edited a comment on issue #1757: Slow Bulk Insert Performance [SUPPORT]

2020-06-23 Thread GitBox
somebol edited a comment on issue #1757: URL: https://github.com/apache/hudi/issues/1757#issuecomment-648435649 stages 4 & 6 seem to have the most skew. ***screenshots of stage details*** **stage 6** ![image](https://user-images.githubusercontent.com/29965228/85464858-77331580-

[GitHub] [hudi] somebol commented on issue #1757: Slow Bulk Insert Performance [SUPPORT]

2020-06-23 Thread GitBox
somebol commented on issue #1757: URL: https://github.com/apache/hudi/issues/1757#issuecomment-648435649 stages 4 & 6 seem to have the most skew. **screenshots of stage details** *stage 6 ![image](https://user-images.githubusercontent.com/29965228/85464858-77331580-b5eb-11ea-85

[GitHub] [hudi] bvaradar commented on a change in pull request #1687: [WIP] [HUDI-684] Introduced abstraction for writing and reading different types of base file formats.

2020-06-23 Thread GitBox
bvaradar commented on a change in pull request #1687: URL: https://github.com/apache/hudi/pull/1687#discussion_r59286 ## File path: hudi-client/src/main/java/org/apache/hudi/io/storage/HoodieParquetWriter.java ## @@ -51,7 +49,6 @@ private final long maxFileSize; priv

[GitHub] [hudi] somebol commented on issue #1757: Slow Bulk Insert Performance [SUPPORT]

2020-06-23 Thread GitBox
somebol commented on issue #1757: URL: https://github.com/apache/hudi/issues/1757#issuecomment-648432948 @vinothchandar yes, this is an initial load and we plan to use upsert for incrementals. The task failures are mainly due to preemption. would there be any benefit say increasing bulki

[GitHub] [hudi] vinothchandar commented on issue #1751: [SUPPORT] Hudi not working with Spark 3.0.0

2020-06-23 Thread GitBox
vinothchandar commented on issue #1751: URL: https://github.com/apache/hudi/issues/1751#issuecomment-648397610 Does #1760 help? If you could try that out & report back.. it'd be awesome..

[GitHub] [hudi] vinothchandar commented on issue #1757: Slow Bulk Insert Performance [SUPPORT]

2020-06-23 Thread GitBox
vinothchandar commented on issue #1757: URL: https://github.com/apache/hudi/issues/1757#issuecomment-648396783 @somebol assuming this is an initial load and after this, you would do insert/upsert operations incrementally? High level, `bulk_insert` does a sort and writes out the data
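At a high level, the comment above explains that `bulk_insert` sorts the input and then writes it out. The in-memory Java sketch below illustrates that idea on a single machine: records are sorted by (partition path, record key), and each partition's sorted run is cut into files holding at most `maxPerFile` records. The sort key, the record-count file limit, and all names are simplifying assumptions. The real operation runs as a distributed Spark sort and typically rolls files over by size in bytes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, single-machine illustration of "sort, then write out" for a bulk insert.
final class BulkInsertSketch {
  static final class Rec {
    final String partitionPath;
    final String recordKey;

    Rec(String partitionPath, String recordKey) {
      this.partitionPath = partitionPath;
      this.recordKey = recordKey;
    }
  }

  // Sorts by (partitionPath, recordKey), then cuts each partition's run into files of <= maxPerFile records.
  static List<List<Rec>> planFiles(List<Rec> input, int maxPerFile) {
    List<Rec> sorted = new ArrayList<>(input);
    sorted.sort(Comparator.comparing((Rec r) -> r.partitionPath).thenComparing(r -> r.recordKey));

    List<List<Rec>> files = new ArrayList<>();
    List<Rec> current = new ArrayList<>();
    String currentPartition = null;
    for (Rec r : sorted) {
      boolean newPartition = !r.partitionPath.equals(currentPartition);
      // Start a new file when the partition changes or the current file is full.
      if (!current.isEmpty() && (newPartition || current.size() >= maxPerFile)) {
        files.add(current);
        current = new ArrayList<>();
      }
      current.add(r);
      currentPartition = r.partitionPath;
    }
    if (!current.isEmpty()) {
      files.add(current);
    }
    return files;
  }
}
```

Sorting first means records for the same partition arrive contiguously, so each file in this sketch belongs to exactly one partition and holds keys in order.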

[GitHub] [hudi] somebol commented on issue #1757: Slow Bulk Insert Performance [SUPPORT]

2020-06-23 Thread GitBox
somebol commented on issue #1757: URL: https://github.com/apache/hudi/issues/1757#issuecomment-648381931 @vinothchandar are you able to suggest some tweaks?

[jira] [Updated] (HUDI-1040) Support Spark3 compatibility

2020-06-23 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1040: Labels: pull-request-available (was: )

[GitHub] [hudi] bschell opened a new pull request #1760: [HUDI-1040] Update apis for spark3

2020-06-23 Thread GitBox
bschell opened a new pull request #1760: URL: https://github.com/apache/hudi/pull/1760 Modifies use of spark apis for compatibility with both spark2 and spark3 ## What is the purpose of the pull request Updates spark apis and allows compatibility with spark3 ## Verify th

[jira] [Created] (HUDI-1040) Support Spark3 compatibility

2020-06-23 Thread Brandon Scheller (Jira)
Brandon Scheller created HUDI-1040: Summary: Support Spark3 compatibility Key: HUDI-1040 URL: https://issues.apache.org/jira/browse/HUDI-1040 Project: Apache Hudi Issue Type: Improvement

[GitHub] [hudi] vinothchandar commented on pull request #1687: [WIP] [HUDI-684] Introduced abstraction for writing and reading different types of base file formats.

2020-06-23 Thread GitBox
vinothchandar commented on pull request #1687: URL: https://github.com/apache/hudi/pull/1687#issuecomment-648329807 Thanks! @bvaradar, if we can get this landed soon and then work on top, that'd be awesome.

[GitHub] [hudi] wangxianghu commented on a change in pull request #1727: [WIP] [Review] refactor hudi-client

2020-06-23 Thread GitBox
wangxianghu commented on a change in pull request #1727: URL: https://github.com/apache/hudi/pull/1727#discussion_r444315869 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieClient.java ## @@ -18,53 +18,55 @@ package org.apache.hud

[GitHub] [hudi] wangxianghu commented on a change in pull request #1727: [WIP] [Review] refactor hudi-client

2020-06-23 Thread GitBox
wangxianghu commented on a change in pull request #1727: URL: https://github.com/apache/hudi/pull/1727#discussion_r444315593 ## File path: hudi-client/hudi-client-spark/src/main/java/org/apache/hudi/client/HoodieSparkWriteClient.java ## @@ -481,6 +585,11 @@ public void close()

[jira] [Updated] (HUDI-1039) Cleanup redundant timeline service when createHoodleClient

2020-06-23 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangxianghu updated HUDI-1039: Fix Version/s: (was: 0.5.3) 0.6.0

[jira] [Commented] (HUDI-1035) Remove unused class KeyLookupResult

2020-06-23 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17142898#comment-17142898 ] wangxianghu commented on HUDI-1035: [~vinoth] I have checked throughout the hudi-spark

[jira] [Issue Comment Deleted] (HUDI-1035) Remove unused class KeyLookupResult

2020-06-23 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangxianghu updated HUDI-1035: Comment: was deleted (was: [~vinoth] I have checked throughout the hudi-spark module, no reflection use

[jira] [Updated] (HUDI-1039) Cleanup redundant timeline service when createHoodleClient

2020-06-23 Thread renyi.bao (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] renyi.bao updated HUDI-1039: Fix Version/s: (was: 0.6.0) 0.5.3

[jira] [Created] (HUDI-1039) Cleanup redundant timeline service when createHoodleClient

2020-06-23 Thread renyi.bao (Jira)
renyi.bao created HUDI-1039: Summary: Cleanup redundant timeline service when createHoodleClient Key: HUDI-1039 URL: https://issues.apache.org/jira/browse/HUDI-1039 Project: Apache Hudi Issue Type

[GitHub] [hudi] davidsheard opened a new issue #1759: Hive Table Not showing

2020-06-23 Thread GitBox
davidsheard opened a new issue #1759: URL: https://github.com/apache/hudi/issues/1759 Hi, We can't seem to get our Hudi Table to show in Hive on Cloudera. We have dropped the Hudi jar into Hive Auxiliary JARs Directory and restarted Hive. But no luck. We are hoping to Demo the merit

[GitHub] [hudi] christoph-wmt commented on issue #1758: [SUPPORT] building InMemoryFileIndex slow with increase target table partitions

2020-06-23 Thread GitBox
christoph-wmt commented on issue #1758: URL: https://github.com/apache/hudi/issues/1758#issuecomment-648045079 Sorry, I realize this might be a duplicate of https://github.com/apache/hudi/issues/1552. I'll build off master and give it a shot.

[GitHub] [hudi] christoph-wmt opened a new issue #1758: [SUPPORT]

2020-06-23 Thread GitBox
christoph-wmt opened a new issue #1758: URL: https://github.com/apache/hudi/issues/1758 **Describe the problem you faced** We are using Spark to write Hudi tables to ADLSv2 and GCS. For Append tables, the more partitions are added the more time is taken to complete batches. Actual

[GitHub] [hudi] AndrewKL commented on issue #588: Has anyone used hudi with AWS EMR and EMRFS on s3?

2020-06-23 Thread GitBox
AndrewKL commented on issue #588: URL: https://github.com/apache/hudi/issues/588#issuecomment-647950765 FYI most of these issues are historical from before Hudi was on EMR.