[jira] [Commented] (HUDI-625) Address performance concerns on DiskBasedMap.get() during upsert of thin records

2020-02-24 Thread lamber-ken (Jira)
[ https://issues.apache.org/jira/browse/HUDI-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043773#comment-17043773 ] lamber-ken commented on HUDI-625: - Thanks, if you don't mind, I think I'd like to drive it :D

[jira] [Commented] (HUDI-625) Address performance concerns on DiskBasedMap.get() during upsert of thin records

2020-02-24 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043755#comment-17043755 ] Vinoth Chandar commented on HUDI-625: - > if we modify / add field, we will rework these

[jira] [Commented] (HUDI-625) Address performance concerns on DiskBasedMap.get() during upsert of thin records

2020-02-23 Thread lamber-ken (Jira)
[ https://issues.apache.org/jira/browse/HUDI-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043117#comment-17043117 ] lamber-ken commented on HUDI-625: - The key issue is "super.getInstantiatorStrategy().newIn

[jira] [Commented] (HUDI-625) Address performance concerns on DiskBasedMap.get() during upsert of thin records

2020-02-23 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043115#comment-17043115 ] Vinoth Chandar commented on HUDI-625: - I fixed that in my PR as well .. Do you want to

[jira] [Commented] (HUDI-625) Address performance concerns on DiskBasedMap.get() during upsert of thin records

2020-02-23 Thread lamber-ken (Jira)
[ https://issues.apache.org/jira/browse/HUDI-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043098#comment-17043098 ] lamber-ken commented on HUDI-625: - hi [~vinoth], I sent you some messages on Slack, may

[jira] [Commented] (HUDI-625) Address performance concerns on DiskBasedMap.get() during upsert of thin records

2020-02-23 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043040#comment-17043040 ] Vinoth Chandar commented on HUDI-625: - https://github.com/apache/incubator-hudi/pull/13

[jira] [Commented] (HUDI-625) Address performance concerns on DiskBasedMap.get() during upsert of thin records

2020-02-22 Thread lamber-ken (Jira)
[ https://issues.apache.org/jira/browse/HUDI-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042748#comment-17042748 ] lamber-ken commented on HUDI-625: - Let me check it, I took GenericRecord as a demo test her

[jira] [Commented] (HUDI-625) Address performance concerns on DiskBasedMap.get() during upsert of thin records

2020-02-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042721#comment-17042721 ] Vinoth Chandar commented on HUDI-625: - [~lamber-ken] the payload class does not have a

[jira] [Commented] (HUDI-625) Address performance concerns on DiskBasedMap.get() during upsert of thin records

2020-02-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042603#comment-17042603 ] Vinoth Chandar commented on HUDI-625: - cc [~ovjforu] Would you like to chime in on this

[jira] [Commented] (HUDI-625) Address performance concerns on DiskBasedMap.get() during upsert of thin records

2020-02-22 Thread lamber-ken (Jira)
[ https://issues.apache.org/jira/browse/HUDI-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042597#comment-17042597 ] lamber-ken commented on HUDI-625: - hi [~vinoth], caching the class seems like a good way, b

[jira] [Commented] (HUDI-625) Address performance concerns on DiskBasedMap.get() during upsert of thin records

2020-02-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042592#comment-17042592 ] Vinoth Chandar commented on HUDI-625: - [~lamber-ken] caching the class looks like a goo

[jira] [Commented] (HUDI-625) Address performance concerns on DiskBasedMap.get() during upsert of thin records

2020-02-22 Thread lamber-ken (Jira)
[ https://issues.apache.org/jira/browse/HUDI-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042484#comment-17042484 ] lamber-ken commented on HUDI-625: - BTW, if we didn't use , it will throw KryoException   {

[jira] [Commented] (HUDI-625) Address performance concerns on DiskBasedMap.get() during upsert of thin records

2020-02-22 Thread lamber-ken (Jira)
[ https://issues.apache.org/jira/browse/HUDI-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042483#comment-17042483 ] lamber-ken commented on HUDI-625: - It works, I defined a GenericDataRecordSerializer, then

[jira] [Commented] (HUDI-625) Address performance concerns on DiskBasedMap.get() during upsert of thin records

2020-02-21 Thread lamber-ken (Jira)
[ https://issues.apache.org/jira/browse/HUDI-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042356#comment-17042356 ] lamber-ken commented on HUDI-625: - Hi, [~vinoth] I cached the Class info, 100x more   Bef

[jira] [Commented] (HUDI-625) Address performance concerns on DiskBasedMap.get() during upsert of thin records

2020-02-21 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042346#comment-17042346 ] Vinoth Chandar commented on HUDI-625: - Okay, I was able to pack 10x more entries into memor

[jira] [Commented] (HUDI-625) Address performance concerns on DiskBasedMap.get() during upsert of thin records

2020-02-21 Thread lamber-ken (Jira)
[ https://issues.apache.org/jira/browse/HUDI-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042337#comment-17042337 ] lamber-ken commented on HUDI-625: - But when I test the following code snippet, like hudi init the

[jira] [Commented] (HUDI-625) Address performance concerns on DiskBasedMap.get() during upsert of thin records

2020-02-21 Thread lamber-ken (Jira)
[ https://issues.apache.org/jira/browse/HUDI-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042324#comment-17042324 ] lamber-ken commented on HUDI-625: - I tested the Kryo API, and it seems blazing fast   dese times: 1000

[jira] [Commented] (HUDI-625) Address performance concerns on DiskBasedMap.get() during upsert of thin records

2020-02-21 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042044#comment-17042044 ] Vinoth Chandar commented on HUDI-625: - let's do at least a million records for these, ki

[jira] [Commented] (HUDI-625) Address performance concerns on DiskBasedMap.get() during upsert of thin records

2020-02-21 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042035#comment-17042035 ] sivabalan narayanan commented on HUDI-625: -- oh, I see you tested string, string an

[jira] [Commented] (HUDI-625) Address performance concerns on DiskBasedMap.get() during upsert of thin records

2020-02-21 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042030#comment-17042030 ] sivabalan narayanan commented on HUDI-625: -- I am getting something similar to lamb

[jira] [Commented] (HUDI-625) Address performance concerns on DiskBasedMap.get() during upsert of thin records

2020-02-21 Thread lamber-ken (Jira)
[ https://issues.apache.org/jira/browse/HUDI-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041809#comment-17041809 ] lamber-ken commented on HUDI-625: - Test write HoodieRecord value into diskmap, slow {code:j

[jira] [Commented] (HUDI-625) Address performance concerns on DiskBasedMap.get() during upsert of thin records

2020-02-21 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041691#comment-17041691 ] Vinoth Chandar commented on HUDI-625: - Wondering if it's sufficient to just have the pay

[jira] [Commented] (HUDI-625) Address performance concerns on DiskBasedMap.get() during upsert of thin records

2020-02-21 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041656#comment-17041656 ] Vinoth Chandar commented on HUDI-625: - [~lamber-ken] maybe we are missing registering

[jira] [Commented] (HUDI-625) Address performance concerns on DiskBasedMap.get() during upsert of thin records

2020-02-21 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041654#comment-17041654 ] Vinoth Chandar commented on HUDI-625: - If I write a simple string key, value into diskm

[jira] [Commented] (HUDI-625) Address performance concerns on DiskBasedMap.get() during upsert of thin records

2020-02-21 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041649#comment-17041649 ] Vinoth Chandar commented on HUDI-625: - cc [~nishith29] [~vbalaji] this seems very unexp

[jira] [Commented] (HUDI-625) Address performance concerns on DiskBasedMap.get() during upsert of thin records

2020-02-20 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041647#comment-17041647 ] Vinoth Chandar commented on HUDI-625: - Following is the test code used to profile these

[jira] [Commented] (HUDI-625) Address performance concerns on DiskBasedMap.get() during upsert of thin records

2020-02-20 Thread lamber-ken (Jira)
[ https://issues.apache.org/jira/browse/HUDI-625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041644#comment-17041644 ] lamber-ken commented on HUDI-625: - Here are several solutions we can try: :) * Use Spliterato