Re: [Discussion] Blocklet DataMap caching in driver

2018-06-24 Thread manish gupta
Hi Dev, I have worked on the design document. Please find the link to the design document below and share your feedback. https://drive.google.com/open?id=1lN06Pj5tBiBIPSxOBIjK9bpbFVhlUoQA I have also raised the JIRA issue and uploaded the design document there. Please find the JIRA link below. https://i

Re: [Discussion] Blocklet DataMap caching in driver

2018-06-23 Thread manish gupta
Thanks for the feedback, Jacky. As of now we have min/max at each block and blocklet level, and while loading the metadata cache we compute the task-level min/max. Segment-level min/max is not considered as of now, but this solution can surely be enhanced to consider segment-level min/max. We can di
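
As an illustration of the task-level min/max mentioned above, here is a minimal sketch, assuming each blocklet exposes a per-column (min, max) pair. The names `BlockletStats`, `task_level_min_max`, and `may_contain` are hypothetical and are not CarbonData APIs; the sketch only shows how coarser-level ranges could be derived by merging finer-level ones and then used to skip a task during pruning.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # (min, max) for one column; integers assumed for simplicity


@dataclass
class BlockletStats:
    """Hypothetical holder for one blocklet's per-column (min, max) pairs."""
    min_max: Dict[str, Range]


def task_level_min_max(blocklets: List[BlockletStats]) -> Dict[str, Range]:
    """Merge blocklet-level ranges into one range per column for the whole task."""
    merged: Dict[str, Range] = {}
    for blocklet in blocklets:
        for column, (lo, hi) in blocklet.min_max.items():
            if column in merged:
                cur_lo, cur_hi = merged[column]
                merged[column] = (min(cur_lo, lo), max(cur_hi, hi))
            else:
                merged[column] = (lo, hi)
    return merged


def may_contain(stats: Dict[str, Range], column: str, value: int) -> bool:
    """Return False only when the value is provably outside the cached range."""
    if column not in stats:
        # No cached range for this column: the task cannot be pruned.
        return True
    lo, hi = stats[column]
    return lo <= value <= hi


# Two blocklets of the same task, with ranges for a hypothetical "id" column.
blocklets = [
    BlockletStats({"id": (1, 50)}),
    BlockletStats({"id": (40, 120)}),
]
task_stats = task_level_min_max(blocklets)
```

With these inputs the merged range for `id` is (1, 120), so an equality filter on 200 could skip the task, while a filter on 45 could not. The same merge could in principle be applied one level up to obtain segment-level ranges.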

Re: [Discussion] Blocklet DataMap caching in driver

2018-06-22 Thread Jacky Li
Hi Manish, +1 for solution 1 for the next Carbon version. Solution 2 should also be considered, but for a later version. In my observation, in many scenarios users filter on a time range, and since Carbon’s segment is per incremental load, which makes it related to time n

Re: [Discussion] Blocklet DataMap caching in driver

2018-06-22 Thread kanaka kumar avvaru
Hi Manish, Thanks for proposing configured columns for the min/max cache. This will help customers who have large data but use only a few columns in filter conditions. +1 for solution 1. Regards, Kanaka On Fri, Jun 22, 2018 at 11:39 AM, manishgupta88 wrote: > Thanks Ravi for the feedback. I

Re: [Discussion] Blocklet DataMap caching in driver

2018-06-21 Thread manishgupta88
Thanks, Ravi, for the feedback. I completely agree with you that we need to develop the second solution ASAP. Please find my responses to your queries below. 1. What if the query comes on non-cached columns? Will it start reading min/max from disk on the driver side? - If query is on a non-cached colu

Re: [Discussion] Blocklet DataMap caching in driver

2018-06-21 Thread Ravindra Pesala
Hi Manish, Thanks for proposing solutions to the driver memory problem. +1 for solution 1, but it may not be the complete solution. We should also have solution 2 to solve the driver memory issue completely. I think we should have solution 2 as well in the very near future. I have a few doubts and sugge

[Discussion] Blocklet DataMap caching in driver

2018-06-21 Thread manish gupta
Hi Dev, The current implementation of Blocklet DataMap caching in the driver caches the min and max values of all columns in the schema by default. The problem with this implementation is that as the number of loads increases, the memory required to hold min and max values also increases co