[ https://issues.apache.org/jira/browse/CARBONDATA-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
dhatchayani updated CARBONDATA-3293:
------------------------------------
    Summary: Prune datamaps improvement for count(*)  (was: Prune datamaps improvement)

> Prune datamaps improvement for count(*)
> ---------------------------------------
>
>                 Key: CARBONDATA-3293
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3293
>             Project: CarbonData
>          Issue Type: Improvement
>            Reporter: dhatchayani
>            Assignee: dhatchayani
>            Priority: Major
>          Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> +*Problem:*+
> (1) Currently, for count(*), pruning is the same as for a select * query.
> Blocklet and ExtendedBlocklet objects are built from the DataMapRow, which
> is unnecessary and time-consuming.
> (2) Pruning in a select * query spends time in convertToSafeRow(), which
> converts the DataMapRow to a safe row. In an unsafe row, getting the
> position of the data requires traversing the whole row.
> (3) For filter queries, the DataMapRow is converted to a safeRow whether
> the blocklet is valid or invalid. This conversion is time-consuming, and
> its cost grows with the number of blocklets.
>
> +*Solution:*+
> (1) The blocklet row count is already in the DataMapRow itself, so it is
> enough to read that count. This improves count(*) query performance.
> (2) Also store the data length in the DataMapRow, so that traversing the
> whole row can be avoided. With the length, the data position can be
> reached directly.
> (3) Read only the MinMax from the DataMapRow and decide whether a scan is
> required on that blocklet. Convert the row to a safeRow only if a scan is
> required.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
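
Solution items (1) and (3) above can be illustrated with a small sketch. This is not CarbonData code: IndexRow, count_star, may_match, and prune are hypothetical names, the filter is assumed to be a simple integer range, and the convert argument only stands in for an expensive step such as convertToSafeRow(). Item (2), storing the data length, is not covered here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class IndexRow:
    """Hypothetical stand-in for a DataMapRow: one entry per blocklet."""
    row_count: int   # number of rows in the blocklet
    min_value: int   # minimum of the filtered column in the blocklet
    max_value: int   # maximum of the filtered column in the blocklet


def count_star(rows: List[IndexRow]) -> int:
    """Item (1): answer count(*) by summing the stored row counts.

    No per-blocklet objects are built and no row is converted.
    """
    return sum(r.row_count for r in rows)


def may_match(row: IndexRow, lo: int, hi: int) -> bool:
    """Item (3): min/max check for an assumed filter lo <= x <= hi.

    Returns False only when the blocklet's value range cannot overlap
    the filter range, so the blocklet can be skipped safely.
    """
    return row.max_value >= lo and row.min_value <= hi


def prune(rows: List[IndexRow], lo: int, hi: int,
          convert: Callable[[IndexRow], object]) -> list:
    """Run the (assumed expensive) conversion only for surviving rows."""
    return [convert(r) for r in rows if may_match(r, lo, hi)]


# Three blocklets with disjoint value ranges.
rows = [IndexRow(100, 0, 9), IndexRow(250, 10, 19), IndexRow(50, 20, 29)]

total = count_star(rows)  # 400

converted = []  # records which rows reached the conversion step


def fake_convert(row: IndexRow) -> IndexRow:
    converted.append(row)
    return row


# The filter 12 <= x <= 22 overlaps only the second and third blocklets,
# so the conversion runs twice instead of three times.
survivors = prune(rows, 12, 22, fake_convert)
```

The point of the ordering in prune is that the cheap min/max comparison runs on every row, while the conversion cost is paid only for blocklets that may actually contain matching data.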