[ 
https://issues.apache.org/jira/browse/CARBONDATA-1617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li resolved CARBONDATA-1617.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 1.3.0

> Merging carbonindex files for each segment.
> -------------------------------------------
>
>                 Key: CARBONDATA-1617
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1617
>             Project: CarbonData
>          Issue Type: New Feature
>            Reporter: Ravindra Pesala
>            Priority: Major
>             Fix For: 1.3.0
>
>          Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> Hi,
> Problem:
>  The first query in carbon is very slow, because the driver has to read
> many small carbonindex files and cache them the first time.
>  Many carbonindex files are created in the following case:
>  Loading data in a large cluster
>    For example, if the cluster has 100 nodes, then each load creates 100
> index files per segment. So after 100 loads, the number of carbonindex
> files becomes 10000.
> Reading all of these files from the driver is slow because it requires
> many namenode calls and IO operations.
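
To make the arithmetic in the example concrete, here is a minimal sketch of the file count the driver faces on a first query. It assumes, as the example does, that every load creates one segment and that every node writes one carbonindex file per load. The function name and parameters are hypothetical and are not part of CarbonData.

```python
def index_file_count(nodes: int, loads: int, merged: bool = False) -> int:
    """Number of index files the driver has to open on a first query.

    Assumption: one segment per load, one carbonindex file per node per load.
    With a per-segment merge, each segment contributes a single file.
    """
    files_per_segment = 1 if merged else nodes
    return files_per_segment * loads


print(index_file_count(nodes=100, loads=100))               # 10000
print(index_file_count(nodes=100, loads=100, merged=True))  # 100
```

Under these assumptions, merging within a segment cuts the number of files to open from nodes x loads to one per load.
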
> Solution:
> Merge the carbonindex files at two levels, so that we reduce the IO calls
> to the namenode and improve read performance.
> Merge within a segment.
> Merge the carbonindex files into a single file immediately after the load
> completes within the segment. It would be named as a .carbonindexmerge
> file. It is not a true data merge but a simple file merge, so the current
> structure of the carbonindex files does not change. While reading, we just
> read one file instead of many carbonindex files within the segment.
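
The "simple file merge" idea above can be sketched as follows. This is an illustration only: the container layout (name length, name, payload length, payload), the default file name segment.carbonindexmerge, and both function names are assumptions made for this example and do not describe CarbonData's actual on-disk format or API. The point it shows is that each index file's bytes are copied unchanged, so a reader opens one file and still recovers every original index.

```python
import struct
from pathlib import Path
from typing import Dict


def merge_index_files(segment_dir: Path,
                      merged_name: str = "segment.carbonindexmerge") -> Path:
    """Concatenate all *.carbonindex files of one segment into one file.

    Hypothetical entry layout: 4-byte name length, file name,
    8-byte payload length, payload. Payload bytes are copied as-is.
    """
    index_files = sorted(segment_dir.glob("*.carbonindex"))
    merged_path = segment_dir / merged_name
    with open(merged_path, "wb") as out:
        for path in index_files:
            name = path.name.encode("utf-8")
            payload = path.read_bytes()
            out.write(struct.pack(">I", len(name)))
            out.write(name)
            out.write(struct.pack(">Q", len(payload)))
            out.write(payload)
    return merged_path


def read_merged_index(merged_path: Path) -> Dict[str, bytes]:
    """Read the merged file once and return {original file name: bytes}."""
    data = merged_path.read_bytes()
    entries: Dict[str, bytes] = {}
    offset = 0
    while offset < len(data):
        (name_len,) = struct.unpack_from(">I", data, offset)
        offset += 4
        name = data[offset:offset + name_len].decode("utf-8")
        offset += name_len
        (size,) = struct.unpack_from(">Q", data, offset)
        offset += 8
        entries[name] = data[offset:offset + size]
        offset += size
    return entries
```

In this sketch the reader issues a single file read per segment, whatever the number of nodes that took part in the load, and the per-file index content stays byte-for-byte identical to what was written.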



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
