[ https://issues.apache.org/jira/browse/CARBONDATA-1617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jacky Li resolved CARBONDATA-1617.
----------------------------------
    Resolution: Fixed
    Fix Version/s: 1.3.0

> Merging carbonindex files for each segment.
> -------------------------------------------
>
>                 Key: CARBONDATA-1617
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1617
>             Project: CarbonData
>          Issue Type: New Feature
>            Reporter: Ravindra Pesala
>            Priority: Major
>             Fix For: 1.3.0
>
>          Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> Hi,
> Problem:
> The first query on a carbon table is very slow, because the driver has to
> read many small carbonindex files and cache them on first access.
> Many carbonindex files are created in the following case:
> Loading data in a large cluster.
> For example, if the cluster has 100 nodes, then each load creates 100 index
> files per segment. So after 100 loads, the number of carbonindex files
> reaches 10000.
> Reading all of these files from the driver is slow because it requires many
> namenode calls and IO operations.
> Solution:
> Merge the carbonindex files in two levels, so that we reduce the IO calls
> to the namenode and improve read performance.
> Merge within a segment.
> Merge the carbonindex files within the segment into a single file
> immediately after the load completes. It would be named as a
> .carbonindexmerge file. It is not a true data merge but a simple file
> merge, so the current structure of the carbonindex files does not change.
> While reading, we read just one file instead of many carbonindex files
> within the segment.
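
The "simple file merge" described above can be sketched roughly as follows. This is only an illustration, not CarbonData's implementation: the container layout (a length-prefixed name followed by length-prefixed bytes), the function names `merge_index_files` and `read_merged_index`, and the default merged file name `segment` are all assumptions made for this example. Only the `.carbonindex` and `.carbonindexmerge` extensions come from the issue. The point it shows is that each index file's bytes are copied unchanged, so a reader opens one file per segment instead of one per node.

```python
import struct
import tempfile
from pathlib import Path
from typing import Dict

INDEX_SUFFIX = ".carbonindex"       # extension named in the issue
MERGE_SUFFIX = ".carbonindexmerge"  # extension named in the issue


def merge_index_files(segment_dir: Path, merged_name: str = "segment") -> Path:
    """Copy every index file in segment_dir, byte for byte, into one merge file.

    Hypothetical container layout, repeated once per index file:
    [name length: u32][name bytes][data length: u32][data bytes]
    """
    index_files = sorted(segment_dir.glob("*" + INDEX_SUFFIX))
    merged_path = segment_dir / (merged_name + MERGE_SUFFIX)
    with merged_path.open("wb") as out:
        for index_file in index_files:
            name = index_file.name.encode("utf-8")
            data = index_file.read_bytes()  # content is not parsed or altered
            out.write(struct.pack(">I", len(name)))
            out.write(name)
            out.write(struct.pack(">I", len(data)))
            out.write(data)
    return merged_path


def read_merged_index(merged_path: Path) -> Dict[str, bytes]:
    """Read the merge file once and return {original file name: original bytes}."""
    entries: Dict[str, bytes] = {}
    with merged_path.open("rb") as src:
        while True:
            header = src.read(4)
            if not header:  # end of file
                break
            (name_len,) = struct.unpack(">I", header)
            name = src.read(name_len).decode("utf-8")
            (data_len,) = struct.unpack(">I", src.read(4))
            entries[name] = src.read(data_len)
    return entries


# Usage: simulate one segment written by three nodes, then merge it.
with tempfile.TemporaryDirectory() as tmp:
    segment = Path(tmp)
    for node in range(3):
        (segment / f"node{node}{INDEX_SUFFIX}").write_bytes(f"index-{node}".encode())
    merged = merge_index_files(segment)
    recovered = read_merged_index(merged)
```

The sketch leaves the original index files in place, since the issue does not say what happens to them after the merge. With the figures from the issue (100 nodes, 100 loads), and assuming each load produces one segment, a first query would open 100 merged files rather than 10000 individual ones.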