[ https://issues.apache.org/jira/browse/CARBONDATA-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jacky Li resolved CARBONDATA-2309.
----------------------------------
    Resolution: Fixed
 Fix Version/s: 1.4.1
                1.5.0

> Add strategy to generate bigger carbondata files in case of small amount of
> data
> --------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-2309
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2309
>             Project: CarbonData
>          Issue Type: Improvement
>          Components: data-load
>            Reporter: xuchuanyin
>            Assignee: wangsen
>            Priority: Major
>             Fix For: 1.5.0, 1.4.1
>
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> In some scenarios the amount of input data to load is small, but carbondata
> still distributes it across all executors (nodes) for local sort, so each
> executor generates only small carbondata files.
> In extreme cases, if the cluster is big enough or the amount of data is small
> enough, a carbondata file contains only one blocklet or page.
> I think a new strategy should be introduced to solve this problem.
> The new strategy should:
> # be able to control the minimum amount of input data for each node
> # ignore data locality; otherwise it may always choose the same small subset
> of nodes
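
The two requirements above can be illustrated with a minimal sketch. It is not CarbonData's implementation or API: the function name `assign_blocks`, the parameter `min_bytes_per_node`, and the greedy largest-first balancing are all assumptions made for illustration. The sketch caps the number of nodes so that the input can give each one about the minimum size, and it ignores where the blocks are stored. With uneven block sizes the minimum is a target, not a per-node guarantee.

```python
import heapq


def assign_blocks(block_sizes, nodes, min_bytes_per_node):
    """Hypothetical sketch: map input blocks to nodes, ignoring data locality.

    Returns a dict {node: [block indices]} that uses only as many nodes as
    the total input can fill to about min_bytes_per_node each.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    if min_bytes_per_node <= 0:
        raise ValueError("min_bytes_per_node must be positive")

    total = sum(block_sizes)
    # Never use more nodes than there are blocks or than the input can fill;
    # always use at least one node, even for tiny or empty input.
    node_count = max(
        1, min(len(nodes), len(block_sizes), total // min_bytes_per_node)
    )

    # Min-heap of (assigned_bytes, position, node): the least-loaded node is
    # popped first, and position breaks ties deterministically.
    heap = [(0, pos, node) for pos, node in enumerate(nodes[:node_count])]
    assignment = {node: [] for _, _, node in heap}

    # Largest blocks first, so the small ones even out the load at the end.
    order = sorted(range(len(block_sizes)), key=lambda i: -block_sizes[i])
    for idx in order:
        load, pos, node = heapq.heappop(heap)
        assignment[node].append(idx)
        heapq.heappush(heap, (load + block_sizes[idx], pos, node))
    return assignment
```

For example, four 64-unit blocks on an eight-node cluster with a minimum of 128 units per node end up on two nodes instead of eight, so each node writes one larger file rather than many single-blocklet files.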