[ 
https://issues.apache.org/jira/browse/CARBONDATA-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li resolved CARBONDATA-2309.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 1.4.1
                   1.5.0

> Add strategy to generate bigger carbondata files in case of small amount of 
> data
> --------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-2309
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2309
>             Project: CarbonData
>          Issue Type: Improvement
>          Components: data-load
>            Reporter: xuchuanyin
>            Assignee: wangsen
>            Priority: Major
>             Fix For: 1.5.0, 1.4.1
>
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> In some scenarios, the amount of input data to load is small, but CarbonData 
> still distributes it to every executor (node) for local sort, so each 
> executor generates a small carbondata file. 
> In extreme cases, if the cluster is big enough or the amount of data is 
> small enough, a carbondata file contains only one blocklet or page.
> I think a new strategy should be introduced to solve this problem.
> The new strategy should:
>  # be able to control the minimum amount of input data for each node
>  # ignore data locality; otherwise it may always choose the same small 
> subset of nodes
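
The two requirements above can be illustrated with a minimal sketch. This is not CarbonData's actual implementation; the function name `assign_blocks` and the parameter `min_bytes_per_node` are hypothetical and chosen only for illustration. The sketch caps the number of nodes used so that each one receives roughly the minimum amount of input when the total allows it, and it spreads blocks by size alone, without looking at where they are stored.

```python
def assign_blocks(block_sizes, nodes, min_bytes_per_node):
    """Assign input blocks to nodes, ignoring data locality.

    block_sizes: list of block sizes in bytes (hypothetical input format).
    nodes: list of node names available for loading.
    min_bytes_per_node: desired minimum input per node (hypothetical knob).
    Returns a dict mapping each node that is used to a list of block indices.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    if min_bytes_per_node <= 0:
        raise ValueError("min_bytes_per_node must be positive")

    total = sum(block_sizes)
    # Use only as many nodes as the total input can fill to the minimum,
    # but always at least one and never more than are available.
    active_count = max(1, min(len(nodes), total // min_bytes_per_node))
    active = nodes[:active_count]

    assignment = {node: [] for node in active}
    loads = {node: 0 for node in active}

    # Greedy balancing: the largest remaining block goes to the node with
    # the least data so far. Block location is deliberately not consulted.
    for idx in sorted(range(len(block_sizes)), key=lambda i: -block_sizes[i]):
        target = min(loads, key=loads.get)
        assignment[target].append(idx)
        loads[target] += block_sizes[idx]

    return assignment
```

For example, with four 64 MB blocks, eight available nodes, and a 128 MB minimum, only two nodes are used and each receives two blocks, instead of at most one block landing on each of several nodes.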



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
