Rohini Palaniswamy created PIG-4536:
---------------------------------------

             Summary: LIMIT inside nested foreach should have combiner optimization
                 Key: PIG-4536
                 URL: https://issues.apache.org/jira/browse/PIG-4536
             Project: Pig
          Issue Type: Improvement
            Reporter: Rohini Palaniswamy


data_group = GROUP A BY (f1, f2) PARALLEL 100;
group_result = FOREACH data_group {
    B = LIMIT A.f3 1;

    GENERATE group,
        SUM(A.f3),
        SUM(A.f4),
        SUM(A.f5),
        SUM(A.f6),
        FLATTEN(B);
};

A script like this has the combiner optimization turned off, so it consumes a 
lot of memory and is slow. We should implement LIMIT using the combiner in 
cases like this.
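
A rough sketch of why this should be possible, written as a toy Python 
simulation rather than Pig code. It is not Pig's implementation and uses no 
Pig APIs; the function names (combine, reduce_partials), the sample splits, 
and the reduction to two summed fields (f3, f4) are all assumptions made for 
illustration. The idea is that LIMIT n, like SUM, can be computed from small 
per-map partials: each map side keeps running sums and at most n values per 
group, and the reduce side merges those partials.

```python
LIMIT_N = 1  # mirrors `LIMIT A.f3 1` in the script above


def combine(records, limit=LIMIT_N):
    """Map-side partial aggregation (hypothetical helper, not a Pig API).

    Keeps running sums and at most `limit` f3 values per (f1, f2) group,
    so the full bag for a group is never materialized.
    """
    partials = {}
    for f1, f2, f3, f4 in records:
        part = partials.setdefault(
            (f1, f2), {"sum_f3": 0, "sum_f4": 0, "limited": []}
        )
        part["sum_f3"] += f3
        part["sum_f4"] += f4
        if len(part["limited"]) < limit:
            part["limited"].append(f3)
    return partials


def reduce_partials(all_partials, limit=LIMIT_N):
    """Reduce-side merge of the per-split partials (hypothetical helper)."""
    merged = {}
    for partials in all_partials:
        for key, part in partials.items():
            out = merged.setdefault(
                key, {"sum_f3": 0, "sum_f4": 0, "limited": []}
            )
            out["sum_f3"] += part["sum_f3"]
            out["sum_f4"] += part["sum_f4"]
            # Take only as many values as are still needed for the limit.
            room = limit - len(out["limited"])
            out["limited"].extend(part["limited"][:room])
    return merged


# Made-up input, split across two simulated map tasks: (f1, f2, f3, f4).
split_1 = [("a", "x", 1, 10), ("a", "x", 2, 20), ("b", "y", 5, 50)]
split_2 = [("a", "x", 3, 30), ("b", "y", 7, 70)]

result = reduce_partials([combine(split_1), combine(split_2)])
for key, part in sorted(result.items()):
    print(key, part["sum_f3"], part["sum_f4"], part["limited"])
```

In this sketch each simulated map task emits one small partial per group 
instead of every record, which is the memory saving the combiner would be 
expected to give. Which value LIMIT keeps depends on processing order, as it 
would without an ORDER BY in the nested block.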



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
