[ 
https://issues.apache.org/jira/browse/PIG-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reopened PIG-466:
----------------------------


PIG-922 partially solves this issue by pushing columns to the loader. However, 
we can go beyond that. For example:

{code}
a = load '1.txt' as (a0, a1, a2, a3);
b = filter a by a2==1;
c = order b by a1;
d = foreach c generate a0, a1;
{code}

PIG-922 is able to figure out that a3 is not needed in the script and does not 
load it. Going one step further, we can figure out that a2 is no longer needed 
after b, so we can add a foreach that drops a2 right after b. This is not 
covered by PIG-922 and is part of the new optimizer work.
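
As a rough sketch of what the optimizer would effectively produce, here is the 
same script with the extra foreach written out by hand. The alias b1 is 
hypothetical and only used for illustration; the real optimizer would insert 
the projection into the plan rather than rewrite the script text:

{code}
a = load '1.txt' as (a0, a1, a2, a3);
b = filter a by a2==1;
-- a2 is only used by the filter, so project it away right after b
b1 = foreach b generate a0, a1;
c = order b1 by a1;
d = foreach c generate a0, a1;
{code}

With this change the order operator only has to carry a0 and a1 instead of 
a0, a1 and a2.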

> PERFORMANCE: dropping the columns as soon as possible
> -----------------------------------------------------
>
>                 Key: PIG-466
>                 URL: https://issues.apache.org/jira/browse/PIG-466
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.2.0
>            Reporter: Olga Natkovich
>            Assignee: Daniel Dai
>             Fix For: 0.6.0
>
>
> Currently, each operator carries all the data until foreach is encountered. 
> This can cause significant performance degradation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
