support expression of indirect dependency between statements 
-------------------------------------------------------------

                 Key: PIG-2212
                 URL: https://issues.apache.org/jira/browse/PIG-2212
             Project: Pig
          Issue Type: New Feature
            Reporter: Thejas M Nair


There needs to be a better way to support following use case, than using exec. 
It should be possible to express a dependency between a store statement and 
another statement, to ensure that the store happens first. It is worth 
considering if allowing user to specify dependency between any two statements 
is going to be useful.

{noformat} 
 I have some data that I would like to store into a file and then load it in a 
UDF to do some operations in the next pig statement. 
For example,
doc_ids = FOREACH docs GENERATE doc_id;
STORE doc_ids INTO '$TEMP';
modifieddocs = FOREACH docs GENERATE myUDF('$TEMP', doc_id);

where myUDF loads doc_ids stored in '$TEMP' and does some operation using $TEMP 
and doc_id. 
Now I need to make sure that the "STORE doc_ids INTO '$TEMP';" occurs before 
the FOREACH statement, 
so that loading the index occurs smoothly. Is there anyway to guarantee that 
that can happen?
{noformat} 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to