You can use the exec command to force Pig to run the portion of the script that comes before the exec before it runs the rest of the script.

It will look like this:
STORE doc_ids INTO '$TEMP';
exec;
modifieddocs = FOREACH docs GENERATE myUDF('$TEMP', doc_id);
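
For context, a fuller script might look like the sketch below. The LOAD statement, the schema, the final STORE, and the $INPUT and $OUTPUT parameters are assumptions added for illustration; only the three statements above come from the original question.

```
-- Sketch only: the input path, schema, and output path are hypothetical.
docs = LOAD '$INPUT' AS (doc_id:chararray, text:chararray);
doc_ids = FOREACH docs GENERATE doc_id;
STORE doc_ids INTO '$TEMP';

-- Everything above this line is executed before Pig continues,
-- so '$TEMP' exists by the time myUDF tries to read it.
exec;

modifieddocs = FOREACH docs GENERATE myUDF('$TEMP', doc_id);
STORE modifieddocs INTO '$OUTPUT';
```

The parameters would be supplied on the command line, for example with pig -param INPUT=... -param TEMP=... -param OUTPUT=... followed by the script name.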

This is not ideal, because exec makes everything before it finish first, so other parts of the script that could have run in parallel have to wait. It would be useful to enhance the syntax to support this kind of implicit dependency. I have created a JIRA to track this: https://issues.apache.org/jira/browse/PIG-2212 . (Feel free to contribute patches! :) )


The documentation of exec (http://pig.apache.org/docs/r0.9.0/cmds.html#exec) needs to be fixed. I have created a JIRA for that: https://issues.apache.org/jira/browse/PIG-2211



Thanks,
Thejas


On 8/10/11 9:10 AM, Eshwaran Vijaya Kumar wrote:
All,
   I have some data that I would like to store into a file and then load in a UDF to do some operations in the next Pig statement.
For example,
doc_ids = FOREACH docs GENERATE doc_id;
STORE doc_ids INTO '$TEMP';
modifieddocs = FOREACH docs GENERATE myUDF('$TEMP', doc_id);

where myUDF loads the doc_ids stored in '$TEMP' and does some operation using $TEMP and doc_id. Now I need to make sure that "STORE doc_ids INTO '$TEMP';" runs before the FOREACH statement, so that loading the index goes smoothly. Is there any way to guarantee that?

Thanks
Esh
