Well, a user doesn't really know how many jobs will be scheduled, so their order is not something that should matter. A Pig script should really be seen as a graph of operators. Your problem was that a dependency between two operators was implicit. Exec lets you 'flush' the existing graph and make sure it has been executed before the operators below it run. I would rather try to make that dependency explicit or, if that is not possible, split the script into two separate files. Exec is also a fix, but it limits how much Pig can optimize the overall workflow.
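As a rough sketch of what I mean (the paths, aliases and schema below are made up, since the original script was not posted): the second part loads a directory that the first part stores. Pig only tracks dependencies between aliases, not between files, so it cannot see that ordering. A bare exec, as described on the documentation page linked in this thread, forces the statements above it to run first.

```pig
-- Hypothetical paths and schema, for illustration only.
raw     = LOAD '/data/input' USING PigStorage('\t') AS (user:chararray, hits:long);
grouped = GROUP raw BY user;
totals  = FOREACH grouped GENERATE group AS user, SUM(raw.hits) AS hits;
STORE totals INTO '/tmp/totals';

-- Flush the graph built so far: everything above runs before Pig continues.
exec;

-- Implicit dependency: this LOAD reads the files written by the STORE above.
saved = LOAD '/tmp/totals' USING PigStorage('\t') AS (user:chararray, hits:long);
top   = FILTER saved BY hits > 100L;
STORE top INTO '/data/output/top_users';
```

If the intermediate files are not needed for anything else, the dependency can be made explicit by filtering the totals alias directly instead of reloading /tmp/totals. Pig then orders the work itself and no exec is needed.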
Bertrand

On Sun, Jul 20, 2014 at 3:43 PM, Rodrigo Ferreira <[email protected]> wrote:

> Hi everyone,
>
> I found the answer here:
> http://pig.apache.org/docs/r0.9.1/perf.html#Implicit-Dependencies
>
> It seems that when you have implicit dependencies you have to use the EXEC
> command in order to help Pig execute your jobs in the right order.
>
> Rodrigo.
>
>
> 2014-07-20 14:40 GMT+02:00 Rodrigo Ferreira <[email protected]>:
>
> > I have a Pig script that was divided by the Pig framework in two MapReduce
> > jobs. So far so good.
> >
> > One of these jobs was always failing. When I checked the logs I realized
> > that Pig is executing the "2nd" job before the "1st".
> >
> > Well, I think this is happening because the second part of my script
> > doesn't depend explicitly on the first part. But I'd like it to be executed
> > before the other part. Is it possible?
> >
> > I know Pig tries to optimize several things, but changing the order of the
> > MR jobs is not something nice. Are pigs "domestic animals" after all?
> >
> > By the way, how much control do we really have over Pig's internal DAG?
> >
> > Thanks,
> > Rodrigo Ferreira.
