Well, a user doesn't really know how many jobs will be scheduled, so their
order is not something that should matter. A Pig script should really be
seen as a graph of operators. Your problem was that a dependency between
two operators was implicit. exec lets you 'flush' the existing graph and
make sure it has been executed before running the rest of the operators
below it. I would rather try to make that dependency explicit or, if that
is not possible, split the script into two separate files. exec is also a
fix, but it limits how much Pig can optimize the global workflow.
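
To illustrate, here is a rough sketch of what I mean. The paths, relation
names and schema are made up, and I am assuming the implicit dependency
comes from a STORE followed by a LOAD of a path that Pig cannot match to it,
as in the example on the documentation page you linked:

```pig
-- Hypothetical paths, relation names and schema, for illustration only.
raw     = LOAD '/data/events' USING PigStorage('\t')
          AS (user:chararray, n:int);
by_user = GROUP raw BY user;
totals  = FOREACH by_user GENERATE group AS user, SUM(raw.n) AS total;
STORE totals INTO '/tmp/totals/day1';

-- The LOAD below reads what the STORE above wrote, but through a glob,
-- so the two path strings differ and nothing in the operator graph
-- connects them. exec runs everything above before Pig continues.
exec;

again = LOAD '/tmp/totals/*' USING PigStorage('\t')
        AS (user:chararray, total:long);
big   = FILTER again BY total > 100L;
STORE big INTO '/data/big_users';

-- Explicit alternative: drop the exec and the second LOAD, and write
--   big = FILTER totals BY total > 100L;
-- so that the dependency is an edge in the graph Pig already knows about.
```

With the explicit form Pig is still free to plan both parts together, which
is why I would try that first.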

Bertrand


On Sun, Jul 20, 2014 at 3:43 PM, Rodrigo Ferreira <[email protected]> wrote:

> Hi everyone,
>
> I found the answer here:
> http://pig.apache.org/docs/r0.9.1/perf.html#Implicit-Dependencies
>
> It seems that when you have implicit dependencies you have to use the EXEC
> command in order to help Pig execute your jobs in the right order.
>
> Rodrigo.
>
>
> 2014-07-20 14:40 GMT+02:00 Rodrigo Ferreira <[email protected]>:
>
> > I have a Pig script that was divided by the Pig framework in two
> > MapReduce jobs. So far so good.
> >
> > One of these jobs was always failing. When I checked the logs I realized
> > that Pig is executing the "2nd" job before the "1st".
> >
> > Well, I think this is happening because the second part of my script
> > doesn't depend explicitly on the first part. But I'd like it to be
> > executed before the other part. Is it possible?
> >
> > I know Pig tries to optimize several things, but changing the order of
> > the MR jobs is not something nice. Are pigs "domestic animals" after
> > all?
> >
> > By the way, how much control do we really have over Pig's internal DAG?
> >
> > Thanks,
> > Rodrigo Ferreira.
> >
>
