Re: About Multiple Join in Pig

mingda li Wed, 02 Nov 2016 02:12:26 -0700

Yeah, I see. Thanks for your reply.

Bests,
Mingda


On Tue, Nov 1, 2016 at 9:20 PM, Daniel Dai <[email protected]> wrote:

> Yes, you need to dump/store xxx_OrderRes to kick off the job. You will see
> two MapReduce jobs corresponding to the first and second join.
>
> Thanks,
> Daniel
>
>
>
> On 11/1/16, 10:52 AM, "mingda li" <[email protected]> wrote:
>
> >Dear Dai,
> >
> >Thanks for your reply.
> >What I want to do is compare two different join orders. The queries are
> >as follows:
> >
> >Bad_OrderIn = JOIN inventory BY inv_item_sk, catalog_sales BY cs_item_sk;
> >Bad_OrderRes = JOIN Bad_OrderIn BY (cs_item_sk, cs_order_number),
> >catalog_returns BY (cr_item_sk, cr_order_number);
> >Dump or Store Bad_OrderRes;
> >
> >Good_OrderIn = JOIN catalog_returns BY (cr_item_sk, cr_order_number),
> >catalog_sales BY (cs_item_sk, cs_order_number);
> >Good_OrderRes = JOIN Good_OrderIn BY cs_item_sk, inventory BY inv_item_sk;
> >Dump or Store Good_OrderRes;
> >
> >Since Pig executes the query lazily, I think I can only learn the time of
> >the MapReduce jobs by dumping or storing the result. Is that right? If so,
> >I need to count the time to Dump or Store the result as the time for each
> >join order.
> >
> >Bests,
> >Mingda
> >
> >
> >
> >On Tue, Nov 1, 2016 at 10:39 AM, Daniel Dai <[email protected]>
> >wrote:
> >
> >> Hi, Mingda,
> >>
> >> Pig does not do join reordering and executes the query the way it is
> >> written. Note that you can join multiple relations in one join statement.
> >>
> >> Do you want the execution time for each join in your statement? Assuming
> >> you are using a regular join and running on MapReduce, every join
> >> statement will be a separate MapReduce job, and the join runtime is the
> >> runtime of its MapReduce job.
> >>
> >> Thanks,
> >> Daniel
> >>
> >>
> >>
> >> On 10/31/16, 8:21 PM, "mingda li" <[email protected]> wrote:
> >>
> >> >Dear all,
> >> >
> >> >I am working on optimizing multi-way joins. I am not sure whether Pig can
> >> >decide the join order in its optimization layer. Does anyone know? Or does
> >> >Pig just execute the query the way it is written?
> >> >
> >> >I also want to do a multi-way join on different keys. Can the
> >> >following query work?
> >> >
> >> >Res =
> >> >JOIN
> >> >(JOIN catalog_sales BY cs_item_sk, inventory BY  inv_item_sk) BY
> >> >(cs_item_sk, cs_order_number), catalog_returns BY (cr_item_sk,
> >> >cr_order_number);
> >> >
> >> >BTW, each time I run the query, it finishes in one second. Is there a
> >> >way to see the execution time? I have set pig.udf.profile=true. Where
> >> >can I find the time?
> >> >
> >> >Bests,
> >> >Mingda
> >>
>
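
For reference, the first join ordering discussed in this thread could be written as a complete script along the lines below. This is only a sketch under stated assumptions: the LOAD paths, the '|' delimiter, the trimmed schemas, and the output path are all hypothetical and are not taken from the thread. Only the two JOIN statements come from the original messages. It illustrates the point made in the replies that nothing runs until a DUMP or STORE is reached.

```pig
-- Hypothetical input paths, delimiter, and trimmed schemas (assumptions);
-- replace them with the real table locations and full column lists.
inventory = LOAD 'input/inventory' USING PigStorage('|')
    AS (inv_item_sk:long, inv_quantity_on_hand:int);
catalog_sales = LOAD 'input/catalog_sales' USING PigStorage('|')
    AS (cs_item_sk:long, cs_order_number:long);
catalog_returns = LOAD 'input/catalog_returns' USING PigStorage('|')
    AS (cr_item_sk:long, cr_order_number:long);

-- First join from the thread. Pig only records it in the plan here;
-- no job is launched yet.
Bad_OrderIn = JOIN inventory BY inv_item_sk, catalog_sales BY cs_item_sk;

-- Second join from the thread, on a composite key.
Bad_OrderRes = JOIN Bad_OrderIn BY (cs_item_sk, cs_order_number),
                    catalog_returns BY (cr_item_sk, cr_order_number);

-- STORE is what triggers execution. The output path is hypothetical.
STORE Bad_OrderRes INTO 'output/bad_order';
```

The second ordering (Good_OrderIn / Good_OrderRes) can be placed in a separate script of the same shape, so that the job runtimes of the two orderings can be compared one against the other.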
