Dear Dai,

Thanks for your reply.
What I want to do is compare two different join orders. The queries are
as follows:

Bad_OrderIn  = JOIN inventory BY inv_item_sk, catalog_sales BY cs_item_sk;
Bad_OrderRes = JOIN Bad_OrderIn BY (cs_item_sk, cs_order_number),
               catalog_returns BY (cr_item_sk, cr_order_number);
DUMP Bad_OrderRes;   -- or STORE Bad_OrderRes INTO an output path

Good_OrderIn  = JOIN catalog_returns BY (cr_item_sk, cr_order_number),
                catalog_sales BY (cs_item_sk, cs_order_number);
Good_OrderRes = JOIN Good_OrderIn BY cs_item_sk, inventory BY inv_item_sk;
DUMP Good_OrderRes;  -- or STORE Good_OrderRes INTO an output path

Since Pig executes queries lazily, I think the MapReduce jobs only run (and
can only be timed) when I DUMP or STORE the result. Is that right? If so, I
need to measure the time taken to DUMP or STORE each result and use it as
the running time of each join order.
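A minimal sketch of that measurement, assuming Pig is on the PATH and each join order is saved as its own script ending in STORE (the script names and the helpers `time_command` and `compare_join_orders` are hypothetical): it times whole `pig -f` runs from Python.

```python
import subprocess
import time


def time_command(cmd):
    """Run cmd to completion; return (wall-clock seconds, exit status)."""
    start = time.perf_counter()
    result = subprocess.run(cmd)
    return time.perf_counter() - start, result.returncode


def compare_join_orders(scripts, runner=("pig", "-f")):
    """Time each script once; `scripts` maps a label to a script path.

    Assumption: `pig -f <file>` runs a Pig Latin script file. Each script
    should end in STORE (or DUMP); otherwise Pig launches no MapReduce job.
    """
    timings = {}
    for label, path in scripts.items():
        elapsed, status = time_command([*runner, path])
        if status != 0:
            raise RuntimeError(f"{label}: command exited with status {status}")
        timings[label] = elapsed
    return timings
```

For example, `compare_join_orders({"bad": "bad_order.pig", "good": "good_order.pig"})`. Each figure also includes JVM startup and plan compilation, which should be roughly the same for both scripts, so repeating the runs and comparing the differences is safer than trusting a single run.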

Best,
Mingda



On Tue, Nov 1, 2016 at 10:39 AM, Daniel Dai <da...@hortonworks.com> wrote:

> Hi, Mingda,
>
> Pig does not do join reordering; it executes the query the way it is
> written. Note that you can join multiple relations in one JOIN statement.
>
> Do you want the execution time of each join in your statement? Assuming you
> are using a regular join and running on MapReduce, every JOIN statement
> becomes a separate MapReduce job, and the runtime of a join is the runtime
> of its MapReduce job.
>
> Thanks,
> Daniel
>
>
>
> On 10/31/16, 8:21 PM, "mingda li" <limingda1...@gmail.com> wrote:
>
> >Dear all,
> >
> >I am optimizing a multi-way join. I am not sure whether Pig can decide
> >the join order in its optimization layer. Does anyone know? Or does Pig
> >just execute the query the way it is written?
> >
> >Also, I want to do a multi-way join on different keys. Will the
> >following query work?
> >
> >Res =
> >JOIN
> >(JOIN catalog_sales BY cs_item_sk, inventory BY  inv_item_sk) BY
> >(cs_item_sk, cs_order_number), catalog_returns BY (cr_item_sk,
> >cr_order_number);
> >
> >BTW, each time I run the query, it finishes within one second. Is there a
> >way to see the execution time? I have set pig.udf.profile=true. Where
> >can I find the timing?
> >
> >Best,
> >Mingda
>
