Yeah, I see. Thanks for your reply.

Bests,
Mingda

On Tue, Nov 1, 2016 at 9:20 PM, Daniel Dai <da...@hortonworks.com> wrote:

> Yes, you need to dump/store xxx_OrderRes to kick off the job. You will see
> two MapReduce jobs corresponding to the first and second join.
>
> Thanks,
> Daniel
>
> On 11/1/16, 10:52 AM, "mingda li" <limingda1...@gmail.com> wrote:
>
> >Dear Dai,
> >
> >Thanks for your reply.
> >What I want to do is compare two different join orders. The queries
> >are as follows:
> >
> >Bad_OrderIn = JOIN inventory BY inv_item_sk, catalog_sales BY cs_item_sk;
> >Bad_OrderRes = JOIN Bad_OrderIn BY (cs_item_sk, cs_order_number),
> >    catalog_returns BY (cr_item_sk, cr_order_number);
> >Dump or Store Bad_OrderRes;
> >
> >Good_OrderIn = JOIN catalog_returns BY (cr_item_sk, cr_order_number),
> >    catalog_sales BY (cs_item_sk, cs_order_number);
> >Good_OrderRes = JOIN Good_OrderIn BY cs_item_sk, inventory BY inv_item_sk;
> >Dump or Store Good_OrderRes;
> >
> >Since Pig executes the query lazily, I think I can only know the time of
> >the MapReduce job by dumping or storing the result. Is that right? If so,
> >I need to count the time to dump or store the result as the time for
> >each join order.
> >
> >Bests,
> >Mingda
> >
> >On Tue, Nov 1, 2016 at 10:39 AM, Daniel Dai <da...@hortonworks.com> wrote:
> >
> >> Hi, Mingda,
> >>
> >> Pig does not do join reordering and will execute the query the way it
> >> is written. Note that you can join multiple relations in one join
> >> statement.
> >>
> >> Do you want the execution time for each join in your statement? I assume
> >> you are using a regular join and running with MapReduce; in that case,
> >> every join statement will be a separate MapReduce job, and the join
> >> runtime is the runtime of its MapReduce job.
> >>
> >> Thanks,
> >> Daniel
> >>
> >> On 10/31/16, 8:21 PM, "mingda li" <limingda1...@gmail.com> wrote:
> >>
> >> >Dear all,
> >> >
> >> >I am working on optimizing multi-way joins. I am not sure whether Pig
> >> >can decide the join order in its optimization layer. Does anyone know
> >> >about this? Or does Pig just execute the query the way it is written?
> >> >
> >> >Also, I want to do a multi-way join on different keys. Can the
> >> >following query work?
> >> >
> >> >Res =
> >> >JOIN
> >> >(JOIN catalog_sales BY cs_item_sk, inventory BY inv_item_sk) BY
> >> >(cs_item_sk, cs_order_number), catalog_returns BY (cr_item_sk,
> >> >cr_order_number);
> >> >
> >> >BTW, each time I run the query, it finishes in one second. Is there a
> >> >way to see the execution time? I have set pig.udf.profile=true. Where
> >> >can I find the time?
> >> >
> >> >Bests,
> >> >Mingda
> >> >