Dear all,

I would now like to use a UDF in a Pig script. Has anyone done this before? Specifically, I want to call Google Guava's murmur3_32 hash function from Pig. Could anyone point me to useful materials or offer suggestions?

Bests,
Mingda

On Wed, Nov 2, 2016 at 2:11 AM, mingda li <limingda1...@gmail.com> wrote:

> Yeah, I see. Thanks for your reply.
>
> Bests,
> Mingda
>
> On Tue, Nov 1, 2016 at 9:20 PM, Daniel Dai <da...@hortonworks.com> wrote:
>
>> Yes, you need to dump/store xxx_OrderRes to kick off the job. You will
>> see two MapReduce jobs corresponding to the first and second join.
>>
>> Thanks,
>> Daniel
>>
>> On 11/1/16, 10:52 AM, "mingda li" <limingda1...@gmail.com> wrote:
>>
>> >Dear Dai,
>> >
>> >Thanks for your reply.
>> >What I want to do is to compare the two different order of join. The query
>> >is as following:
>> >
>> >*Bad_OrderIn = JOIN inventory BY inv_item_sk, catalog_sales BY cs_item_sk;*
>> >*Bad_OrderRes = JOIN Bad_OrderIn BY (cs_item_sk, cs_order_number),
>> >catalog_returns BY (cr_item_sk, cr_order_number);*
>> >*Dump or Store Bad_OrderRes;*
>> >
>> >*Good_OrderIn = JOIN catalog_returns BY (cr_item_sk, cr_order_number),
>> >catalog_sales BY (cs_item_sk, cs_order_number);*
>> >*Good_OrderRes = JOIN Good_OrderIn BY cs_item_sk, inventory BY inv_item_sk;*
>> >*Dump or Store Good_OrderRes;*
>> >
>> >Since Pig execute the query lazily, I think only by Dump or Store the
>> >result, I can know the time of MapReduce Job, is it right? If it is, then I
>> >need to count the time to Dump or Store the result as the time for the
>> >different orders' join.
>> >
>> >Bests,
>> >Mingda
>> >
>> >On Tue, Nov 1, 2016 at 10:39 AM, Daniel Dai <da...@hortonworks.com> wrote:
>> >
>> >> Hi, Mingda,
>> >>
>> >> Pig does not do join reordering and will execute the query as the way it
>> >> is written. Note you can join multiple relations in one join statement.
>> >>
>> >> Do you want execution time for each join in your statement? I assume you
>> >> are using regular join and running with MapReduce, every join statement
>> >> will be a separate MapReduce job and the join runtime is the runtime for
>> >> its MapReduce job.
>> >>
>> >> Thanks,
>> >> Daniel
>> >>
>> >> On 10/31/16, 8:21 PM, "mingda li" <limingda1...@gmail.com> wrote:
>> >>
>> >> >Dear all,
>> >> >
>> >> >I am doing optimization for multiple join. I am not sure if Pig can decide
>> >> >the join order in optimization layer. Does anyone know about this? Or Pig
>> >> >just execute the query as the way it is written.
>> >> >
>> >> >And, I want to do the multiple way Join on different keys. Can the
>> >> >following query work?
>> >> >
>> >> >Res =
>> >> >JOIN
>> >> >(JOIN catalog_sales BY cs_item_sk, inventory BY inv_item_sk) BY
>> >> >(cs_item_sk, cs_order_number), catalog_returns BY (cr_item_sk,
>> >> >cr_order_number);
>> >> >
>> >> >BTW, each time, I run the query, it is finished in one second. Is there a
>> >> >way to see the execution time? I have set the pig.udf.profile=true. Where
>> >> >can I find the time?
>> >> >
>> >> >Bests,
>> >> >Mingda
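
On the UDF question at the top of this thread, the Pig side of the wiring might look roughly like the sketch below. It is only an illustration under stated assumptions: the jar paths and the class name com.example.pig.Murmur3Hash are hypothetical and do not come from the thread. The sketch assumes someone has written that class as a small Java UDF extending org.apache.pig.EvalFunc and calling Guava's Hashing.murmur3_32(), and that catalog_sales has already been loaded as in the queries quoted above.

```
-- Hypothetical paths: the Guava jar plus a jar holding the custom UDF class.
REGISTER /path/to/guava.jar;
REGISTER /path/to/my-udfs.jar;

-- Hypothetical class name; assumed to wrap Guava's Hashing.murmur3_32().
DEFINE Murmur3 com.example.pig.Murmur3Hash();

-- Apply the UDF to one column of an already-loaded relation.
hashed = FOREACH catalog_sales GENERATE cs_item_sk, Murmur3((chararray) cs_item_sk) AS item_hash;

-- As with the joins above, nothing runs until the result is dumped or stored.
DUMP hashed;
```

REGISTER makes the jars available to the job and DEFINE gives the class a short alias. Whether Guava needs to be registered separately depends on how the UDF jar is packaged.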