I have another question. If I want to embed this Pig script with multiple group-bys in a Java program using PigServer, will multiple stores like the following be executed in one MR job?
pigServer.store("C", "output1");
pigServer.store("E", "output2");

If not, how can I achieve this? Thanks.

Ey-Chih Chow

On Tue, Oct 15, 2013 at 3:40 PM, ey-chih chow <eyc...@gmail.com> wrote:

> Thanks. This is what I want.
>
> Best regards,
>
> Ey-Chih
>
> On Tue, Oct 15, 2013 at 1:50 PM, Alan Gates <ga...@hortonworks.com> wrote:
>
>> Pig handles doing multiple group bys on the same input, often in a single MR job. So:
>>
>> A = load 'file';
>> B = group A by $0;
>> C = foreach B generate group, COUNT(A);
>> store C into 'output1';
>> D = group A by $1;
>> E = foreach D generate group, COUNT(A);
>> store E into 'output2';
>>
>> This can be done in a single MR job. Is that what you're looking for?
>>
>> Alan.
>>
>> On Oct 15, 2013, at 12:12 PM, ey-chih chow wrote:
>>
>> > What I really want to know is, in Pig, how can I read an input data set only once and generate multiple instances with distinct keys for each data point and do a group-by?
>> >
>> > Best regards,
>> >
>> > Ey-Chih Chow
>> >
>> > On Tue, Oct 15, 2013 at 10:16 AM, Pradeep Gollakota <pradeep...@gmail.com> wrote:
>> >
>> >> I'm not aware of any way to do that. I think you're also missing the spirit of Pig. Pig is meant to be a data workflow language. Describe a workflow for your data using PigLatin and Pig will then compile your script to MapReduce jobs. The number of MapReduce jobs that it generates is the smallest number of jobs (based on the optimizers) that Pig thinks it needs to complete the workflow.
>> >>
>> >> Why do you want to control the number of MR jobs?
>> >>
>> >> On Tue, Oct 15, 2013 at 10:07 AM, ey-chih chow <eyc...@gmail.com> wrote:
>> >>
>> >>> Thanks everybody. Is there any way we can programmatically control the number of M-R jobs that a Pig script will generate, similar to writing M-R jobs in Java?
>> >>>
>> >>> Best regards,
>> >>>
>> >>> Ey-Chih Chow
>> >>>
>> >>> On Tue, Oct 15, 2013 at 6:14 AM, Shahab Yunus <shahab.yu...@gmail.com> wrote:
>> >>>
>> >>>> And Geert's comment about using an external-to-Pig approach reminds me that you have Netflix's PigLipstick too. It is a nice visual tool for actual execution and stores job history as well.
>> >>>>
>> >>>> Regards,
>> >>>> Shahab
>> >>>>
>> >>>> On Tue, Oct 15, 2013 at 8:51 AM, Geert Van Landeghem <g...@foundation.be> wrote:
>> >>>>
>> >>>>> You can also use ambrose to monitor execution of your pig script at runtime. Remark: from pig-0.11 on.
>> >>>>>
>> >>>>> It shows you the DAG of MR jobs and which are currently being executed. As long as pig-ambrose is connected to the execution of your script (workflow) you can replay the workflow.
>> >>>>>
>> >>>>> --
>> >>>>> kind regards,
>> >>>>> Geert
>> >>>>>
>> >>>>> On 15-okt.-2013, at 14:43, Shahab Yunus <shahab.yu...@gmail.com> wrote:
>> >>>>>
>> >>>>>> Have you tried using the ILLUSTRATE and EXPLAIN commands? As far as I know, they don't give you the exact number, as it depends on the actual data, but I believe you can interpret or extrapolate it from the information provided by these commands.
>> >>>>>>
>> >>>>>> Regards,
>> >>>>>> Shahab
>> >>>>>>
>> >>>>>> On Tue, Oct 15, 2013 at 3:57 AM, ey-chih chow <eyc...@gmail.com> wrote:
>> >>>>>>
>> >>>>>>> Hi,
>> >>>>>>>
>> >>>>>>> I have a Pig script that has two group-by statements on the input data set. Does anybody know how many M-R jobs the script will generate?
>> >>>>>>> Thanks.
>> >>>>>>>
>> >>>>>>> Best regards,
>> >>>>>>>
>> >>>>>>> Ey-Chih Chow
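
As a rough illustration of the idea discussed in this thread (reading the input once while grouping on two different keys), here is a minimal plain-Java sketch. It is not Pig's implementation and does not use PigServer; the class name SinglePassGroupBy, the method scanOnce, and the sample rows are all hypothetical. Each row contributes one tagged key per grouping field, so a single scan can feed both counts.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: not Pig code and not Pig's implementation.
public class SinglePassGroupBy {

    /**
     * Scans the rows once. For every row, one (tag, key) pair is produced per
     * grouping field, where tag is the field index (0 for $0, 1 for $1), and
     * the count for that pair is incremented.
     */
    static Map<Integer, Map<String, Long>> scanOnce(List<String[]> rows) {
        Map<Integer, Map<String, Long>> counts = new TreeMap<>();
        counts.put(0, new TreeMap<>());
        counts.put(1, new TreeMap<>());
        for (String[] row : rows) {
            for (int tag = 0; tag <= 1; tag++) {
                // Same input row, different grouping key depending on the tag.
                counts.get(tag).merge(row[tag], 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample rows with two fields each.
        List<String[]> rows = Arrays.asList(
            new String[] {"a", "x"},
            new String[] {"a", "y"},
            new String[] {"b", "x"});
        Map<Integer, Map<String, Long>> counts = scanOnce(rows);
        System.out.println("output1 (count by $0): " + counts.get(0));
        System.out.println("output2 (count by $1): " + counts.get(1));
    }
}
```

The two inner maps play the role of 'output1' and 'output2' in the script quoted above; whether Pig actually merges the two stores into one job depends on how the script is submitted and on its optimizer.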