java.io.IOException: Deserialization error: org.apache.hcatalog.data.schema.HCatSchema

2013-10-15 Thread Prasad GS
Hi All, I'm currently integrating Pig with HCatalog and then trying to run the Pig scripts. I'm using Cloudera CDH 4.4.0 with pig-0.11.0+33, hive-0.10.0+198 and hcatalog-0.5.0+13. When I use pig -useHCatalog to run my Pig scripts, everything works fine. But when I try to launch the pig scripts usi

Re: AvroStorage issue

2013-10-15 Thread Bertrand Dechoux
The "doc" field should be at the level of the record, not the field. Maybe that's the issue even though the exception is not clear. For the first version, you can let Pig generate the schema and then evolve it. Bertrand On Tue, Oct 15, 2013 at 7:29 PM, anup ahire wrote: > Hello , > > I am try

Re: number of M/R jobs for a Pig Script

2013-10-15 Thread ey-chih chow
Thanks. This is what I want. Best regards, Ey-Chih On Tue, Oct 15, 2013 at 1:50 PM, Alan Gates wrote: > Pig handles doing multiple group bys on the same input, often in a single > MR job. So: > > A = load 'file'; > B = group A by $0; > C = foreach B generate group, COUNT(A); > store C into

Re: number of M/R jobs for a Pig Script

2013-10-15 Thread Alan Gates
Pig handles doing multiple group bys on the same input, often in a single MR job. So: A = load 'file'; B = group A by $0; C = foreach B generate group, COUNT(A); store C into 'output1'; D = group A by $1; E = foreach D generate group, COUNT(A); store E into 'output2'; This can be done in a sing
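
The script in the message above, laid out one statement per line, is sketched below. It assumes a hypothetical input 'file' with at least two columns, and it stores E (the counts) rather than D in the second STORE, which appears to be what was intended. The paths 'file', 'output1' and 'output2' are placeholders taken from the message.

```pig
-- Load the input once; both branches below share this relation.
A = LOAD 'file';

-- First branch: count records per value of the first column.
B = GROUP A BY $0;
C = FOREACH B GENERATE group, COUNT(A);
STORE C INTO 'output1';

-- Second branch: count records per value of the second column.
D = GROUP A BY $1;
E = FOREACH D GENERATE group, COUNT(A);
STORE E INTO 'output2';
```

When run as a batch script, Pig's multi-query execution lets both STORE statements share one scan of the input. Running with the -M (-no_multiquery) option turns that off, which is one way to compare the resulting job counts.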

Re: number of M/R jobs for a Pig Script

2013-10-15 Thread Pradeep Gollakota
Can you describe what your input data looks like and what you want your output data to look like? I don’t understand your question. A group by is really straightforward to do on a dataset. A = LOAD 'mydata' using MyStorage(); B = GROUP A BY group_key; dump B; Is that what you’re looking for?

Re: number of M/R jobs for a Pig Script

2013-10-15 Thread ey-chih chow
What I really want to know is, in Pig, how can I read an input data set only once and generate multiple instances with distinct keys for each data point and do a group-by? Best regards, Ey-Chih Chow On Tue, Oct 15, 2013 at 10:16 AM, Pradeep Gollakota wrote: > I'm not aware of any way to do that.

AvroStorage issue

2013-10-15 Thread anup ahire
Hello, I am trying to store data into Avro using AvroStorage() with the following schema. I have Pig 0.11. {"type":"record","name":"TUPLE_0","fields":[{"name":"Header","type":["null","string"],"doc":"autogenerated from Pig Field Schema"}]} I am getting the following errors when I run the job. Caused b

Re: number of M/R jobs for a Pig Script

2013-10-15 Thread Pradeep Gollakota
I'm not aware of any way to do that. I think you're also missing the spirit of Pig. Pig is meant to be a data workflow language. Describe a workflow for your data using PigLatin and Pig will then compile your script to MapReduce jobs. The number of MapReduce jobs that it generates is the smallest nu

Re: number of M/R jobs for a Pig Script

2013-10-15 Thread ey-chih chow
Thanks everybody. Is there any way we can programmatically control the number of M-R jobs that a Pig script will generate, similar to writing M-R jobs in Java? Best regards, Ey-Chih Chow On Tue, Oct 15, 2013 at 6:14 AM, Shahab Yunus wrote: > And Geert's comment about using external-to-Pig approa

Re: number of M/R jobs for a Pig Script

2013-10-15 Thread Shahab Yunus
And Geert's comment about using an external-to-Pig approach reminds me that you also have Netflix's PigLipstick. It's a nice visual tool for actual execution, and it stores job history as well. Regards, Shahab On Tue, Oct 15, 2013 at 8:51 AM, Geert Van Landeghem wrote: > You can also use ambrose to moni

Re: number of M/R jobs for a Pig Script

2013-10-15 Thread Bertrand Dechoux
Or Lipstick: https://github.com/Netflix/Lipstick It's Netflix this time instead of Twitter. ;) http://techblog.netflix.com/2013/06/introducing-lipstick-on-apache-pig.html But by simply running the script, the information you are looking for will be displayed at the end of the job. Bertrand O

Re: number of M/R jobs for a Pig Script

2013-10-15 Thread Geert Van Landeghem
You can also use ambrose to monitor execution of your pig script at runtime. Remark: from pig-0.11 on. It shows you the DAG of MR jobs and which ones are currently being executed. As long as pig-ambrose is connected to the execution of your script (workflow) you can replay the workflow. -- kind reg

Re: number of M/R jobs for a Pig Script

2013-10-15 Thread Shahab Yunus
Have you tried using the ILLUSTRATE and EXPLAIN commands? As far as I know, I don't think they give you the exact number as it depends on the actual data but I believe you can interpret it/extrapolate it from the information provided by these commands. Regards, Shahab On Tue, Oct 15, 2013 at 3:57 AM,
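
A minimal sketch of the EXPLAIN suggestion above, assuming a hypothetical input path 'file' and a single group-by. The relation names are illustrative and not taken from the original poster's script.

```pig
-- Hypothetical input; replace 'file' with the real data set.
A = LOAD 'file';
B = GROUP A BY $0;
C = FOREACH B GENERATE group, COUNT(A);

-- Print the logical, physical and MapReduce plans for C without running it.
EXPLAIN C;
```

The MapReduce plan section of the output lists the MR jobs Pig intends to launch for the alias, so counting the entries there gives an estimate of the number of jobs before anything runs.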

number of M/R jobs for a Pig Script

2013-10-15 Thread ey-chih chow
Hi, I have a Pig script that has two group-by statements on the input data set. Does anybody know how many M-R jobs the script will generate? Thanks. Best regards, Ey-Chih Chow