Hi All,
I'm currently integrating Pig with HCatalog and then trying to run the Pig
scripts. I'm using Cloudera CDH 4.4.0 with pig-0.11.0+33, hive-0.10.0+198,
and hcatalog-0.5.0+13.
When I use pig -useHCatalog to run my Pig scripts, everything works fine.
But when I try to launch the Pig scripts using …
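For reference, a minimal sketch of the invocation that works (the database
and table names here are assumed for illustration, not from the original
message):
-- invoked as: pig -useHCatalog myscript.pig
A = LOAD 'mydb.mytable' USING org.apache.hcatalog.pig.HCatLoader();
DUMP A;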
The "doc" field should be at the level of the record, not the field. Maybe
that's the issue even though the exception is not clear.
For the first version, you can let Pig generate the schema and then evolve
it.
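For illustration, a sketch of the schema from the original message with
"doc" moved up to the record level (an assumption about the fix, not a
verified one):
{"type":"record","name":"TUPLE_0","doc":"autogenerated from Pig Field Schema","fields":[{"name":"Header","type":["null","string"]}]}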
Bertrand
On Tue, Oct 15, 2013 at 7:29 PM, anup ahire wrote:
> Hello,
>
> I am trying to store data into avro using AvroStorage() with the following schema. …
Thanks. This is what I want.
Best regards,
Ey-Chih
On Tue, Oct 15, 2013 at 1:50 PM, Alan Gates wrote:
> Pig handles doing multiple group bys on the same input, often in a single
> MR job. So:
>
> A = load 'file';
> B = group A by $0;
> C = foreach B generate group, COUNT(A);
> store C into 'output1'; …
Pig handles doing multiple group bys on the same input, often in a single MR
job. So:
A = load 'file';
B = group A by $0;
C = foreach B generate group, COUNT(A);
store C into 'output1';
D = group A by $1;
E = foreach D generate group, COUNT(A);
store E into 'output2';
This can be done in a single MR job.
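(A note on the mechanism, drawn from Pig's documentation rather than the
thread: this merging is Pig's multi-query optimization, which is on by
default; running pig with -no_multiquery, or -M, disables it so each store
is planned separately.)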
Can you describe what your input data looks like and what you want your
output data to look like?
I don't understand your question. A group by is really straightforward to
do on a dataset.
A = LOAD 'mydata' using MyStorage();
B = GROUP A BY group_key;
dump B;
Is that what you’re looking for?
What I really want to know is: in Pig, how can I read an input data set only
once, generate multiple instances with distinct keys for each data point,
and then do a group-by?
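For illustration, one hypothetical way to express that in Pig Latin (a
sketch only; MyStorage and the field positions are assumed from the earlier
snippet, not from the thread):
A = LOAD 'mydata' USING MyStorage();
-- emit each record once per key of interest; FLATTEN turns the bag into rows
B = FOREACH A GENERATE FLATTEN(TOBAG($0, $1)) AS key, $0 AS f0, $1 AS f1;
C = GROUP B BY key;
D = FOREACH C GENERATE group, COUNT(B);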
Best regards,
Ey-Chih Chow
On Tue, Oct 15, 2013 at 10:16 AM, Pradeep Gollakota wrote:
> I'm not aware of any way to do that.
Hello,
I am trying to store data into avro using AvroStorage() with the following
schema. I am using Pig 0.11.
{"type":"record","name":"TUPLE_0","fields":[{"name":"Header","type":["null","string"],"doc":"autogenerated from Pig Field Schema"}]}
I am getting the following errors when I run the job:
Caused by: …
I'm not aware of any way to do that. I think you're also missing the spirit
of Pig. Pig is meant to be a data workflow language. Describe a workflow
for your data using Pig Latin and Pig will then compile your script to
MapReduce jobs. The number of MapReduce jobs that it generates is the
smallest number it can use to execute your workflow.
Thanks everybody. Is there any way we can programmatically control the
number of MapReduce jobs that a Pig script will generate, similar to
writing MapReduce jobs in Java?
Best regards,
Ey-Chih Chow
On Tue, Oct 15, 2013 at 6:14 AM, Shahab Yunus wrote:
> And Geert's comment about using an external-to-Pig approach reminds me …
And Geert's comment about using an external-to-Pig approach reminds me that
there is also Netflix's Lipstick, a nice visual tool for the actual
execution that stores job history as well.
Regards,
Shahab
On Tue, Oct 15, 2013 at 8:51 AM, Geert Van Landeghem wrote:
> You can also use ambrose to monitor execution of your Pig script at runtime. …
Or Lipstick : https://github.com/Netflix/Lipstick
It's Netflix this time instead of Twitter. ;)
http://techblog.netflix.com/2013/06/introducing-lipstick-on-apache-pig.html
But by simply running the script, the information you are looking for will
be displayed at the end of the job.
Bertrand
On Tue, Oct 15, 2013 at 8:51 AM, Geert Van Landeghem wrote:
> You can also use ambrose to monitor execution of your Pig script at runtime. …
You can also use ambrose to monitor execution of your Pig script at runtime.
Remark: available from pig-0.11 on.
It shows you the DAG of MR jobs and which are currently being executed. As long
as pig-ambrose is connected to the execution of your script (workflow), you can
replay the workflow.
--
kind regards,
Have you tried using the ILLUSTRATE and EXPLAIN commands? As far as I know,
they don't give you the exact number, as it depends on the actual data, but
I believe you can extrapolate it from the information provided by these
commands.
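For illustration, a minimal sketch of both from the Grunt shell (the alias
names and input are assumed, not from the thread):
A = LOAD 'mydata';
B = GROUP A BY $0;
EXPLAIN B;    -- prints the logical, physical, and MapReduce plans; the last lists the jobs
ILLUSTRATE B; -- runs the pipeline on a small sample of the input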
Regards,
Shahab
On Tue, Oct 15, 2013 at 3:57 AM, Ey-Chih Chow wrote:
Hi,
I have a Pig script that has two group-by statements on the same input data
set. Does anybody know how many MapReduce jobs the script will generate?
Thanks.
Best regards,
Ey-Chih Chow