Hi,

Yeah, there was a bug in my "stats" data. I was wondering how I can calculate an average in Pig, something like this:
http://stackoverflow.com/questions/12593527/finding-mean-using-pig-or-hadoop
But in the top response it seems that the user wanted to calculate a single average across all the data, since count = COUNT(inpt) and inpt is the complete input, whereas what I want is for the denominator to be the count for each id. My data looks like:

id, value
1,1.0
1,3.0
1,5.0
2,1.0

So the averages I am expecting are:

1,3.0
2,1.0

since (1 + 3 + 5) / 3 = 3, whereas in the example COUNT(inpt) would give me 4. How do I achieve this?

Thanks

On Mon, Apr 1, 2013 at 2:24 PM, Mehmet Tepedelenlioglu <mehmets...@yahoo.com> wrote:
>
> Are your ids unique?
>
> On 4/1/13 2:06 PM, "jamal sasha" <jamalsha...@gmail.com> wrote:
>
> >Hi,
> >  I have a simple join question.
> >base = load 'input1' USING PigStorage( ',' ) as (id1, field1, field2);
> >stats = load 'input2' USING PigStorage(',') as (id1, mean, median);
> >joined = JOIN base BY id1, stats BY id1;
> >final = FOREACH joined GENERATE base::id1, base::field1, base::field2,
> >stats::mean, stats::median;
> >STORE final INTO 'output' USING PigStorage( ',' );
> >
> >But something doesn't feel right.
> >Inputs are on the order of MBs, whereas the output is around 100 GB.
> >
> >I tried it on sample files
> >where base is 35 MB
> >and stats is 10 MB,
> >and the output explodes to GBs.
> >What am I missing?
> >
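For the per-id average, a minimal Pig sketch (file names 'input' and 'output' and the alias names are assumptions, not from the thread) would group by id first so that AVG runs over each id's bag rather than the whole input:

```pig
-- Load id,value pairs; the schema types are assumed from the sample data.
inpt = LOAD 'input' USING PigStorage(',') AS (id:int, value:double);

-- One bag of rows per id; 'group' holds the grouping key.
grpd = GROUP inpt BY id;

-- AVG is computed within each group, so the denominator is the
-- per-id count (3 for id 1, 1 for id 2), not COUNT of the whole input.
avgs = FOREACH grpd GENERATE group AS id, AVG(inpt.value) AS mean;

STORE avgs INTO 'output' USING PigStorage(',');
```

On the sample data above this should yield 1,3.0 and 2,1.0.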
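On the quoted join question: JOIN produces one output row for every pair of matching rows, so if some id1 appears m times in base and n times in stats, that key alone contributes m * n rows, which is how MB-sized inputs can blow up to GBs. A hedged way to check for duplicate keys (alias names here are illustrative) is:

```pig
-- Count how often each id1 occurs in stats ('input2' from the thread).
stats  = LOAD 'input2' USING PigStorage(',') AS (id1, mean, median);
keys   = GROUP stats BY id1;
counts = FOREACH keys GENERATE group AS id1, COUNT(stats) AS n;

-- Any rows surviving this filter are keys that will multiply the join.
dups = FILTER counts BY n > 1;
DUMP dups;
```

Running the same check on base would show whether both sides carry repeated keys, which is when the multiplication is worst.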