Hi,

Yeah, there was a bug in my "stats" data. I was wondering how I can calculate an average in Pig, something like this:
http://stackoverflow.com/questions/12593527/finding-mean-using-pig-or-hadoop
But in the top response it seems that the user wanted to calculate a single average across all the data, since count = COUNT(inpt) and inpt is the complete input, whereas what I want is for the denominator to be the count for each id. My data looks like:

id, value
1,1.0
1,3.0
1,5.0
2,1.0

So the averages I am expecting are:

1,3.0
2,1.0

since (1 + 3 + 5) / 3 = 3, whereas in the example COUNT(inpt) would give me 4. How do I achieve this?

Thanks

On Mon, Apr 1, 2013 at 2:24 PM, Mehmet Tepedelenlioglu <mehmets...@yahoo.com> wrote:
>
> Are your ids unique?
>
> On 4/1/13 2:06 PM, "jamal sasha" <jamalsha...@gmail.com> wrote:
>
> >Hi,
> >  I have a simple join question.
> >base = load 'input1' USING PigStorage( ',' ) as (id1, field1, field2);
> >stats = load 'input2' USING PigStorage(',') as (id1, mean, median);
> >joined = JOIN base BY id1, stats BY id1;
> >final = FOREACH joined GENERATE base::id1, base::field1, base::field2,
> >stats::mean, stats::median;
> >STORE final INTO 'output' USING PigStorage( ',' );
> >
> >But something doesn't feel right.
> >Inputs are on the order of MBs, whereas the output is around 100 GB.
> >
> >I tried it on sample files
> >where base is 35 MB
> >and stats is 10 MB,
> >and the output explodes to GBs.
> >What am I missing?
> >
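For the per-id average, a minimal Pig sketch (file names 'input' and 'output' and the alias names are assumptions, not from the thread) would group by id first so that AVG runs over each id's bag rather than the whole input:

```pig
-- Load id,value pairs; the schema types are assumed from the sample data.
inpt = LOAD 'input' USING PigStorage(',') AS (id:int, value:double);

-- One bag of rows per id; 'group' holds the grouping key.
grpd = GROUP inpt BY id;

-- AVG is computed within each group, so the denominator is the
-- per-id count (3 for id 1, 1 for id 2), not COUNT of the whole input.
avgs = FOREACH grpd GENERATE group AS id, AVG(inpt.value) AS mean;

STORE avgs INTO 'output' USING PigStorage(',');
```

On the sample data above this should yield 1,3.0 and 2,1.0.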
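On the quoted join question: JOIN produces one output row for every pair of matching rows, so if some id1 appears m times in base and n times in stats, that key alone contributes m * n rows, which is how MB-sized inputs can blow up to GBs. A hedged way to check for duplicate keys (alias names here are illustrative) is:

```pig
-- Count how often each id1 occurs in stats ('input2' from the thread).
stats  = LOAD 'input2' USING PigStorage(',') AS (id1, mean, median);
keys   = GROUP stats BY id1;
counts = FOREACH keys GENERATE group AS id1, COUNT(stats) AS n;

-- Any rows surviving this filter are keys that will multiply the join.
dups = FILTER counts BY n > 1;
DUMP dups;
```

Running the same check on base would show whether both sides carry repeated keys, which is when the multiplication is worst.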