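Are your ids unique? If an id appears m times in base and n times in stats, the join emits m x n rows for that id, so repeated ids can blow the output up far beyond the size of either input.

On 4/1/13 2:06 PM, "jamal sasha" <jamalsha...@gmail.com> wrote: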
>Hi,
> I have a simple join question.
>base = load 'input1' USING PigStorage( ',' ) as (id1, field1, field2);
>stats = load 'input2' USING PigStorage(',') as (id1, mean, median);
>joined = JOIN base BY id1, stats BY id1;
>final = FOREACH joined GENERATE base::id1, base::field1,base::field2,
>stats::mean,stats::median;
>STORE final INTO 'output' USING PigStorage( ',' );
>
>But something doesn't feel right.
>Inputs are on the order of MBs, whereas outputs are around 100 GB.
>
>I tried it on a sample file
>where base is 35MB
>stats is 10MB
>and the output explodes to GBs.
>What am I missing?
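
A rough way to check the ids for duplicates, reusing the load statements and relation names from the quoted script. The aliases base_grp, base_cnt, base_dup, stats_grp, stats_cnt, stats_dup, both, sizes, sizes_all and total are made up for this sketch, and it assumes the same comma-separated inputs. The last few statements estimate how many rows the join should produce by multiplying the per-id counts from each side and summing them.

```pig
-- Sketch only: alias names below are hypothetical, inputs as in the quoted script.
base  = LOAD 'input1' USING PigStorage(',') AS (id1, field1, field2);
stats = LOAD 'input2' USING PigStorage(',') AS (id1, mean, median);

-- Count how many rows each id has in base, and keep the ids that repeat.
base_grp = GROUP base BY id1;
base_cnt = FOREACH base_grp GENERATE group AS id1, COUNT(base) AS n;
base_dup = FILTER base_cnt BY n > 1;

-- Same check for stats.
stats_grp = GROUP stats BY id1;
stats_cnt = FOREACH stats_grp GENERATE group AS id1, COUNT(stats) AS n;
stats_dup = FILTER stats_cnt BY n > 1;

-- Any rows printed here are ids that occur more than once.
DUMP base_dup;
DUMP stats_dup;

-- Estimated join size: per-id product of the two counts, summed over all ids.
both      = JOIN base_cnt BY id1, stats_cnt BY id1;
sizes     = FOREACH both GENERATE base_cnt::id1 AS id1, base_cnt::n * stats_cnt::n AS join_rows;
sizes_all = GROUP sizes ALL;
total     = FOREACH sizes_all GENERATE SUM(sizes.join_rows) AS expected_rows;
DUMP total;
```

If both DUMPs of the duplicate relations come back empty, the ids are unique on each side and the join cannot produce more rows than the smaller input has, so the blow-up would have to come from somewhere else. If a handful of ids show large counts on both sides, those ids alone can account for most of the output.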