"and after trying it on several datanodes in the end it fails" Default task attempts = 4?
1. It's better to provide logs.
2. Do you use any "balancing" properties, for example
pig.exec.reducers.bytes.per.reducer? I suppose you have unbalanced data.

2014/1/10 Zebeljan, Nebojsa <nebojsa.zebel...@adtech.com>

> Hi,
> I'm encountering spilling issues with a "simple" pig script. All map tasks
> and reducers succeed pretty fast except the last reducer!
> The last reducer always starts spilling after ~10 mins, and after trying it
> on several datanodes it eventually fails.
>
> Do you have any idea how I could optimize the GROUP BY so I don't run
> into spilling issues?
>
> Thanks in advance!
>
> Below is the pig script:
> ###
> dataImport = LOAD <some data>;
> generatedData = FOREACH dataImport GENERATE Field_A, Field_B, Field_C;
> groupedData = GROUP generatedData BY (Field_B, Field_C);
>
> result = FOREACH groupedData {
>     counter_1 = FILTER generatedData BY <some fields>;
>     counter_2 = FILTER generatedData BY <some fields>;
>     GENERATE
>         group.Field_B,
>         group.Field_C,
>         COUNT(counter_1),
>         COUNT(counter_2);
> }
>
> STORE result INTO <some path> USING PigStorage();
> ###
>
> Regards,
> Nebo
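One common cause of this pattern: the nested FILTER + COUNT inside the FOREACH forces Pig to materialize the entire bag for each group on the reducer, and the combiner cannot help, so a single hot (Field_B, Field_C) key overwhelms the last reducer. A sketch of a rewrite that often avoids the spilling, assuming the FILTER conditions can be expressed as row-level predicates (the <some condition> placeholders stand in for the original FILTER expressions and are not from the thread):

-- Sketch only: <some data> and <some condition N> are placeholders
-- for the original load source and filter predicates.
dataImport = LOAD <some data> AS (Field_A, Field_B, Field_C);

-- Turn each FILTER/COUNT pair into a 0/1 flag computed per row, map-side.
flagged = FOREACH dataImport GENERATE
              Field_B,
              Field_C,
              ((<some condition 1>) ? 1 : 0) AS flag_1,
              ((<some condition 2>) ? 1 : 0) AS flag_2;

grouped = GROUP flagged BY (Field_B, Field_C);

-- SUM is algebraic, so Pig can apply the combiner and never has to
-- hold the whole bag for a hot key in the last reducer.
result = FOREACH grouped GENERATE
             group.Field_B,
             group.Field_C,
             SUM(flagged.flag_1) AS counter_1,
             SUM(flagged.flag_2) AS counter_2;

STORE result INTO <some path> USING PigStorage();

Because SUM over the flags is algebraic, partial sums are computed map-side before the shuffle, which usually shrinks the skewed key's data enough to stop the spilling without tuning pig.exec.reducers.bytes.per.reducer.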