Hi,

I was tuning a MapReduce job to reduce the number of spills and reached a stage
where the following counters are all equal:

Spilled Records in map = Spilled records in reduce = Combine output Records = 
Reduce Input Records


I do not see any lines in the mapper logs containing the following strings:
1. Spilling map output: record full
2. Spilling map output: buffer full

I see only this string:
1. Finished spill 0 (note the 0 at the end)
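
For anyone who wants to repeat this check on their own task logs, here is a
minimal sketch of the search I did by hand. The three message strings are the
ones quoted above. The function name summarize_spills, the sample log line and
the assumption that each message sits on a single log line are mine, for
illustration only.

```python
import re

# Messages quoted in this mail. The text surrounding them in a real
# mapper log (timestamp, level, class name) is an assumption.
RECORD_FULL = "Spilling map output: record full"
BUFFER_FULL = "Spilling map output: buffer full"
FINISHED_SPILL = re.compile(r"Finished spill (\d+)")


def summarize_spills(log_lines):
    """Count spill-related messages in an iterable of mapper log lines.

    Hypothetical helper: returns how often each "Spilling map output"
    message appears and the spill numbers reported as finished.
    """
    record_full = 0
    buffer_full = 0
    finished_spills = []
    for line in log_lines:
        if RECORD_FULL in line:
            record_full += 1
        if BUFFER_FULL in line:
            buffer_full += 1
        match = FINISHED_SPILL.search(line)
        if match:
            finished_spills.append(int(match.group(1)))
    return {
        "record_full": record_full,
        "buffer_full": buffer_full,
        "finished_spills": finished_spills,
    }


if __name__ == "__main__":
    # Made-up sample line, not copied from a real log.
    sample = ["INFO MapTask: Finished spill 0"]
    print(summarize_spills(sample))
```

In my case this would report zero "record full" and "buffer full" messages
and a single finished spill numbered 0.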

I am confused. Can someone please explain what is going on?

1. Neither the buffer nor the record limit was reached, yet there are spills.
Is it that the mapper writes its remaining records at the end, to be consumed
by the reducer, and that is why I see these spills?
2. Why is the combiner running if there were no spills? If my guess in point 1
is correct, will the combiner not run if the number of mappers <
min.num.spills.for.combine?
3. Why are spills counted in the reducer stats?
4. Is there a way to tell the mapper not to write its final output to disk, so
that the reducers fetch the data from the mapper's main memory?



Regards,
Ajay Srivastava 
