Re: Intermediate data size of Sort example

2011-06-29 Thread Virajith Jalaparti
Great, that makes a lot of sense now! Thanks a lot, Harsh! A related question: what does REDUCE_SHUFFLE_BYTES represent? Is it the size of the sorted output of the shuffle phase? Thanks, Virajith On Wed, Jun 29, 2011 at 2:10 PM, Harsh J wrote: > Virajith, > > The FILE_BYTES_READ also counts all

Re: Intermediate data size of Sort example

2011-06-29 Thread Harsh J
Virajith, The FILE_BYTES_READ counter also counts all the reads of spilled records done while sorting the various outputs between the MR phases. On Wed, Jun 29, 2011 at 6:30 PM, Virajith Jalaparti wrote: > I would like to clarify my earlier question: I found that each reducer > reports FILE_BYTES_RE
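Harsh's point, that re-reading spilled records inflates FILE_BYTES_READ, can be illustrated with a textbook external merge-sort model. This is a simplified sketch, not Hadoop's actual merge logic. It assumes that every merge pass reads all of a reducer's data from local disk once, that each map output arrives as one sorted run, and that the merge fan-in is 10 (a hypothetical value; compare Hadoop's io.sort.factor setting). The function names are hypothetical.

```python
def merge_passes(num_runs: int, fan_in: int) -> int:
    """Passes a textbook k-way external merge needs to reduce num_runs sorted runs to one."""
    if num_runs < 1 or fan_in < 2:
        raise ValueError("need num_runs >= 1 and fan_in >= 2")
    passes = 0
    runs = num_runs
    while runs > 1:
        # Each pass merges groups of up to fan_in runs into a single run.
        runs = -(-runs // fan_in)  # ceiling division
        passes += 1
    return passes


def estimated_local_read_gb(data_gb: float, num_runs: int, fan_in: int) -> float:
    """Simplified model (assumption): every merge pass reads all of the data once from local disk."""
    return data_gb * merge_passes(num_runs, fan_in)


# Assumed inputs: 25GB per reducer (from the thread), 800 runs (one per map), fan-in 10 (hypothetical).
print(merge_passes(800, 10))                 # 3
print(estimated_local_read_gb(25, 800, 10))  # 75
```

Under these assumptions the estimate comes out at 75GB, in the same range as the ~78GB reported. However, the model ignores in-memory merges and the way Hadoop actually schedules its merge passes, so treat it as a plausibility check, not an explanation of the exact figure.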

Re: Intermediate data size of Sort example

2011-06-29 Thread Virajith Jalaparti
I would like to clarify my earlier question: I found that each reducer reports FILE_BYTES_READ as around 78GB, HDFS_BYTES_WRITTEN as 25GB, and REDUCE_SHUFFLE_BYTES as 25GB. So why is FILE_BYTES_READ 78GB and not just 25GB? Thanks, Virajith On Wed, Jun 29, 2011 at 10:29 AM, Virajith Jalapa
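To put the per-reducer counters in proportion, here is a small arithmetic sketch. It uses only the approximate values quoted above. The constant names and the helper `read_amplification` are hypothetical, and "GB" is kept in the thread's own units.

```python
# Per-reducer counter values as quoted in the thread (approximate, in GB).
FILE_BYTES_READ_GB = 78
REDUCE_SHUFFLE_BYTES_GB = 25
HDFS_BYTES_WRITTEN_GB = 25
NUM_REDUCERS = 4


def read_amplification(file_bytes_read: float, shuffle_bytes: float) -> float:
    """How many times larger the local-disk reads are than the shuffled input."""
    return file_bytes_read / shuffle_bytes


amp = read_amplification(FILE_BYTES_READ_GB, REDUCE_SHUFFLE_BYTES_GB)
print(f"{amp:.2f}")                           # 3.12
print(NUM_REDUCERS * HDFS_BYTES_WRITTEN_GB)   # 100  (matches the 100GB output)
print(NUM_REDUCERS * FILE_BYTES_READ_GB)      # 312  (total local reads across reducers)
```

Each reducer therefore reads roughly three times as many bytes from local disk as it receives in the shuffle, while its HDFS output equals its shuffled input.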

Intermediate data size of Sort example

2011-06-29 Thread Virajith Jalaparti
Hi, I was running the Sort example in Hadoop 0.20.2 (hadoop-0.20.2-examples.jar) on 100GB of input data (generated using randomwriter) with 800 mappers (I was using a 128MB HDFS block size) and 4 reducers on a 3-machine cluster with 2 slave nodes. While the input and output were 100GB,
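The task counts in this setup follow from simple arithmetic. The sketch below rests on three assumptions: "100GB" means 100 × 1024 MB, each 128MB HDFS block becomes one input split (and so one map task), and the partitioner spreads keys evenly across the reducers. The helper names are hypothetical.

```python
import math

MIB_PER_GIB = 1024  # assumption: "100GB" is read as 100 x 1024 MB


def num_map_tasks(input_gib: float, block_mib: int) -> int:
    """Map tasks if every HDFS block becomes exactly one input split (assumption)."""
    return math.ceil(input_gib * MIB_PER_GIB / block_mib)


def bytes_per_reducer(total_gib: float, reducers: int) -> float:
    """Per-reducer share, assuming the partitioner spreads keys evenly."""
    return total_gib / reducers


print(num_map_tasks(100, 128))    # 800
print(bytes_per_reducer(100, 4))  # 25.0
```

The 25GB per reducer from this even-split assumption matches the REDUCE_SHUFFLE_BYTES and HDFS_BYTES_WRITTEN values reported later in the thread.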