Great, that makes a lot of sense now! Thanks a lot Harsh!
A related question: what does REDUCE_SHUFFLE_BYTES represent? Is it the size
of the sorted output of the shuffle phase?
Thanks,
Virajith
On Wed, Jun 29, 2011 at 2:10 PM, Harsh J wrote:
Virajith,
The FILE_BYTES_READ counter also counts all the reads of spilled records
done during the merge-sorting of the intermediate outputs between the map
and reduce phases.
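The re-read overhead can be illustrated with a toy model of a multi-pass
external merge. This is a rough sketch only, not Hadoop code; the function
names, the merge fan-in, and the spill counts below are illustrative
assumptions, but they show how a counter like FILE_BYTES_READ can end up
a small multiple of the shuffled data size:

```python
# Toy model (not Hadoop internals): estimate bytes read from local disk
# when merging many spill files with a limited merge fan-in, as a
# reduce-side external sort does. Each merge pass re-reads roughly all
# of the data once, so total local reads can exceed the shuffle size.

import math

def merge_passes(num_spills, fan_in):
    """Merge passes needed to reduce num_spills runs to one, merging at
    most fan_in runs per pass (simplified model)."""
    passes = 0
    while num_spills > 1:
        num_spills = math.ceil(num_spills / fan_in)
        passes += 1
    return passes

def estimated_file_bytes_read(shuffle_bytes, num_spills, fan_in):
    # Simplification: every pass re-reads roughly all the data once.
    return shuffle_bytes * merge_passes(num_spills, fan_in)

# Illustrative numbers only: 25 GB of shuffled data, 200 spill files,
# merge fan-in of 10 -> 3 passes -> ~75 GB read from local disk, which
# is in the ballpark of the 78 GB reported above.
gb = 1 << 30
print(estimated_file_bytes_read(25 * gb, 200, 10) / gb)  # -> 75.0
```

In a real cluster the multiple depends on how many spill files each task
produces and on the configured merge factor, so the exact figure varies.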
On Wed, Jun 29, 2011 at 6:30 PM, Virajith Jalaparti wrote:
I would like to clarify my earlier question: I found that each reducer
reports FILE_BYTES_READ as around 78GB and HDFS_BYTES_WRITTEN as 25GB and
REDUCE_SHUFFLE_BYTES as 25GB. So, why is the FILE_BYTES_READ 78GB and not
just 25GB?
Thanks,
Virajith
On Wed, Jun 29, 2011 at 10:29 AM, Virajith Jalaparti wrote:
Hi,
I was running the Sort example in Hadoop 0.20.2 (hadoop-0.20.2-examples.jar)
over an input data size of 100GB (generated using randomwriter), with
800 mappers (using a 128MB HDFS block size) and 4 reducers, on a 3-machine
cluster with 2 slave nodes. While the input and output were 100GB,