As you said, it depends on both the kind of network you have and the type of
workload you run.
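
To make that dependence concrete, here is a rough back-of-envelope sketch in
Python. It is only an illustration under assumptions of mine: the function
name, the cost model (ship the job code over the network and read the block
from local disk, versus stream the block over the network), and every number
in it are hypothetical, not measurements from any real cluster.

```python
def prefer_moving_code(block_mb: float, code_mb: float,
                       network_mb_per_s: float, disk_mb_per_s: float) -> bool:
    """Return True if shipping code to the data looks cheaper than
    shipping the data to the code, under a deliberately crude model."""
    # Option A: send the job code over the network, then read the block locally.
    code_to_data = code_mb / network_mb_per_s + block_mb / disk_mb_per_s
    # Option B: stream the whole block over the network to where the code runs
    # (assumes the network, not the remote store, is the bottleneck).
    data_to_code = block_mb / network_mb_per_s
    return code_to_data < data_to_code


# Illustrative inputs only: 10 MB of job code, 100 MB/s network, 200 MB/s disk.
print(prefer_moving_code(128, 10, 100, 200))  # large block
print(prefer_moving_code(1, 10, 100, 200))    # small block
```

With these made-up figures the large block favors moving code and the small
one favors moving data, which is the kind of tilt I mean below.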
Given your point about S3, I'd guess your input files/blocks are not large
enough for moving code to the data to beat moving the data to the code. When
that balance tilts a lot, especially when moving

Hi Mike,
Data locality rests on an assumption: that local storage access (disk, SSD,
etc.) is faster than transferring data over the network. Vinod has already
explained the benefits. But locality in the map stage is not always a win. If
a fat node stores a large file, it is possible that current MR