YARN has a ShuffleHandler plugin used for MR purposes, but the APIs it
uses aren't general/public, so you'd have to build your own utilities
to do that. It's not too difficult to achieve, but a general API would
certainly be nice.
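For a rough idea of what "building your own utilities" involves, here is a toy, single-process sketch of the mechanism discussed in this thread. Map output is split into partitions, written to local files, and served over HTTP so that each reducer can fetch its own partition. It is not the ShuffleHandler protocol. The class name `ToyShuffle`, the URL path `/partition/<n>`, and the one-file-per-partition layout are all hypothetical. Only the partition formula matches Hadoop's `HashPartitioner`. It uses the JDK's built-in `com.sun.net.httpserver` server and `java.net.http` client (Java 11+).

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ToyShuffle {
    // Same formula as Hadoop's HashPartitioner: non-negative hash modulo #reducers.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Map side (simplified): one local file per partition, keys sorted.
    // Real MapReduce instead writes a single sorted file plus an index of partition offsets.
    static List<Path> writePartitions(Map<String, Integer> output, int numReducers, Path dir)
            throws IOException {
        List<StringBuilder> bufs = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) bufs.add(new StringBuilder());
        output.entrySet().stream().sorted(Map.Entry.comparingByKey()).forEach(e ->
                bufs.get(partitionFor(e.getKey(), numReducers))
                    .append(e.getKey()).append('\t').append(e.getValue()).append('\n'));
        List<Path> files = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            Path p = dir.resolve("part-" + i);
            Files.writeString(p, bufs.get(i).toString(), StandardCharsets.UTF_8);
            files.add(p);
        }
        return files;
    }

    // Serve partition n at the hypothetical path /partition/<n> on an ephemeral port.
    static HttpServer serve(List<Path> files) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/partition/", exchange -> {
            String path = exchange.getRequestURI().getPath();
            int n;
            try {
                n = Integer.parseInt(path.substring("/partition/".length()));
            } catch (NumberFormatException ex) {
                n = -1;
            }
            if (n < 0 || n >= files.size()) {
                exchange.sendResponseHeaders(404, -1); // -1 means "no response body"
            } else {
                byte[] body = Files.readAllBytes(files.get(n));
                if (body.length == 0) {
                    exchange.sendResponseHeaders(200, -1);
                } else {
                    exchange.sendResponseHeaders(200, body.length);
                    try (OutputStream os = exchange.getResponseBody()) {
                        os.write(body);
                    }
                }
            }
            exchange.close();
        });
        server.start();
        return server;
    }

    // Reduce side: pull one partition from a map-output server.
    static String fetch(HttpClient client, int port, int partition) throws Exception {
        HttpRequest req = HttpRequest.newBuilder(
                URI.create("http://127.0.0.1:" + port + "/partition/" + partition)).GET().build();
        return client.send(req, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        Map<String, Integer> mapOutput = Map.of("apple", 3, "banana", 1, "cherry", 7, "date", 2);
        int numReducers = 2;
        List<Path> files = writePartitions(mapOutput, numReducers, Files.createTempDirectory("map-out"));
        HttpServer server = serve(files);
        HttpClient client = HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build();
        try {
            for (int r = 0; r < numReducers; r++) {
                System.out.print("reducer " + r + " fetched:\n" + fetch(client, server.getAddress().getPort(), r));
            }
        } finally {
            server.stop(0);
        }
    }
}
```

With several map tasks, each reducer would repeat the fetch against every map output server and merge the sorted pieces. That merge step is omitted here.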
Tez (Incubating) aims to solve some of this for users writing YARN
Thanks to the kind answers to my previous question and more reading in the
elephant book, I now understand that map tasks write partitioned results to
local files, which are then served to reducers over HTTP:
The output file's partitions are made available to the reducers over HTTP. The
maximum number of worker