Re: HTTP file server, map output, and other files

2013-05-24 Thread Harsh J
YARN has a ShuffleHandler plugin used for MR purposes, but the APIs it uses aren't general/public, so you'd have to build your own utilities to do that. It's not too difficult to achieve, but a general API would certainly be nice. Tez (Incubating) aims to solve some of this for users writing YARN

HTTP file server, map output, and other files

2013-05-23 Thread John Lilley
Thanks to previous kind answers and more reading in the elephant book, I now understand that mapper tasks place partitioned results into local files that are served up to reducers via HTTP: The output file's partitions are made available to the reducers over HTTP. The maximum number of worker
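
For anyone following the thread, here is a rough sketch of the general idea being discussed: one local output file holding several partitions back to back, with each partition served to a fetcher over HTTP. This is not how the ShuffleHandler is implemented and it uses no Hadoop or YARN API. The URL scheme (/partition/<n>), the in-memory (offset, length) index, and all function names are assumptions made up for illustration.

```python
import http.client
import http.server
import os
import tempfile
import threading


def write_map_output(path, partitions):
    """Write partitions back to back and return [(offset, length), ...]."""
    index = []
    offset = 0
    with open(path, "wb") as f:
        for data in partitions:
            f.write(data)
            index.append((offset, len(data)))
            offset += len(data)
    return index


def make_handler(path, index):
    """Build a request handler bound to one output file and its index."""

    class PartitionHandler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            # Hypothetical URL scheme: /partition/<n>
            parts = self.path.strip("/").split("/")
            try:
                if len(parts) != 2 or parts[0] != "partition":
                    raise ValueError("unknown path")
                n = int(parts[1])
                if not 0 <= n < len(index):
                    raise ValueError("no such partition")
            except ValueError:
                self.send_error(404)
                return
            offset, length = index[n]
            # Read only the byte range that belongs to this partition.
            with open(path, "rb") as f:
                f.seek(offset)
                body = f.read(length)
            self.send_response(200)
            self.send_header("Content-Type", "application/octet-stream")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # suppress per-request logging in this sketch

    return PartitionHandler


def start_server(path, index):
    """Serve on an ephemeral localhost port in a background thread."""
    server = http.server.ThreadingHTTPServer(
        ("127.0.0.1", 0), make_handler(path, index))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_partition(port, n):
    """Fetch partition n the way a 'reducer' side might."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", f"/partition/{n}")
        resp = conn.getresponse()
        body = resp.read()
        if resp.status != 200:
            raise OSError(f"fetch failed with HTTP {resp.status}")
        return body
    finally:
        conn.close()


if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "map_0.out")
    idx = write_map_output(out, [b"for reducer 0", b"for reducer 1"])
    srv = start_server(out, idx)
    print(fetch_partition(srv.server_address[1], 1))
    srv.shutdown()
```

A real service would also need authentication, streaming of large partitions rather than reading them fully into memory, and cleanup of the local files, none of which is attempted here.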