Think of it in partition terms. If you know that your map splits X, Y and Z won't emit any key for partition P, then the Pth reducer can jump ahead and run without waiting for X, Y and Z to finish processing.
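To make the partition argument concrete, here is a minimal sketch in Python. It assumes you somehow know, per map split, which partitions that split may emit keys into. Stock MapReduce has no API for declaring this, so `may_emit`, `runnable_reducers` and the split names are all hypothetical, for illustration only.

```python
def runnable_reducers(num_partitions, may_emit, completed):
    """Return the partitions whose reducers could safely start reducing now.

    may_emit  -- hypothetical dict: map split name -> set of partitions
                 that split may emit keys into
    completed -- set of map split names that have finished
    """
    ready = []
    for p in range(num_partitions):
        # Partition p is blocked while any unfinished split may still emit into it.
        blocked = any(
            p in partitions
            for split, partitions in may_emit.items()
            if split not in completed
        )
        if not blocked:
            ready.append(p)
    return ready


# Hypothetical job: four splits, three partitions.
# Splits X, Y and Z never emit into partition 1; only W does.
may_emit = {"W": {0, 1}, "X": {0}, "Y": {0, 2}, "Z": {2}}
```

With only W finished, `runnable_reducers(3, may_emit, {"W"})` returns `[1]`: reducer 1 can run although X, Y and Z are still going. Without this kind of knowledge, every split has to be treated as possibly emitting into every partition, and no reducer is ready until all maps are done.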
Otherwise, a reducer can't run until all maps have completed, for fear of missing a few keys that may have come out of the maps it skipped fetching from. Some jobs may tolerate that loss, and others would be fine receiving those keys later, but that adds complexity when you could just fetch continuously and wait.

It should be easy to take the MRv2 application [0] and add such a thing today, if you need it.

[0] - Given the confusion between what MRv2 and YARN mean individually (they get mixed up too much), I hope this blog post of mine helps: http://www.cloudera.com/blog/2012/10/mr2-and-yarn-briefly-explained/

On Sat, Oct 13, 2012 at 7:46 AM, Jay Vyas <jayunit...@gmail.com> wrote:
> Is it possible for reducers to start (not just copying, but actually)
> "reducing" before all mappers are done, speculatively?
>
> In particular im asking this because Im curious about the internals of how
> the shuffle and sort might (or might not :)) be able to support this.

--
Harsh J