I would like to know the answer to this question as well. The reason I can think of from a theoretical point of view is:

* Each Mapper can theoretically output every possible key.
* This means that the complete set of <key, value> pairs processed by a Reducer is not known until every Mapper has finished.
* As a result, the Value iterator used by the Reducer's reduce() method would need to support blocking, i.e. it would have to block until the last Mapper finishes and the last <key, value> pair is received.
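To make that point concrete, here is a minimal sketch (plain Java, no Hadoop dependencies; the mapper outputs are invented for illustration) of why the grouped value list for any key is only complete once every mapper has produced its output:

```java
import java.util.*;

public class ShuffleSketch {
    public static void main(String[] args) {
        // Simulated outputs of three independent mappers. Any mapper may
        // emit any key, so no key's group is final until all have run.
        List<List<Map.Entry<String, Integer>>> mapperOutputs = List.of(
            List.of(Map.entry("a", 1), Map.entry("b", 2)),
            List.of(Map.entry("c", 3)),
            List.of(Map.entry("a", 4))  // the last mapper still adds to key "a"
        );

        // Group values by key, as the shuffle does.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (List<Map.Entry<String, Integer>> output : mapperOutputs) {
            for (Map.Entry<String, Integer> kv : output) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                       .add(kv.getValue());
            }
        }

        // Only now can reduce(key, values) be handed a complete iterable;
        // had it started earlier, key "a" would have been missing value 4.
        System.out.println(grouped); // {a=[1, 4], b=[2], c=[3]}
    }
}
```

If reduce() were invoked after the first two mappers, the group for "a" would be incomplete, which is exactly why the iterator would otherwise need to block.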
Maybe this blocking is to be avoided (I would like to know the specific reason, if so) or is simply a pain to implement, and hence not supported (yet?).

On Fri, Dec 24, 2010 at 6:20 AM, pig <tjuhzjem...@qq.com> wrote:
> Excellent answer.
>
> For some special reduce jobs that do not rely on the order of (key, value)
> pairs, the sort phase is of no use.
> In this situation, theoretically speaking, reduce could be started before
> all of the map tasks have finished.
> But why doesn't Hadoop support this feature? It could, for example, be
> specified as an argument when submitting a job.
>
> ------------------ Original ------------------
> From: "Harsh J" <qwertyman...@gmail.com>;
> Date: Tue, Dec 21, 2010 03:13 PM
> To: "mapreduce-user" <mapreduce-user@hadoop.apache.org>;
> Subject: Re: When a Reduce Task starts?
>
> On Tue, Dec 21, 2010 at 7:23 AM, li ping <li.j...@gmail.com> wrote:
>> I think the reduce can be started before all of the maps have finished.
>> See this configuration item in mapred-site.xml:
>> <property>
>>   <name>mapred.reduce.slowstart.completed.maps</name>
>>   <value>0.05</value>
>>   <description>Fraction of the number of maps in the job which should be
>>   complete before reduces are scheduled for the job.
>>   </description>
>> </property>
>> Correct me if I'm wrong.
>
> Well, it depends on what you mean by a "reduce". A ReduceTask, in
> Hadoop terms, may begin as some maps complete (as configured via
> mapred.reduce.slowstart.completed.maps) -- but it would only be in
> the copy phase (not sort/reduce).
>
> With the current Hadoop implementation, reduce(Key, Iterable<Value>)
> will never be called until all mappers have completed.
>
> --
> Harsh J
> www.harshj.com
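For reference, the slowstart fraction quoted above only gates when ReduceTasks are *scheduled*, not when reduce() runs. Going by the quoted description ("fraction of the number of maps which should be complete"), a rough sketch of the arithmetic (the job sizes are made up, and the exact rounding Hadoop uses is an assumption here):

```java
public class SlowstartSketch {
    // Completed maps needed before reducers may be scheduled, assuming
    // a simple ceiling of (total maps * slowstart fraction). The real
    // Hadoop scheduler's rounding may differ.
    static int mapsBeforeReduceScheduling(int totalMaps, double slowstart) {
        return (int) Math.ceil(totalMaps * slowstart);
    }

    public static void main(String[] args) {
        // With the default 0.05 and a hypothetical 200-map job, reducers
        // may be scheduled (to start copying) after 10 maps complete.
        System.out.println(mapsBeforeReduceScheduling(200, 0.05)); // 10
        // Setting it to 1.0 makes even the copy phase wait for all maps.
        System.out.println(mapsBeforeReduceScheduling(200, 1.0));  // 200
    }
}
```

Either way, per Harsh's point, the reduce() call itself still waits for every map to finish.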