I would like to know the answer to this question as well.

From a theoretical point of view, the reason I can think of is:
* Each Mapper can theoretically output every possible Key.
* This means that the complete set of <key, value> pairs processed by
a Reducer is not known until every Mapper has finished.
* As a result, the value iterator passed to the Reducer's reduce()
method would need to support blocking, i.e. it would have to block
until the last Mapper finishes and the last <key, value> pair arrives.

Maybe this blocking is something to be avoided (I would like to know
the specific reason, if so), or it is simply painful to implement,
and hence not supported (yet?).
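A minimal sketch (plain Java, not the Hadoop API) of why the grouping step forces this wait: the value list for a given key is only complete once every mapper's output has been merged, so a reduce() invoked early would see a partial group. The class and method names here are made up for illustration:

```java
import java.util.*;

// Simulates the shuffle's grouping step to show why reduce(key, values)
// cannot safely run until every mapper has finished.
public class ShuffleSketch {

    // Group a stream of (key, value) pairs by key, as the shuffle does.
    static Map<String, List<Integer>> group(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    public static void main(String[] args) {
        // Output of the first mapper only.
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>(List.of(
            Map.entry("apple", 1), Map.entry("pear", 1)));

        // A reduce started now would see an incomplete group for "apple".
        int partial = group(pairs).get("apple").stream()
            .mapToInt(Integer::intValue).sum();

        // A second mapper later emits the same key.
        pairs.add(Map.entry("apple", 1));
        int complete = group(pairs).get("apple").stream()
            .mapToInt(Integer::intValue).sum();

        System.out.println(partial);   // 1
        System.out.println(complete);  // 2
    }
}
```

Any mapper can emit any key, so no group is closed until the last map output has been copied and merged -- which is exactly the sort/merge barrier before reduce().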

On Fri, Dec 24, 2010 at 6:20 AM, pig <tjuhzjem...@qq.com> wrote:
> excellent answer.
>
> For some special reduce jobs that do not rely on the order of (key, value)
> pairs, the sort phase is of no use.
> In this situation, theoretically speaking, the reduce could start before all
> of the map tasks have finished.
> But why doesn't Hadoop support this feature? For example, it could be
> specified as an argument when submitting a job.
>
> ------------------ Original ------------------
> From:  "Harsh J"<qwertyman...@gmail.com>;
> Date:  Tue, Dec 21, 2010 03:13 PM
> To:  "mapreduce-user"<mapreduce-user@hadoop.apache.org>;
> Subject:  Re: When a Reduce Task starts?
>
> On Tue, Dec 21, 2010 at 7:23 AM, li ping <li.j...@gmail.com> wrote:
>> I think the reduce can be started before all of the map finished.
>> See the configration item in mapred-site.xml
>> <property>
>>   <name>mapred.reduce.slowstart.completed.maps</name>
>>   <value>0.05</value>
>>   <description>Fraction of the number of maps in the job which should be
>>   complete before reduces are scheduled for the job.
>>   </description>
>> </property>
>> Correct me, if I'm wrong.
>
> Well, it depends on what you mean by a "reduce". A ReduceTask, in
> Hadoop terms, may begin while some maps are still running (as configured
> via mapred.reduce.slowstart.completed.maps) -- but it would only be in
> the copy phase, not the sort or reduce phases.
>
> With the current Hadoop implementation, a reduce(Key, Iterable<Value>)
> will never be called until all mappers have completed.
>
> --
> Harsh J
> www.harshj.com
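For reference, the slowstart fraction quoted above can also be overridden per job on the command line, assuming the job's driver uses ToolRunner/GenericOptionsParser (which handles -D options); the jar and class names below are placeholders:

```shell
# Schedule reducers only once all maps have finished (fraction = 1.0);
# a small value like 0.05 lets the copy phase begin much earlier.
hadoop jar myjob.jar MyJob \
    -D mapred.reduce.slowstart.completed.maps=1.0 \
    input output
```

Even at 0.05, only copying starts early; reduce() itself still waits for the last map.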
