Amar, Thanks for the pointer.  

-----Original Message-----
From: Amar Kamat [mailto:[EMAIL PROTECTED] 
Sent: Monday, November 24, 2008 8:43 PM
To: core-user@hadoop.apache.org
Subject: Re: do NOT start reduce task until all mappers are finished

Haijun Cao wrote:
> Hi,
>
>
>
> I am using 0.18.2 with the fair scheduler (HADOOP-3476).
>
> The purpose of the fair scheduler is to prevent long-running jobs
> from blocking short jobs. I gave it a try --- start a long job first, then a
> short one. The short job is able to grab some map slots and finishes its
> map phase quickly, but it still blocks in the reduce phase, because the
> long job has taken all the reduce slots (the long job started first and
> its reducers were started shortly after).
>  
> The long job's reducers won't finish until all of its mappers
> have finished, so my short job is still blocked by the long job, which
> makes the fair scheduler useless for my workload.
>  
> I am wondering if there is a way to NOT start reduce tasks
> until all of a job's mappers have finished.
>  
>   
https://issues.apache.org/jira/browse/HADOOP-4666 is open to address 
something similar. Note that starting the reducers only after all the maps 
are done may increase the job's runtime. The reason for starting the 
reducers along with the maps is to interleave/parallelize the map and 
shuffle (data-pulling) phases, since maps are typically CPU bound while 
shuffle is I/O bound.
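For reference, later Hadoop releases expose a knob for this: 
mapred.reduce.slowstart.completed.maps, the fraction of a job's maps 
that must complete before its reducers are scheduled. If I remember the 
semantics right, setting it to 1.0 approximates "don't start reducers 
until all maps finish" (at the cost of losing the map/shuffle overlap 
described above). A sketch of the mapred-site.xml entry -- this property 
is from later releases, not 0.18.2:

```xml
<!-- mapred-site.xml: delay reducer launch until all maps complete.
     Assumption: property available in later Hadoop releases only,
     not in 0.18.2 as used in this thread. -->
<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <!-- Fraction of maps that must finish before reducers start;
       1.0 means wait for every map. The default is much lower,
       so reducers normally start early to overlap the shuffle. -->
  <value>1.0</value>
</property>
```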
Amar
> Thanks
>
> Haijun Cao
>
>
