Amar,

Thanks for the pointer.

-----Original Message-----
From: Amar Kamat [mailto:[EMAIL PROTECTED]]
Sent: Monday, November 24, 2008 8:43 PM
To: core-user@hadoop.apache.org
Subject: Re: do NOT start reduce task until all mappers are finished
Haijun Cao wrote:
> Hi,
>
> I am using 0.18.2 with the fair scheduler (HADOOP-3746).
>
> The purpose of the fair scheduler is to prevent long-running jobs from
> blocking short jobs. I gave it a try: start a long job first, then a
> short one. The short job is able to grab some map slots and finishes its
> map phase quickly, but it still blocks in the reduce phase, because the
> long job has taken all the reduce slots (the long job starts first, and
> its reducers are started shortly after).
>
> The long job's reducers won't finish until all of its mappers have
> finished, so my short job is still blocked by the long job, making the
> fair scheduler useless for my workload.
>
> I am wondering if there is a way to NOT start reduce tasks until all of
> a job's mappers have finished.

https://issues.apache.org/jira/browse/HADOOP-4666 is opened to address something similar. Starting the reducers only after all the maps are done might increase the job's runtime. The reason for starting the reducers along with the maps is to interleave/parallelize the map and shuffle (data-pulling) phases, since maps are typically CPU bound while the shuffle is I/O bound.

Amar

> Thanks
>
> Haijun Cao
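For what it's worth, later Hadoop releases expose this trade-off as a tunable, `mapred.reduce.slowstart.completed.maps`: the fraction of a job's maps that must complete before its reducers are scheduled. A sketch of a `mapred-site.xml` entry, assuming your release already ships this property (check your version's mapred-default.xml; 0.18.2 may not have it):

```xml
<!-- Delay reduce scheduling until all maps of the job have finished.
     Value is a fraction of completed maps (default in later releases
     is 0.05, i.e. reducers start almost immediately). -->
<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>1.00</value>
</property>
```

Setting it to 1.00 frees reduce slots for short jobs while the long job's maps run, at the cost of losing the map/shuffle overlap Amar describes, so the long job itself will likely take longer.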