Regarding FIFO scheduler

2011-09-22 Thread Praveen Sripati
Hi, Let's assume that there are two jobs, J1 (100 map tasks) and J2 (200 map tasks), that the cluster has a capacity of 150 map tasks (15 nodes with 10 map tasks per node), and that Hadoop is using the default FIFO scheduler. If I submit J1 first and then J2, will the jobs run in parallel or the job J1 has
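
To make the numbers in the question concrete, here is a rough sketch of the slot arithmetic. It assumes a simplified FIFO model in which free map slots are offered to jobs strictly in submission order and a later job may use whatever slots the earlier jobs leave idle. The function name `fifo_assign` is made up for illustration and is not part of Hadoop.

```python
def fifo_assign(total_slots, jobs):
    """Hand out map slots to jobs in submission order (simplified model).

    jobs is a list of (name, pending_map_tasks) tuples, oldest job first.
    Returns a dict mapping each job name to the slots it gets in the first wave.
    """
    assignment = {}
    free = total_slots
    for name, pending in jobs:
        # An earlier job takes what it needs; the rest is left for later jobs.
        granted = min(pending, free)
        assignment[name] = granted
        free -= granted
    return assignment


# 15 nodes x 10 map slots per node = 150 slots, as in the question.
print(fifo_assign(15 * 10, [("J1", 100), ("J2", 200)]))
# prints {'J1': 100, 'J2': 50}
```

Under that assumption J1 occupies 100 slots and J2 starts on the remaining 50, picking up more as J1's maps finish.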

Re: Regarding FIFO scheduler

2011-09-22 Thread Joey Echeverria
In most cases, your job will have more map tasks than map slots. You want the reducers to spin up at some point before all your maps complete, so that the shuffle and sort can work in parallel with some of your map tasks. I usually set slow start to 80%, sometimes higher if I know the maps are
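
As a small illustration of the slow start setting mentioned above, the sketch below estimates how many map tasks must finish before reducers are scheduled. It assumes the threshold is the configured fraction of the job's map tasks, rounded up; the helper name `maps_before_reducers` is hypothetical. In Hadoop releases of this era the fraction is, as far as I know, set through the `mapred.reduce.slowstart.completed.maps` property, so please check the name against your version.

```python
import math


def maps_before_reducers(total_maps, slowstart_fraction):
    """Estimate how many maps must complete before reducers may start.

    Assumption: threshold = ceil(fraction * total_maps). This is a
    back-of-the-envelope model, not Hadoop's actual code.
    """
    if not 0.0 <= slowstart_fraction <= 1.0:
        raise ValueError("slowstart_fraction must be between 0 and 1")
    return math.ceil(total_maps * slowstart_fraction)


# A job with 200 map tasks and slow start set to 80%.
print(maps_before_reducers(200, 0.80))
# prints 160
```

With a setting of 80%, the reducers of a 200-map job would begin shuffling once about 160 maps are done, while the last 40 maps are still running.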

Re: Regarding FIFO scheduler

2011-09-22 Thread Praveen Sripati
Thanks, got the point. So, the shuffle and sort can happen in parallel even before all the map tasks are completed, but the reduce happens only after all the map tasks are complete. Praveen On Thu, Sep 22, 2011 at 7:13 PM, Joey Echeverria j...@cloudera.com wrote: In most cases, your job will