Hi,
Lets assume that there are two jobs J1 (100 map tasks) and J2 (200 map
tasks) and the cluster has a capacity of 150 map tasks (15 nodes with 10 map
tasks per node) and Hadoop is using the default FIFO scheduler. If I submit
first J1 and then J2, will the jobs run in parallel or the job J1 has
In most cases, your job will have more map tasks than map slots. You
want the reducers to spin up at some point before all your maps
complete, so that the shuffle and sort can work in parallel with some
of your map tasks. I usually set slow start to 80%, sometimes higher
if I know the maps are
Thanks, got the point. So, the shuffle and sort can happen in parallel even
before all the map tasks are completed, but the reduce happens only after
all the map tasks are complete.
Praveen
On Thu, Sep 22, 2011 at 7:13 PM, Joey Echeverria j...@cloudera.com wrote:
In most cases, your job will