Locality when placing Map tasks

Esteban Molina-Estolano Sat, 03 Oct 2009 01:31:50 -0700

Hi,

I'm running Hadoop 0.19.1 on 19 nodes. I've been benchmarking a Hadoop workload with 115 Map tasks, on two different distributed filesystems (KFS and PVFS); in some tests, I also have a write-intensive non-Hadoop job running in the background (an HPC checkpointing benchmark). I've found that Hadoop sometimes makes most of the Map tasks data-local, and sometimes makes none of the Map tasks data-local; this depends both on which filesystem I use, and on whether the background task is running. (I never run multiple Hadoop jobs concurrently in these tests.)

I'd like to learn how the Hadoop scheduler places Map tasks, and how locality is taken into account, so I can figure out why this is happening. (I'm using the default FIFO scheduler.) Is there some documentation available that would explain this?


Thanks!
