Hi,

I'm running Hadoop 0.19.1 on 19 nodes. I've been benchmarking a Hadoop workload with 115 Map tasks on two different distributed filesystems (KFS and PVFS). In some tests, I also have a write-intensive non-Hadoop job (an HPC checkpointing benchmark) running in the background. I've found that Hadoop sometimes makes most of the Map tasks data-local and sometimes makes none of them data-local, depending both on which filesystem I use and on whether the background job is running. (I never run multiple Hadoop jobs concurrently in these tests.)

I'd like to learn how the Hadoop scheduler places Map tasks and how it takes locality into account, so I can figure out why this is happening. (I'm using the default FIFO scheduler.) Is there any documentation that explains this?

Thanks!
