On Sat 23 Aug 2014 01:52:38 PM EDT, S.L wrote:
> That's what I thought too, but please check Answer #2 in this
> question; I am facing a similar problem.
>
> http://stackoverflow.com/questions/12135949/why-map-task-always-running-on-a-single-node

We were having the same problem: a job with 50 map tasks would run all 50 on a single datanode (our datanodes have 64GB of memory). What I did to fix it was change the following configuration values in mapred-site.xml:

  mapreduce.map.memory.mb
  mapreduce.map.java.opts
  mapreduce.reduce.memory.mb
  mapreduce.reduce.java.opts

These control the amount of memory used for maps and reduces. Our machines have 12 cores, so we wanted ~16-20 tasks per node instead of the current 63 per node, which we were getting because "mapreduce.map.memory.mb" is 1024 by default, as far as I know.

If you set these values appropriately (memory in the box divided by the number of tasks you want per node), you should be good to go. Also, each of the "java.opts" values should be "-Xmx##M", where ## is the memory for the JVM in MB. Both mapreduce.map.memory.mb and mapreduce.reduce.memory.mb are 3072 in our installation, resulting in around 20 tasks per node.

Please note that I'm not sure if this is the "official" solution, but I could not find a better one, since the old way of assigning a certain number of maps per node was deprecated. Also, as mentioned earlier in this thread, you do need to have enough input splits before tasks will be assigned to multiple nodes.

Hope this helps,
Alec
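
For anyone who wants to redo the sizing for their own hardware, here is a rough Python sketch of the "memory in box / tasks per node" arithmetic. It is only an approximation: it assumes the whole 64GB is available for tasks (it ignores memory reserved for the OS and other daemons), and the function names and the 0.8 heap fraction used to derive the "-Xmx" value are my own illustrative assumptions, not Hadoop settings.

```python
# Back-of-the-envelope sizing for mapreduce.{map,reduce}.memory.mb.
# All helper names are hypothetical; nothing here calls Hadoop itself.

def tasks_per_node(node_memory_mb, task_memory_mb):
    """How many tasks of a given size fit into a node's memory."""
    return node_memory_mb // task_memory_mb

def task_memory_for(node_memory_mb, target_tasks):
    """Per-task memory (MB) needed to end up with roughly target_tasks per node."""
    return node_memory_mb // target_tasks

def java_opts(task_memory_mb, heap_fraction=0.8):
    """Build an -Xmx string; heap_fraction is an assumed safety margin, not a Hadoop default."""
    heap_mb = int(task_memory_mb * heap_fraction)
    return f"-Xmx{heap_mb}M"

NODE_MEMORY_MB = 64 * 1024  # 64GB datanode, as in the message above

if __name__ == "__main__":
    print(tasks_per_node(NODE_MEMORY_MB, 1024))  # 64 with a 1024 MB task size
    print(tasks_per_node(NODE_MEMORY_MB, 3072))  # 21 with our 3072 MB setting
    print(task_memory_for(NODE_MEMORY_MB, 20))   # 3276 MB for about 20 tasks
    print(java_opts(3072))                       # candidate value for *.java.opts
```

With 3072 MB per task this gives 21 tasks on a 64GB node, which matches the "around 20" we see in practice. Treat the numbers as a starting point and adjust for whatever memory your nodes actually make available to tasks.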