Your question is answered here: http://wiki.apache.org/hadoop/HowManyMapsAndReduces
To answer the first part of your question: it is not mandatory to run all the maps of a given job at the same time. Maps are executed as and when map slots become available on the tasktrackers.

On Fri, Apr 12, 2013 at 1:51 PM, Sai Sai <saigr...@yahoo.in> wrote:

> If we have a 640 MB data file and have 3 Data Nodes in a cluster.
> The file can be split into 10 Blocks and starts the Mappers M1, M2, M3
> first.
> As each one completes the task M4 and so on will be run.
> It appears like it is not necessary to run all the 10 Map tasks in
> parallel at once.
> Just wondering if this is right assumption.
> What if we have 10 TB of data file with 3 Data Nodes, how to find the
> number of mappers that will be created.
> Thanks
> Sai
>

--
Nitin Pawar
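
For the second part of the question (how many mappers a 10 TB file would get), here is a rough back-of-the-envelope sketch in Python. It is not Hadoop code. It assumes one map task per input split, one split per HDFS block, and a 64 MB block size (which is what 640 MB giving 10 blocks implies). The function names and the one-slot-per-node figure are made up for illustration. The real count depends on the InputFormat and the configured split size, so treat the numbers as estimates.

```python
MB = 1024 ** 2
TB = 1024 ** 4


def ceil_div(a, b):
    """Integer ceiling division, avoids float rounding on large sizes."""
    return -(-a // b)


def estimate_map_tasks(file_size_bytes, split_size_bytes=64 * MB):
    """Estimate map tasks as one per split (assumed: split size == block size)."""
    return ceil_div(file_size_bytes, split_size_bytes)


def estimate_waves(map_tasks, nodes, slots_per_node):
    """Estimate how many rounds are needed when only nodes * slots_per_node
    maps can run at once (slots_per_node is a hypothetical setting here)."""
    return ceil_div(map_tasks, nodes * slots_per_node)


small_maps = estimate_map_tasks(640 * MB)   # the 640 MB file from the question
large_maps = estimate_map_tasks(10 * TB)    # the 10 TB file from the question

# 3 data nodes, assuming 1 map slot each as in the M1, M2, M3 example
small_waves = estimate_waves(small_maps, nodes=3, slots_per_node=1)
large_waves = estimate_waves(large_maps, nodes=3, slots_per_node=1)

if __name__ == "__main__":
    print("640 MB:", small_maps, "maps in", small_waves, "waves")
    print("10 TB: ", large_maps, "maps in", large_waves, "waves")
```

Under these assumptions the number of maps is fixed by the input size, not by the number of data nodes. The node and slot count only decides how many of those maps run at the same time, and so how many rounds the job takes.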