Just a follow-up to see if anyone can shed some light on this.

My understanding is that after each block is replicated 3 times, a map task is run on each of the replicas in parallel. What I am trying to verify is this: if a file is split into 10K, 100K, or more blocks, that would mean 3 map tasks per block, so at least 300K map tasks for a 100K-block file. That looks like overkill, both from a performance and from a purely logical perspective.

I would appreciate any thoughts on this.

Thanks,
Sai
________________________________
From: Sai Sai <saigr...@yahoo.in>
To: "user@hadoop.apache.org" <user@hadoop.apache.org>; Sai Sai <saigr...@yahoo.in>
Sent: Friday, 12 April 2013 1:37 PM
Subject: Re: Does a Map task run 3 times on 3 TTs or just once

Just wondering if it is right to assume that a map task is run 3 times on 3 different TaskTrackers (TTs) in parallel, and that the output of whichever one finishes first is picked up and written to the intermediate location. Or is it true that a map task, even though its data is replicated 3 times, will run only once, with the other 2 on standby, so that if it fails the second one will run, followed by the third if the second mapper also fails?

Please shed some light.

Thanks,
Sai