Vamshi,

If you have set the number of reduce slots per node to 5 and you have 4 nodes, then your cluster can run at most 5 * 4 = 20 reduce tasks at a time. If the job has more reduce tasks than that, the extra ones have to wait until reduce slots become available.

Data locality is not considered for reducers. Reduce tasks are launched on whichever nodes have free reduce slots, effectively at random, so there is no guarantee that all nodes run the same number of reducers at any given time. Mappers do consider data locality, but locality is hard to define for a reducer, because a reducer's input is the output of multiple mappers spread across the cluster.
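To make the slot arithmetic concrete, here is a minimal Python sketch. It is not Hadoop code: the function names are hypothetical, the "waves" estimate assumes every running reduce task finishes before the next batch starts, and the random pick among nodes with a free slot is only a stand-in for the real scheduler, which this sketch does not model.

```python
import math
import random


def cluster_reduce_capacity(nodes: int, slots_per_node: int) -> int:
    """Maximum number of reduce tasks that can run at the same time."""
    return nodes * slots_per_node


def reduce_waves(num_reduce_tasks: int, nodes: int, slots_per_node: int) -> int:
    """Rough number of batches needed, assuming each batch finishes together."""
    capacity = cluster_reduce_capacity(nodes, slots_per_node)
    return math.ceil(num_reduce_tasks / capacity)


def assign_reducers(num_reduce_tasks: int, nodes: int, slots_per_node: int, seed: int = 0):
    """Place reduce tasks on randomly chosen nodes that still have a free slot.

    Returns (running_per_node, waiting), where waiting counts the tasks
    that found no free slot anywhere in the cluster.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same layout
    running = [0] * nodes
    waiting = 0
    for _ in range(num_reduce_tasks):
        free = [n for n in range(nodes) if running[n] < slots_per_node]
        if not free:
            waiting += 1  # every slot is taken, so this task must wait
            continue
        running[rng.choice(free)] += 1  # no data locality involved
    return running, waiting


if __name__ == "__main__":
    print(cluster_reduce_capacity(4, 5))   # 4 nodes * 5 slots
    print(reduce_waves(45, 4, 5))          # batches needed for 45 reduce tasks
    print(assign_reducers(8, 4, 5))        # per-node counts need not be equal
```

With 4 nodes and 5 slots each, at most 20 reduce tasks run at once. When a job has fewer reduce tasks than slots, the per-node counts returned by assign_reducers are generally uneven, which matches the point that nodes are not guaranteed to run the same number of reducers.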
Regards
Bejoy.KS

On Fri, Mar 2, 2012 at 3:39 PM, Vamshi Krishna <vamshi2...@gmail.com> wrote:
> Hi all,
> Consider a Hadoop cluster having 4 nodes, where on every node the maximum
> number of reduce slots is fixed at 5. When the MapReduce daemons are started:
>
> 1) Is there any restriction on the number of simultaneously running reduce
> tasks on all nodes, such as that it should be the same on all nodes? OR
>
> 2) Is it like this: on a node where there is a lot of data to be processed,
> a higher number of reduce tasks will run than on a node where less data is
> present? That is, according to the size of the data to be processed on a
> particular node, a proportionate number of reduce tasks will be run on
> different nodes.
>
> Please, somebody clarify this basic doubt. Which is correct? If neither,
> what is the actual process that takes place?
>
> --
> Regards
> Vamshi Krishna