Re: basic doubt on number of reduce tasks

Bejoy Ks Fri, 02 Mar 2012 02:43:09 -0800

Vamshi,
        If you have set the number of reduce slots per node to 5 and
you have 4 nodes, then your cluster can run a maximum of 5*4 = 20 reduce
tasks at a time. If the job has more reduce tasks than that, the extra ones
have to wait until reduce slots become available.
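
To make the arithmetic concrete, here is a minimal sketch in Python. The
function names are hypothetical, and it assumes each reduce task occupies
exactly one slot and that slots are refilled in whole waves. A real cluster
refills slots one at a time as tasks finish, so the wave count is only a
rough estimate. (In Hadoop 1.x the per-node limit is normally set with
mapred.tasktracker.reduce.tasks.maximum.)

```python
import math


def max_concurrent_reducers(nodes, slots_per_node):
    """Upper bound on reduce tasks running at the same time in the cluster."""
    return nodes * slots_per_node


def reduce_waves(num_reduce_tasks, nodes, slots_per_node):
    """Rough number of rounds needed if slots were refilled in whole waves.

    Simplifying assumption: one task per slot, all tasks in a wave finish
    before the next wave starts.
    """
    capacity = max_concurrent_reducers(nodes, slots_per_node)
    if capacity <= 0:
        raise ValueError("cluster has no reduce slots")
    return math.ceil(num_reduce_tasks / capacity)


if __name__ == "__main__":
    # 4 nodes with 5 reduce slots each, as in the question.
    print(max_concurrent_reducers(4, 5))
    # 50 reduce tasks do not fit in one round of 20 slots.
    print(reduce_waves(50, 4, 5))
```

With 4 nodes and 5 slots each, any reduce task beyond the 20th has to wait
for a slot to free up.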
        Data locality is not considered for reducers. Reduce tasks are
launched on nodes more or less at random, wherever free slots are available.
There is no guarantee that all nodes will run the same number of reducers
at any given time. Mappers consider data locality, but locality is hard to
determine for a reducer, because a reducer's input is the output
of multiple mappers across the cluster.
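
As an illustration of why the per-node counts can differ, here is a toy
model in Python. It is not the actual JobTracker scheduling code: the
function name and the "pick a random node with a free slot" rule are
assumptions made only for this sketch, and data size plays no part in the
placement.

```python
import random


def place_reducers(num_tasks, nodes, slots_per_node, seed=None):
    """Toy placement: each task goes to a random node that has a free slot.

    Returns (running, waiting), where running maps node index to the number
    of reduce tasks placed there and waiting counts tasks left without a slot.
    """
    rng = random.Random(seed)
    running = {node: 0 for node in range(nodes)}
    waiting = 0
    for _ in range(num_tasks):
        free = [n for n, used in running.items() if used < slots_per_node]
        if not free:
            # Every slot in the cluster is busy, so this task must wait.
            waiting += 1
            continue
        running[rng.choice(free)] += 1
    return running, waiting


if __name__ == "__main__":
    # 12 reduce tasks on 4 nodes with 5 slots each: the split across nodes
    # varies from run to run and is usually uneven.
    running, waiting = place_reducers(12, 4, 5)
    print(running, waiting)
```

No node ever exceeds its 5 slots, but nothing forces the nodes to hold equal
numbers of reducers, and nothing ties the count to how much data a node
stores.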


Regards
Bejoy.KS

On Fri, Mar 2, 2012 at 3:39 PM, Vamshi Krishna <vamshi2...@gmail.com> wrote:

> Hi all,
> Consider a Hadoop cluster with 4 nodes, where the maximum number of
> reduce slots on every node is fixed at 5. Once the MapReduce daemons have
> started:
>
> 1) Is there any restriction on the number of simultaneously running reduce
> tasks across nodes, such as that it should be the same on all nodes? OR
>
> 2) Is it like this: a node with a lot of data to process runs a higher
> number of reduce tasks than a node with less data? That is, the number of
> reduce tasks run on each node is proportional to the amount of data to be
> processed on that node.
>
> Could somebody please clarify this basic doubt? Which is correct? If
> neither, what is the actual process that takes place?
>
> --
> Regards
> Vamshi Krishna
