Hi,

Yes. By contract, all intermediate output with the same key goes to
the same reducer.

In your example, suppose one of the two keys generated by the mappers
goes to reducer 1 and the other goes to reducer 2. Reducer 3 will then
have no records to process and will finish without producing any
output records.
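
To make the routing concrete, here is a minimal sketch in Python (not
Hadoop code). Hadoop's default HashPartitioner computes
(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks; the sketch
assumes zlib.crc32 as a stable stand-in for hashCode(), and the names
partition_for and shuffle are made up for illustration.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_reducers):
    """Pick a reducer index for a key.

    Assumption: crc32 stands in for Java's hashCode(); only the
    "hash modulo number of reducers" idea is taken from Hadoop.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(records, num_reducers):
    """Group (key, value) pairs by the reducer they would be sent to."""
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in records:
        reducers[partition_for(key, num_reducers)][key].append(value)
    return reducers


# Two distinct keys, three reducers, as in the question.
records = [("a", 1), ("b", 1), ("a", 1), ("b", 1), ("a", 1)]
for index, groups in enumerate(shuffle(records, 3)):
    print(index, dict(groups))
```

Every value for a given key ends up in exactly one reducer's input, and
with two keys at least one of the three reducers receives nothing. The
two keys may also hash to the same reducer, in which case two reducers
sit idle.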

If the intermediate key space is very large, a single reducer would
certainly be a bottleneck, as you rightly note. Hence, configuring the
right number of reducers is important.
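
As a rough illustration of why the reducer count matters, the sketch
below spreads a made-up key space over 1, 4 and 16 reducers and reports
the load on the busiest one. It is plain Python, not Hadoop code:
zlib.crc32 is again an assumed stand-in for the key's hashCode(), and
records_per_reducer and the "key-N" names are hypothetical.

```python
import zlib
from collections import Counter


def records_per_reducer(keys, num_reducers):
    """Count how many keys each reducer index would receive."""
    counts = Counter(
        zlib.crc32(key.encode("utf-8")) % num_reducers for key in keys
    )
    return [counts.get(index, 0) for index in range(num_reducers)]


# Hypothetical key space: 100,000 distinct keys, one record each.
keys = ["key-%d" % i for i in range(100000)]

for num_reducers in (1, 4, 16):
    loads = records_per_reducer(keys, num_reducers)
    print(num_reducers, "reducer(s): busiest one handles", max(loads), "records")
```

With one reducer, that reducer handles every record; with more
reducers the keys are divided among them. In a real job the count is
set on the job itself, for example with Job.setNumReduceTasks(int) in
the org.apache.hadoop.mapreduce API.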

Thanks
hemanth

On 9/20/12, Jason Yang <lin.yang.ja...@gmail.com> wrote:
> Hi, all
>
> I have a question that whether all the intermediate output with the same
> key go to the same reducer or not?
>
> If it is, in case of only two keys are generated from mapper, but there are
> 3 reducer running in this job, what would happen?
>
> If not, how could I do some processing over the all data, like counting? I
> think some would suggest to set the number of reducer to 1, but I thought
> this would make the reducer to be the bottleneck when there are large
> volume of intermediate output, isn't it?
>
> --
> YANG, Lin
>
