Hi,

Yes. By contract, all intermediate output with the same key goes to the same reducer.
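To make that contract concrete, here is a minimal Python sketch of hash-style partitioning. It is not Hadoop code: the `partition` and `shuffle` helpers and the sample keys are hypothetical, and `zlib.crc32` is only a stand-in for whatever hash the job's partitioner applies to the key.

```python
import zlib
from collections import defaultdict


def partition(key: str, num_reducers: int) -> int:
    """Pick a reducer index from a stable hash of the key.

    zlib.crc32 is an assumption made for this sketch; it is used because
    Python's built-in hash() for strings varies between processes.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(records, num_reducers):
    """Group (key, value) records by the reducer their key maps to."""
    buckets = defaultdict(list)
    for key, value in records:
        buckets[partition(key, num_reducers)].append((key, value))
    # Report every reducer, including those that received nothing.
    return {r: buckets.get(r, []) for r in range(num_reducers)}


# Hypothetical mapper output: only two distinct keys, three reducers.
records = [("apple", 1), ("banana", 1), ("apple", 1), ("banana", 1), ("apple", 1)]
assignment = shuffle(records, num_reducers=3)

for reducer_id, recs in assignment.items():
    print(reducer_id, recs)
```

Because the reducer index depends only on the key, every record for a given key lands in the same bucket. With two distinct keys and three reducers, at least one reducer receives nothing, and both keys may even hash to the same reducer.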
In your example, suppose that of the two keys generated by the mapper, one goes to reducer 1 and the other goes to reducer 2. Reducer 3 will then have no records to process and will finish without emitting any output records.

If the intermediate key space is very large, a single reducer would certainly be a bottleneck, as you rightly note. Hence, configuring the right number of reducers is important.

Thanks
hemanth

On 9/20/12, Jason Yang <lin.yang.ja...@gmail.com> wrote:
> Hi, all
>
> I have a question that whether all the intermediate output with the same
> key go to the same reducer or not?
>
> If it is, in case of only two keys are generated from mapper, but there are
> 3 reducer running in this job, what would happen?
>
> If not, how could I do some processing over the all data, like counting? I
> think some would suggest to set the number of reducer to 1, but I thought
> this would make the reducer to be the bottleneck when there are large
> volume of intermediate output, isn't it?
>
> --
> YANG, Lin
>