Hi,

>> If not, how could I do some processing over the all data, like counting? <<

Maybe you can refer to the TeraSort example in Hadoop. It uses a partitioner that splits text keys into roughly equal partitions in a globally sorted order.
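To make the idea concrete, here is a rough plain-Java sketch (no Hadoop dependency) of the two partitioning strategies discussed in this thread. The class name `PartitionSketch`, the method names, and the split points are hypothetical and chosen only for illustration; `hashPartition` follows the formula used by Hadoop's default `HashPartitioner`, while `rangePartition` only approximates what TeraSort does and is not its actual implementation, which samples the input to pick its split points.

```java
import java.util.Arrays;

// Illustrative sketch only: plain Java, not the Hadoop Partitioner API.
class PartitionSketch {

    // Same formula as Hadoop's default HashPartitioner:
    // a given key always maps to the same reducer index.
    static int hashPartition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Range partitioning in the spirit of TeraSort (assumption: split points
    // are given, sorted, and there are numReducers - 1 of them).
    // Reducer i receives keys in [splitPoints[i-1], splitPoints[i]).
    static int rangePartition(String key, String[] splitPoints) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // When the key is absent, binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        // Two distinct keys, three reducers: at most two reducers get data.
        System.out.println("a -> reducer " + hashPartition("a", 3));
        System.out.println("b -> reducer " + hashPartition("b", 3));

        // Hypothetical split points dividing the key space into three ranges.
        String[] splits = {"h", "p"};
        for (String k : new String[] {"apple", "kiwi", "zebra"}) {
            System.out.println(k + " -> reducer " + rangePartition(k, splits));
        }
    }
}
```

With range partitioning, every key sent to reducer 0 sorts before every key sent to reducer 1, and so on, so concatenating the reducer outputs in order gives a globally sorted result without forcing everything through a single reducer.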
On Thu, Sep 20, 2012 at 9:28 PM, Hemanth Yamijala <yhema...@thoughtworks.com> wrote:
> Hi,
>
> Yes. By contract, all intermediate output with the same key goes to
> the same reducer.
>
> In your example, suppose of the two keys generated from the mapper,
> one key goes to reducer 1 and the second goes to reducer 2, reducer 3
> will not have any records to process and end without producing any
> output.
>
> If the intermediate key space is very large, 1 reducer would certainly
> be a bottleneck, as you rightly note. Hence, configuring the right
> number of reducers would be certainly important.
>
> Thanks
> hemanth
>
> On 9/20/12, Jason Yang <lin.yang.ja...@gmail.com> wrote:
> > Hi, all
> >
> > I have a question that whether all the intermediate output with the same
> > key go to the same reducer or not?
> >
> > If it is, in case of only two keys are generated from mapper, but there are
> > 3 reducer running in this job, what would happen?
> >
> > If not, how could I do some processing over the all data, like counting? I
> > think some would suggest to set the number of reducer to 1, but I thought
> > this would make the reducer to be the bottleneck when there are large
> > volume of intermediate output, isn't it?
> >
> > --
> > YANG, Lin
>

--
Don't Grow Old, Grow Up... :-)