Hi

>>
If not, how could I do some processing over all the data, like counting?
<<
Maybe you can refer to the TeraSort example in Hadoop. It uses a partitioner
that splits text keys into roughly equal partitions in a globally sorted
order.
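
As a rough illustration of that idea (this is not TeraSort's actual code, just
a minimal plain-Java sketch with no Hadoop dependency), a range partitioner
looks up each key in a sorted list of split points. The class name, method
name and the split points "h" and "q" are made up for this example; in a real
job the split points would come from sampling the input.

```java
import java.util.Arrays;

// Hypothetical sketch of range partitioning; names and split points are assumptions.
class RangePartitionSketch {

    // splitPoints must be sorted; N reducers need N-1 split points.
    // Keys below the first split point go to reducer 0, keys at or above
    // the last split point go to the last reducer.
    static int getPartition(String key, String[] splitPoints) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // binarySearch returns -(insertionPoint) - 1 when the key is absent.
        return pos >= 0 ? pos + 1 : -pos - 1;
    }

    public static void main(String[] args) {
        String[] splitPoints = {"h", "q"}; // assumed boundaries for 3 reducers
        String[] keys = {"apple", "hadoop", "mapper", "reducer", "zebra"};
        for (String key : keys) {
            System.out.println(key + " -> reducer " + getPartition(key, splitPoints));
        }
    }
}
```

Because every key sent to reducer i sorts before every key sent to reducer
i+1, concatenating the reducer outputs in order gives a globally sorted
result without funnelling everything through a single reducer.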

On Thu, Sep 20, 2012 at 9:28 PM, Hemanth Yamijala <yhema...@thoughtworks.com> wrote:

> Hi,
>
> Yes. By contract, all intermediate output with the same key goes to
> the same reducer.
>
> In your example, suppose that, of the two keys generated by the mapper,
> one goes to reducer 1 and the other goes to reducer 2. Reducer 3 will
> then have no records to process and will end without producing any
> output.
>
> If the intermediate key space is very large, a single reducer would
> certainly be a bottleneck, as you rightly note. Hence, configuring the
> right number of reducers is important.
>
> Thanks
> hemanth
>
> On 9/20/12, Jason Yang <lin.yang.ja...@gmail.com> wrote:
> > Hi, all
> >
> > Does all the intermediate output with the same key go to the same
> > reducer?
> >
> > If so, what happens when the mapper generates only two keys but the
> > job runs 3 reducers?
> >
> > If not, how could I do some processing over all the data, like
> > counting? I think some would suggest setting the number of reducers
> > to 1, but I thought this would make the reducer a bottleneck when
> > there is a large volume of intermediate output, wouldn't it?
> >
> > --
> > YANG, Lin
> >
>
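
To make the quoted explanation concrete, here is a small plain-Java sketch (no
Hadoop dependency) of hash partitioning. The formula mirrors the one used by
Hadoop's default HashPartitioner, but it is applied here to String keys rather
than Writable types, and the class name, helper method and sample keys are
assumptions made for illustration only.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not Hadoop code. Keys and names are made up.
class HashPartitionSketch {

    // Same formula as Hadoop's default HashPartitioner: mask off the sign
    // bit, then take the remainder by the number of reduce tasks.
    static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Counts how many records each reducer would receive.
    static Map<Integer, Integer> countPerReducer(String[] keys, int numReduceTasks) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int r = 0; r < numReduceTasks; r++) {
            counts.put(r, 0); // reducers with no records still show up with 0
        }
        for (String key : keys) {
            counts.merge(getPartition(key, numReduceTasks), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Five records, but only two distinct keys, spread over 3 reducers.
        String[] keys = {"a", "b", "a", "b", "a"};
        System.out.println(countPerReducer(keys, 3));
    }
}
```

The partition depends only on the key, so every record with the same key lands
on the same reducer, and with two distinct keys at least one of the three
reducers receives nothing.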



-- 
Don't Grow Old, Grow Up... :-)
