Todd:

1. The Map task writes out its key/value pairs sorted by key.
2. Shuffle is actually the copy phase in the Reduce task.
3. The Reduce task then performs a merge on the intermediate key/value
pairs from the map outputs.
4. The Reduce task builds the iterable list of values for each key.

I was trying to understand which method does this in the Reduce task. I'll
come back to you. (I think the base code is in Task.java)
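
Steps 3 and 4 above can be modelled in a few lines. This is only a toy sketch in Python, not Hadoop's actual code: it assumes each map output is already a list of (key, value) pairs sorted by key, and the names merge_and_group and run_reduce are made up for illustration. It merges the sorted runs and then groups consecutive equal keys in a single pass, which is why no separate "collect all values" step is needed once the merged stream is in key order.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_and_group(map_outputs):
    """Merge already-sorted map outputs and yield (key, values) groups.

    map_outputs: list of lists of (key, value) pairs, each sorted by key.
    This is a hypothetical helper, not part of any Hadoop API.
    """
    # Step 3: k-way merge of the sorted runs, ordered by key only.
    merged = heapq.merge(*map_outputs, key=itemgetter(0))
    # Step 4: equal keys are now adjacent, so grouping is a single pass.
    for key, pairs in groupby(merged, key=itemgetter(0)):
        # Hand the reducer a lazy iterable of the values for this key.
        yield key, (value for _, value in pairs)


def run_reduce(map_outputs, reduce_fn):
    """Call reduce_fn(key, values) once per distinct key, in key order."""
    return [reduce_fn(key, values)
            for key, values in merge_and_group(map_outputs)]


# Two map outputs, each sorted by key (step 1).
map_outputs = [
    [("apple", 1), ("cat", 1)],
    [("apple", 1), ("bat", 1), ("cat", 1)],
]

# A word-count style reduce function: sum the values of each key.
result = run_reduce(map_outputs, lambda key, values: (key, sum(values)))
print(result)
```

The values iterable is lazy here, so each reduce call has to consume its values before the next key is requested.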

Regards,
Chary
On Sun, Dec 21, 2014 at 9:40 PM, Susheel Kumar Gadalay <skgada...@gmail.com>
wrote:

> What I explained is the shuffle phase.
>
> After the reducer pulls the data, it does a sort on the key part only
> and calls the corresponding reduce method.
> On 12/22/14, bit1...@163.com <bit1...@163.com> wrote:
> > Then what exactly happens after the reducer pulls all the mapper output
> > key/value pairs from all the mapper nodes, before the reducer sees the
> > <key,value1,value2..>?
> >
> >
> >
> > bit1...@163.com
> >
> > From: Susheel Kumar Gadalay
> > Date: 2014-12-22 13:20
> > To: user
> > Subject: Re: Question about shuffle/merge/sort phrase
> > Sorry, typo
> >
> > It is the reducer which will pull the mapper o/p as soon as it completes.
> >
> > On 12/22/14, Susheel Kumar Gadalay <skgada...@gmail.com> wrote:
> >> It is the mapper which will push the o/p to the respective reducer as
> >> soon as it completes.
> >>
> >> The number of reducers is known at the beginning itself.
> >> The mapper, as it processes the input split, generates the o/p for each
> >> reducer (if the mapper o/p key is eligible for the reducer).
> >> The reducer will wait till the completion of all map tasks to start its
> >> processing.
> >>
> >>
> >> On 12/22/14, bit1...@163.com <bit1...@163.com> wrote:
> >>> Could someone help me with this question? Thanks.
> >>>
> >>>
> >>>
> >>> bit1...@163.com
> >>>
> >>> From: Todd
> >>> Date: 2014-12-21 21:59
> >>> To: user@hadoop.apache.org
> >>> Subject: Question about shuffle/merge/sort phrase
> >>> Hi, Hadoopers,
> >>> I have a question about the shuffle/sort/merge phases.
> >>> My understanding is that shuffle transfers the mapper output
> >>> (key/value pairs) from the mapper nodes to the reducer node, the merge
> >>> phase merges the mapper output from all mapper nodes, and the sort
> >>> phase sorts the key/value pairs by key.
> >>> My question is: whose responsibility is it to bring each key together
> >>> with all its values? (The reducer's input is a key and an iterable of
> >>> values.)
> >>>
> >>>
> >>> Thanks.
> >>>
> >>
> >
>
