Say you have a TaskSet of map tasks, each operating on one Hadoop
partition. How does the scheduler decide which map task output (i.e., a
shuffle block) goes to which reducer? Are the shuffle blocks split evenly
among the reducers?


On Sun, Nov 10, 2013 at 9:50 PM, Aaron Davidson <ilike...@gmail.com> wrote:

> It is responsible for a subset of shuffle blocks. MapTasks split up their
> data, creating one shuffle block for every reducer. During the shuffle
> phase, the reducer will fetch all shuffle blocks that were intended for it.
>
>
> On Sun, Nov 10, 2013 at 9:38 PM, Umar Javed <umarj.ja...@gmail.com> wrote:
>
>> I was wondering how the scheduler assigns the ShuffledRDD locations
>> to the reduce tasks. Say that you have 4 reduce tasks and a number of
>> shuffle blocks across two machines. Is each reduce task responsible for a
>> subset of individual keys or a subset of shuffle blocks?
>>
>> Umar
>>
>
>
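To make the scheme quoted above concrete (each map task writes one shuffle
block per reducer, and each reducer fetches its own block from every map
task), here is a minimal sketch in plain Python. It is not Spark code: the
names reducer_for, run_map_task and fetch_for_reducer are hypothetical, and
the CRC32-based hash is only a stand-in chosen to be stable across runs. As
far as I know, Spark lets the shuffle's Partitioner make this choice per
key (HashPartitioner works from the key's hashCode), so treat the hash
below as an assumption.

```python
import zlib

NUM_REDUCERS = 4  # hypothetical; matches the "4 reduce tasks" in the thread


def reducer_for(key, num_reducers=NUM_REDUCERS):
    """Hash-partition a key: stable hash of the key, modulo reducer count."""
    # zlib.crc32 is a stand-in for the key's hash code (an assumption).
    return zlib.crc32(repr(key).encode("utf-8")) % num_reducers


def run_map_task(records, num_reducers=NUM_REDUCERS):
    """Split one map task's records into one shuffle block per reducer."""
    blocks = [[] for _ in range(num_reducers)]
    for key, value in records:
        blocks[reducer_for(key, num_reducers)].append((key, value))
    return blocks


def fetch_for_reducer(reducer_id, all_map_outputs):
    """A reducer fetches block number `reducer_id` from every map task."""
    fetched = []
    for blocks in all_map_outputs:
        fetched.extend(blocks[reducer_id])
    return fetched


if __name__ == "__main__":
    # Two map tasks, e.g. one per input partition.
    map_outputs = [
        run_map_task([("a", 1), ("b", 2), ("a", 3)]),
        run_map_task([("b", 4), ("c", 5)]),
    ]
    for reducer_id in range(NUM_REDUCERS):
        print(reducer_id, fetch_for_reducer(reducer_id, map_outputs))
```

Under this kind of partitioning the assignment is decided per key, not by
the scheduler balancing blocks: every record with a given key lands at the
same reducer, so a reducer ends up with both a subset of the shuffle blocks
and a subset of the keys. The blocks are therefore only as even as the key
distribution; a skewed key set gives skewed block sizes.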
