On Jan 27, 2014, at 4:17 AM, Amit Mittal <amitmitt...@gmail.com> wrote:

> Question 1: I believe the TaskTracker and then JobTracker/AppMaster will 
> receive the updates through call to Task.statusUpdate(TaskUmbilicalProtocol 
> obj). By which the JobTracker/AM will know the location of the map's o/p file 
> and host details etc, however how it will know what all the partitions or 
> keys this output has. In other words, from the heartbeat, how JobTracker will 
> know about data partitions/keys? It will be required to decide from which 
> Mapper, the mapper's output needs to be pulled or not.


Reducers pull map outputs from all the maps. So the JobTracker/AppMaster simply 
gives the completion events of *all* the maps to every reducer. There is no need 
for the JT/AM to track the distribution of keys.


> Question 2: In short, not all reducer takes output from all Mappers, they 
> only connects and takes output related to the keys partitioned for that 
> particular reducer.


That is correct in a sense. More precisely, every Reducer fetches from every 
Mapper, but only the small chunk of that Mapper's output that belongs to its 
own partition.
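
To make the partitioning concrete, here is a minimal sketch in plain Java (no 
Hadoop dependency). The partitionFor arithmetic is the same as Hadoop's default 
HashPartitioner; the class name ShuffleSketch and the helper partitionMapOutput 
are hypothetical names used only for illustration, and the real map side also 
sorts and spills to disk, which this leaves out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ShuffleSketch {
    // Same arithmetic as Hadoop's default HashPartitioner:
    // mask off the sign bit, then take the remainder by the reducer count.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Hypothetical helper: split ONE map task's output keys into one bucket
    // per reducer. Reducer r later fetches only bucket r from this map.
    static Map<Integer, List<String>> partitionMapOutput(List<String> keys,
                                                         int numReducers) {
        Map<Integer, List<String>> partitions = new TreeMap<>();
        for (int r = 0; r < numReducers; r++) {
            partitions.put(r, new ArrayList<>());
        }
        for (String key : keys) {
            partitions.get(partitionFor(key, numReducers)).add(key);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Made-up keys standing in for one map task's output.
        List<String> mapOutputKeys = List.of("apple", "banana", "cherry", "apple");
        Map<Integer, List<String>> parts = partitionMapOutput(mapOutputKeys, 3);
        for (Map.Entry<Integer, List<String>> e : parts.entrySet()) {
            System.out.println("reducer " + e.getKey() + " fetches " + e.getValue());
        }
    }
}
```

Because every map task applies the same function, the JT/AM never needs to know 
which keys landed where: reducer r just asks each completed map for partition r, 
which may turn out to be empty.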

+Vinod

