On Jan 27, 2014, at 4:17 AM, Amit Mittal <amitmitt...@gmail.com> wrote:
> Question 1: I believe the TaskTracker and then JobTracker/AppMaster will
> receive the updates through call to Task.statusUpdate(TaskUmbilicalProtocol
> obj). By which the JobTracker/AM will know the location of the map's o/p file
> and host details etc, however how it will know what all the partitions or
> keys this output has. In other words, from the heartbeat, how JobTracker will
> know about data partitions/keys? It will be required to decide from which
> Mapper, the mapper's output needs to be pulled or not.

Reducers pull map outputs from all the maps. So the JobTracker/AppMaster simply gives the completion events of *all* the maps to every reducer. There is no need for the JT/AM to track the distribution of keys.

> Question 2: In short, not all reducer takes output from all Mappers, they
> only connects and takes output related to the keys partitioned for that
> particular reducer.

That is in a sense correct. More precisely, every Reducer fetches from every Mapper only the small chunk (partition) of that Mapper's output that belongs to it.

+Vinod
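To make the point about partitions concrete, here is a minimal sketch in Python. It is not Hadoop code: the function names (`partition`, `run_map`, `fetch_for_reducer`) and the byte-sum hash are made up for illustration. Hadoop's default HashPartitioner does something similar with `key.hashCode()` modulo the number of reduce tasks. The idea it shows is that each map splits its own output into one bucket per reducer, so a reducer only needs to know which maps have completed, not which keys they hold.

```python
# Hypothetical sketch, not Hadoop's implementation. All names are illustrative.

def partition(key, num_reducers):
    # Deterministic stand-in for a hash partitioner:
    # the same key always maps to the same reducer index.
    return sum(key.encode("utf-8")) % num_reducers


def run_map(records, num_reducers):
    """Split one map task's output into one bucket per reducer."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in records:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets


def fetch_for_reducer(reducer_id, all_map_outputs):
    """A reducer pulls only its own bucket, but from every completed map."""
    fetched = []
    for buckets in all_map_outputs:
        fetched.extend(buckets[reducer_id])
    return fetched


# Two map tasks, two reducers. The "master" would only hand out the list of
# completed maps; it never looks inside the buckets.
map_outputs = [
    run_map([("a", 1), ("b", 2)], 2),
    run_map([("a", 3), ("c", 4)], 2),
]
reducer_0_input = fetch_for_reducer(0, map_outputs)
reducer_1_input = fetch_for_reducer(1, map_outputs)
```

Some buckets may be empty for a given map, but the reducer still contacts that map, which is why the JT/AM does not have to track key distribution.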