Hey John,

Here's how MR works, put simply:

- Job.submit() is called.
- The Job's InputFormat#getSplits() is called. Its result is serialized and
  shipped, along with other job artifacts such as jars, to the configured FS
  for the JobTracker or the MR2 ApplicationMaster to use.
- The splits info contains locality hints that the scheduler then uses to
  assign a slot or resources on a host to each task, depending also on
  availability and requested resources (hence a 'hint', not a strict rule).

The first two steps happen on the client end (and are controllable). The last
depends on the scheduler you've put in use (Fifo/Capacity/Fair) or have
implemented yourself (Custom).

I'm unclear on what exactly you are asking, but I think you may want to start
by reading the JobSubmitter class and work outward from there.

Does this help?

On Fri, Sep 7, 2012 at 1:24 PM, John Cuffney <[email protected]> wrote:
> Hey,
>
> Which class handles the top level partitioning for MapReduce? It's possible
> I have a misunderstanding of how this is handled, but in my view, there is a
> top level controller which kicks off the whole process; it handles
> partitioning of the input and distribution of the input segments to the
> various machines/tasks. I have been searching through a lot of the Job
> classes, and they all seem to handle a single task, whereas it is important
> for me to perform some work at the highest level controller, if that exists.
> Any info on what I'm looking for/if I'm on the wrong track would be much
> appreciated.
>
> Thanks for the help,
> John

--
Harsh J
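
To make the split-and-assign flow described in the reply concrete, here is a toy sketch in plain Java. It is not the Hadoop API: the class SplitLocalitySketch, its Split type, and the getSplits/assign methods are hypothetical stand-ins, and the real schedulers weigh far more than this (queues, capacity, rack locality). It only illustrates the idea that the client computes splits carrying host hints, and that a scheduler prefers a hinted host but falls back to any free one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of split computation and locality-aware assignment; NOT the Hadoop API. */
final class SplitLocalitySketch {

    /** Hypothetical stand-in for an input split: a byte range plus preferred hosts. */
    static final class Split {
        final long start;
        final long length;
        final List<String> hosts; // locality hints only, never a hard requirement

        Split(long start, long length, List<String> hosts) {
            this.start = start;
            this.length = length;
            this.hosts = hosts;
        }
    }

    /** Client side: cut [0, totalBytes) into splits of at most splitSize bytes. */
    static List<Split> getSplits(long totalBytes, long splitSize,
                                 List<List<String>> hostsPerSplit) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Split> splits = new ArrayList<>();
        int index = 0;
        for (long start = 0; start < totalBytes; start += splitSize, index++) {
            long length = Math.min(splitSize, totalBytes - start);
            // A split with no known replica hosts simply carries no hints.
            List<String> hosts = index < hostsPerSplit.size()
                    ? hostsPerSplit.get(index)
                    : new ArrayList<String>();
            splits.add(new Split(start, length, hosts));
        }
        return splits;
    }

    /** Scheduler side: prefer a free hinted host, else any free host, else null. */
    static String assign(Split split, Set<String> freeHosts) {
        for (String host : split.hosts) {
            if (freeHosts.contains(host)) {
                return host; // data-local assignment
            }
        }
        // No hinted host is free, so the hint is ignored.
        return freeHosts.isEmpty() ? null : freeHosts.iterator().next();
    }

    public static void main(String[] args) {
        List<List<String>> hints = Arrays.asList(
                Arrays.asList("node1", "node2"),
                Arrays.asList("node3"));
        Set<String> free = new LinkedHashSet<>(Arrays.asList("node2", "node4"));

        for (Split s : getSplits(250L, 100L, hints)) {
            System.out.println("split@" + s.start + "+" + s.length
                    + " hints=" + s.hosts + " -> " + assign(s, free));
        }
    }
}
```

In real Hadoop the equivalent hooks are InputFormat#getSplits() on the client and InputSplit#getLocations() for the hints; the assignment logic lives inside whichever scheduler is configured.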
