Thank you very much for answering my question. Are there any publicly available Hadoop-MR-Yarn UML diagrams (class, activity, etc.), or more in-depth documentation, other than what is on the official site? I am interested in implementation details and documentation of the MR AM and the MR containers (the old TaskTracker).
regards
blah

2013/2/1 Vinod Kumar Vavilapalli <[email protected]>

> You got that mostly right. And it doesn't differ much in Hadoop 1.*
> either. With MR AM doing the work that was earlier done in JobTracker, the
> JobClient and the task side doesn't change much.
>
> FileInputFormat.getSplits() is called by client itself, so you should look
> for logs on the client machine.
>
> Each filesystem overrides getFileBlockLocations() and provides the correct
> locations - like DFS internally uses the getBlockLocations() API on
> Namenode. What you are seeing is the default implementation for local FS.
>
> HTH,
> +Vinod
>
>
> On Fri, Feb 1, 2013 at 6:24 AM, blah blah <[email protected]> wrote:
>
>> Hi
>>
>> (I am using Yarn Hadoop-3.0.0.SNAPSHOT, revision 1437315M)
>>
>> I have a question regarding my assumptions on the Yarn-MR design,
>> specially the InputSplit processing. Can someone confirm or point out my
>> mistakes in my MR-Yarn design assumptions?
>>
>> These are my assumptions regarding design.
>> 1. JobClient submits Job
>> Create AppMaster etc.
>> 2. Get number of splits // MR-AM, specially their hosts, so that a Task
>> can be started on the same node, use *InputFormat.getSplits() { ...;
>> FileSystem.getFileBlockLocations(); ...;}
>> 3. Start N tasks // MR-AM
>> 4. Each Task processes its (single) split (unless splitsNr >> tasksNr)
>> with the use of InputFormat/RecordReader // MR-Task, from HERE InputFormat
>> operates only on a single Split
>> 5. Start RecordReader and process Split // MR-Task
>> 5. MAP() // MR-Task
>> 6. Do rest MR // MR-Task
>> 7. Dump to HDFS/or other storage. // MR-Task
>> 8. Report FINISH, free resources // MR-AM
>>
>> 2 quick bonus questions
>>
>> I have added additional log entry in the FileInputFormat.getSplits(),
>> however I can not see it in log files. I am using WordCount example and
>> INFO level. What might be the problem?
>>
>> In the FileSystem.getFileBlockLocations() the hostname is hard-coded as
>> "localhost", where this is mapped to the actual host name, so that AM will
>> know which nodes to request?
>>
>> Thanks for reply
>>
>
>
> --
> +Vinod
> Hortonworks Inc.
> http://hortonworks.com/
>
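
For anyone following the thread, here is a minimal sketch of the override pattern Vinod describes: a base filesystem whose default lookup returns "localhost", and a distributed subclass that returns real hosts from a name service. It is only an illustration. Every class and method name in it (SketchFileSystem, SketchDistributedFileSystem, getBlockHosts, register, hostsForSplit) is a hypothetical stand-in, not the Hadoop API, and the name service is simulated with an in-memory map.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a filesystem base class; NOT the Hadoop FileSystem API.
class SketchFileSystem {
    // Default behaviour: block placement is unknown, so report "localhost".
    String[] getBlockHosts(String path, long start, long len) {
        return new String[] {"localhost"};
    }
}

// Hypothetical distributed filesystem that consults a simulated name service.
class SketchDistributedFileSystem extends SketchFileSystem {
    // Assumption: an in-memory map plays the role of the name service lookup.
    private final Map<String, String[]> hostsByPath = new HashMap<>();

    void register(String path, String... hosts) {
        hostsByPath.put(path, hosts.clone());
    }

    @Override
    String[] getBlockHosts(String path, long start, long len) {
        String[] hosts = hostsByPath.get(path);
        // Fall back to the default answer when the path is unknown.
        return hosts != null ? hosts.clone() : super.getBlockHosts(path, start, len);
    }
}

class SplitHostsSketch {
    // Split computation asks whichever filesystem it is given; the concrete
    // subclass decides which hosts come back.
    static String[] hostsForSplit(SketchFileSystem fs, String path, long start, long len) {
        return fs.getBlockHosts(path, start, len);
    }

    public static void main(String[] args) {
        SketchDistributedFileSystem dfs = new SketchDistributedFileSystem();
        dfs.register("/data/input.txt", "node1", "node2", "node3");

        System.out.println(Arrays.toString(
            hostsForSplit(new SketchFileSystem(), "/data/input.txt", 0, 128)));
        System.out.println(Arrays.toString(
            hostsForSplit(dfs, "/data/input.txt", 0, 128)));
    }
}
```

The point of the sketch is that the caller never maps "localhost" to a real host name. The hard-coded value is only the default answer, and a distributed filesystem replaces it by overriding the lookup.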
