Hitesh,

Regarding your comments:
- the files are served by an auxiliary service (mapreduce shuffle service) running within the NodeManager.
- The NM needs to be configured to tell it which aux services to start up.

Does this mean that I could in theory write an auxiliary service, perhaps modeled after the mapreduce shuffle service, to handle such node-level tasks as serving up files? What I am trying to understand is whether my application can perform similar actions to MapReduce. I am not trying to replace MapReduce; however, the ability to perform equivalent operations would be very useful to our application. For example, there are transitive closure algorithms that can be written as iterative MapReduce jobs, but which can potentially be much more efficient if they are able to avoid landing intermediate results on HDFS.

Thanks
John

-----Original Message-----
From: Hitesh Shah [mailto:[email protected]]
Sent: Thursday, May 23, 2013 5:10 PM
To: [email protected]
Subject: Re: Custom ApplicationMaster development

Hello John

To add to Chris' email:

Do take a look at http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html
- this is probably a bit out of date.
- the actual source code of distributed-shell in the source tree would be the best guideline to follow after taking a brief look at the link above.

Compatibility
- 0.23 and 2.0 are similar to a large extent but there are differences - not sure if it is possible to code for compatibility.
- To get apis into a relatively stable state, a lot of changes have gone in since 2.0.4 was released

Task output files
- the files are served by an auxiliary service (mapreduce shuffle service) running within the NodeManager.
- The NM needs to be configured to tell it which aux services to start up.
- The protocols support some level of information passing via the service data constructs.
- the service is notified when an application completes such that it can be used to delete data if needed

-- Hitesh

On May 23, 2013, at 3:45 PM, John Lilley wrote:

> I am getting started with development of a custom ApplicationMaster and I
> didn't think that the user@ list was quite the right place for it. Apologies
> if this list isn't the right place either. Some of my questions are really
> newbie, like:
>
> * Is there an FAQ for non-MR YARN development?
>
> * Is there an FAQ for configuring/building/running Hadoop from
> source, preferably in Eclipse?
>
> * What is the recommended configuration/environment for development
> of a YARN app? I would like to use Eclipse under Windows if that even makes
> any sense.
>
> * Would you start with a Hadoop release or build from version control?
>
> * Is it possible to code for compatibility between 2.0 and 0.23?
>
> * Is there an ApplicationMaster example that can be used as a
> starting point?
>
> I also have some more in-depth questions:
>
> * When a MapReduce task creates its output files and makes them
> available over HTTP, is it the NodeManager that serves them up? If my YARN
> task wants to do something similar, how does it tell the NodeManager? How
> are the files removed later?
>
> * Is it possible to install objects or services that run as peers of
> the NodeManager as opposed to tasks? Are there any recommended per-node
> patterns as opposed to per-task patterns?
>
> Thanks
> John
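
The node-level pattern discussed in this thread (a long-running service that receives per-application service data, serves task output files, and is told when an application completes so that it can delete them) can be sketched in plain Java as follows. This is only a minimal illustration under stated assumptions and is not the YARN auxiliary-service interface: the class NodeFileService, all of its method names, and the choice to treat the service data as a shared secret are hypothetical and exist only for this example. A real service would be loaded by the NodeManager and would probably serve files over the network instead of returning local paths.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Stream;

/** Hypothetical per-node file service. This is NOT the YARN auxiliary-service interface. */
class NodeFileService {
    private final Path root;
    // Per-application secret, copied from the opaque service data passed at start-up.
    private final Map<String, ByteBuffer> secrets = new ConcurrentHashMap<>();

    NodeFileService(Path root) {
        this.root = root.toAbsolutePath().normalize();
    }

    /** Convenience factory for experiments: keeps all data under a fresh temp directory. */
    static NodeFileService inTempDir() {
        try {
            return new NodeFileService(Files.createTempDirectory("node-file-service"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Directory for one application; rejects ids that would escape the root. */
    private Path appDir(String appId) {
        Path dir = root.resolve(appId).normalize();
        if (!dir.startsWith(root) || dir.equals(root)) {
            throw new IllegalArgumentException("bad application id: " + appId);
        }
        return dir;
    }

    /** Assumed callback: an application starts using this node and hands over service data. */
    void applicationStarted(String appId, ByteBuffer serviceData) {
        Path dir = appDir(appId);
        // Copy the bytes without disturbing the caller's buffer position.
        ByteBuffer copy = ByteBuffer.allocate(serviceData.remaining());
        copy.put(serviceData.duplicate()).flip();
        secrets.put(appId, copy.asReadOnlyBuffer());
        try {
            Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** A task on this node stores an intermediate file for its application. */
    Path publish(String appId, String fileName, byte[] content) {
        if (!secrets.containsKey(appId)) {
            throw new IllegalStateException("unknown application: " + appId);
        }
        Path dir = appDir(appId);
        Path target = dir.resolve(fileName).normalize();
        if (!target.startsWith(dir)) {
            throw new IllegalArgumentException("bad file name: " + fileName);
        }
        try {
            return Files.write(target, content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Maps a request to a local file, or empty if the secret or the file name is not valid. */
    Optional<Path> resolve(String appId, ByteBuffer secret, String fileName) {
        ByteBuffer expected = secrets.get(appId);
        // Plain comparison for brevity; a real service would want a constant-time check.
        if (expected == null || !expected.equals(secret)) {
            return Optional.empty();
        }
        Path dir = appDir(appId);
        Path candidate = dir.resolve(fileName).normalize();
        boolean ok = candidate.startsWith(dir) && Files.isRegularFile(candidate);
        return ok ? Optional.of(candidate) : Optional.empty();
    }

    /** Assumed callback: the application has completed, so its data on this node is deleted. */
    void applicationStopped(String appId) {
        if (secrets.remove(appId) == null) {
            return;
        }
        try (Stream<Path> walk = Files.walk(appDir(appId))) {
            // Reverse order deletes files before the directories that contain them.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        NodeFileService svc = NodeFileService.inTempDir();
        ByteBuffer secret = ByteBuffer.wrap(new byte[] {1, 2, 3});
        svc.applicationStarted("app_0001", secret);
        svc.publish("app_0001", "part-0", "edges".getBytes(StandardCharsets.UTF_8));
        System.out.println(svc.resolve("app_0001", secret, "part-0").isPresent()); // true
        svc.applicationStopped("app_0001");
        System.out.println(svc.resolve("app_0001", secret, "part-0").isPresent()); // false
    }
}
```

Running main prints true while the application is live and false once it has been stopped, which mirrors the point made in the thread that the service is notified on application completion and can use that notification to delete data.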
