Hi John,

Yes - you probably could. I don't know of anyone who has written any other auxiliary service to date, so if you come across anything lacking in the handling/support of aux services, please do file feature-request/bug JIRAs.

For the application that you mentioned, I am assuming you are looking to build some form of a data 'caching' service that can store a job's output to be used by subsequent jobs?

-- Hitesh

On May 24, 2013, at 1:33 PM, John Lilley wrote:

> Hitesh,
>
> Regarding your comments:
> - the files are served by an auxiliary service ( mapreduce shuffle service )
> running within the NodeManager.
> - The NM needs to be configured to tell it which aux services to start up.
>
> Does this mean that I could in theory write an auxiliary service, perhaps
> modeled after the mapreduce shuffle service, to handle such node-level tasks
> as serving up files? What I am trying to understand is whether my
> application can perform similar actions to MapReduce. I am not trying to
> replace MapReduce; however, the ability to perform equivalent operations would
> be very useful to our application. For example, there are transitive closure
> algorithms that can be written as iterative MapReduce jobs, but which can
> potentially be much more efficient if they are able to avoid landing
> intermediate results on HDFS.
>
> Thanks
> John
>
>
> -----Original Message-----
> From: Hitesh Shah [mailto:[email protected]]
> Sent: Thursday, May 23, 2013 5:10 PM
> To: [email protected]
> Subject: Re: Custom ApplicationMaster development
>
> Hello John
>
> To add to Chris' email:
>
> Do take a look at
> http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html
> - this is probably a bit out of date.
> - the actual source code of distributed-shell in the source tree would be
> the best guideline to follow after taking a brief look at the link above.
>
> Compatibility
> - 0.23 and 2.0 are similar to a large extent but there are differences - not
> sure if it is possible to code for compatibility.
> - To get apis into a relatively stable state, a lot of changes have gone in
> since 2.0.4 was released
>
> Task output files
> - the files are served by an auxiliary service ( mapreduce shuffle service )
> running within the NodeManager.
> - The NM needs to be configured to tell it which aux services to start up.
> - The protocols support some level of information passing via the service
> data constructs.
> - the service is notified when an application completes such that it can be
> used to delete data if needed
>
> -- Hitesh
>
>
> On May 23, 2013, at 3:45 PM, John Lilley wrote:
>
>> I am getting started with development of a custom ApplicationMaster and I
>> didn't think that the user@ list was quite the right place for it.
>> Apologies if this list isn't the right place either. Some of my questions
>> are really newbie, like:
>>
>> * Is there an FAQ for non-MR YARN development?
>>
>> * Is there an FAQ for configuring/building/running Hadoop from
>> source, preferably in Eclipse?
>>
>> * What is the recommended configuration/environment for development
>> of a YARN app? I would like to use Eclipse under Windows if that even makes
>> any sense.
>>
>> * Would you start with a Hadoop release or build from version
>> control?
>>
>> * Is it possible to code for compatibility between 2.0 and 0.23?
>>
>> * Is there an ApplicationMaster example that can be used as a
>> starting point?
>> I also have some more in-depth questions:
>>
>> * When a MapReduce task creates its output files and makes them
>> available over HTTP, is it the NodeManager that serves them up? If my YARN
>> task wants to do something similar, how does it tell the NodeManager? How
>> are the files removed later?
>>
>> * Is it possible to install objects or services that run as peers of
>> the NodeManager as opposed to tasks? Are there any recommended per-node
>> patterns as opposed to per-task patterns?
>>
>> Thanks
>> John
>>
>
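
For readers following the thread: the step described above as "the NM needs to be configured to tell it which aux services to start up" is done in the NodeManager's yarn-site.xml. Below is a minimal sketch, assuming a 2.0.x-era release in which the MapReduce shuffle service is registered under the name mapreduce.shuffle (later releases use mapreduce_shuffle instead). The second service name, my_file_server, and its class, com.example.MyFileService, are hypothetical placeholders for a custom service and do not come from the thread or from Hadoop.

```xml
<!-- yarn-site.xml on each NodeManager host (sketch, not a tested configuration) -->
<configuration>
  <!-- Comma-separated list of aux service names the NM should start.
       "my_file_server" is a hypothetical custom service name. -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle,my_file_server</value>
  </property>

  <!-- Implementation class for the stock MapReduce shuffle service. -->
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>

  <!-- Implementation class for the custom service; the class name is an
       assumption for illustration only. -->
  <property>
    <name>yarn.nodemanager.aux-services.my_file_server.class</name>
    <value>com.example.MyFileService</value>
  </property>
</configuration>
```

The custom class would need to implement the NodeManager's auxiliary service interface for the Hadoop version in use and be on the NodeManager's classpath. Since, as noted in the thread, the APIs were still changing after 2.0.4, the exact interface and property naming rules should be checked against the target release.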
