RE: Custom ApplicationMaster development

John Lilley Fri, 24 May 2013 13:34:05 -0700

Hitesh,

Regarding your comments:
  - the files are served by an auxiliary service ( mapreduce shuffle service ) 
running within the NodeManager. 
  - The NM needs to be configured to tell it which aux services to start up.


Does this mean that I could in theory write an auxiliary service, perhaps 
modeled after the mapreduce shuffle service, to handle such node-level tasks as 
serving up files?  What I am trying to understand is whether my application can 
perform similar actions to MapReduce.  I am not trying to replace MapReduce; 
however, the ability to perform equivalent operations would be very useful to 
our application.  For example, there are transitive closure algorithms that can 
be written as iterative MapReduce jobs, but which could be much more 
efficient if they could avoid landing intermediate results on HDFS.

Thanks
John


-----Original Message-----
From: Hitesh Shah [mailto:[email protected]] 
Sent: Thursday, May 23, 2013 5:10 PM
To: [email protected]
Subject: Re: Custom ApplicationMaster development

Hello John

To add to Chris' email:

Do take a look at 
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html
   - this is probably a bit out of date. 
   - the actual source code of distributed-shell in the source tree would be 
the best guideline to follow after taking a brief look at the link above.

Compatibility
  - 0.23 and 2.0 are similar to a large extent but there are differences - not 
sure if it is possible to code for compatibility.
  - To get the APIs into a relatively stable state, a lot of changes have gone 
in since 2.0.4 was released.

Task output files
  - the files are served by an auxiliary service ( mapreduce shuffle service ) 
running within the NodeManager. 
  - The NM needs to be configured to tell it which aux services to start up.
  - The protocols support some level of information passing via the service 
data constructs. 
  - the service is notified when an application completes, so that it can 
delete that application's data if needed.
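
For reference, the NM-side configuration mentioned above might look roughly 
like the following yarn-site.xml fragment. This is a sketch assuming the 2.0.x 
line and the stock MapReduce shuffle handler; the service name and the 
matching property key changed in later releases (mapreduce.shuffle became 
mapreduce_shuffle), so check it against the version you are running.

```xml
<!-- yarn-site.xml: sketch for the 2.0.x line; names differ in later releases -->
<configuration>
  <!-- Comma-separated list of auxiliary services the NodeManager starts -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
  <!-- Implementation class for the service named above -->
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
```

A custom auxiliary service would presumably be registered the same way, by 
adding its own name to the list and pointing the matching .class property at 
its implementation, which has to be on the NodeManager's classpath.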

-- Hitesh


On May 23, 2013, at 3:45 PM, John Lilley wrote:

> I am getting started with development of a custom ApplicationMaster and I 
> didn't think that the user@ list was quite the right place for it.  Apologies 
> if this list isn't the right place either.  Some of my questions are really 
> newbie, like:
> 
> *         Is there an FAQ for non-MR YARN development?
> 
> *         Is there an FAQ for configuring/building/running Hadoop from 
> source, preferably in Eclipse?
> 
> *         What is the recommended configuration/environment for development 
> of a YARN app?  I would like to use Eclipse under Windows if that even makes 
> any sense.
> 
> *         Would you start with a Hadoop release or build from version control?
> 
> *         Is it possible to code for compatibility between 2.0 and 0.23?
> 
> *         Is there an ApplicationMaster example that can be used as a 
> starting point?
> I also have some more in-depth questions:
> 
> *         When a MapReduce task creates its output files and makes them 
> available over HTTP, is it the NodeManager that serves them up?  If my YARN 
> task wants to do something similar, how does it tell the NodeManager?  How 
> are the files removed later?
> 
> *         Is it possible to install objects or services that run as peers of 
> the NodeManager as opposed to tasks?  Are there any recommended per-node 
> patterns as opposed to per-task patterns?
> 
> Thanks
> John
>
