Harsh,

Thanks as usual for your sage advice. I was hoping to avoid actually installing anything on individual Hadoop nodes, and to finesse the service by spawning it from a task using LocalResources, but that is probably fraught with trouble.
FWIW, I would vote for being able to load YARN services from HDFS. What is the appropriate forum to file a request like that?

Thanks
John

-----Original Message-----
From: Harsh J [mailto:ha...@cloudera.com]
Sent: Wednesday, September 04, 2013 12:05 AM
To: <user@hadoop.apache.org>
Subject: Re: yarn-site.xml and aux-services

> Thanks for the clarification. I would find it very convenient in this case
> to have my custom jars available in HDFS, but I can see the added complexity
> needed for YARN to maintain and cache those to local disk.

We could class-load directly from HDFS, like HBase Co-Processors do.

> Consider a scenario analogous to the MR shuffle, where the persistent service
> serves up mapper output files to the reducers across the network:

Isn't this more complex than just running a dedicated service all the time, and/or implementing a way to spawn/end a dedicated service temporarily? I'd rather try to implement such a thing than have my containers implement more logic.

On Fri, Aug 23, 2013 at 11:17 PM, John Lilley <john.lil...@redpoint.net> wrote:
> Harsh,
>
> Thanks for the clarification. I would find it very convenient in this case
> to have my custom jars available in HDFS, but I can see the added complexity
> needed for YARN to maintain and cache those to local disk.
>
> What about having the tasks themselves start the per-node service as a child
> process? I've been told that the NM kills the process group, but won't
> setpgrp() circumvent that?
>
> Even given that, would the child process of one task have the proper environment
> and permission to act on behalf of other tasks? Consider a scenario
> analogous to the MR shuffle, where the persistent service serves up mapper
> output files to the reducers across the network:
> 1) AM spawns "mapper-like" tasks around the cluster.
> 2) Each mapper-like task on a given node launches a "persistent service"
> child, but only if one is not already running.
> 3) Each mapper-like task writes one or more output files, and informs the
> service of those files (along with AM-id, Task-id, etc.).
> 4) AM spawns "reducer-like" tasks around the cluster.
> 5) Each reducer-like task is told which nodes contain "mapper" result data,
> and connects to the services on those nodes to read the data.
>
> There are some details missing, like how the lifetime of the temporary files
> is controlled to extend beyond the mapper-like task's lifetime but still be
> cleaned up on AM exit, and how the reducer-like tasks are informed of which
> nodes have data.
>
> John
>
> -----Original Message-----
> From: Harsh J [mailto:ha...@cloudera.com]
> Sent: Friday, August 23, 2013 11:00 AM
> To: <user@hadoop.apache.org>
> Subject: Re: yarn-site.xml and aux-services
>
> The general practice is to install your deps into a custom location such as
> /opt/john-jars, and extend YARN_CLASSPATH to include the jars, while also
> configuring the classes under the aux-services list. You need to take care of
> deploying jar versions to /opt/john-jars/ across the cluster, though.
>
> I think it may be a neat idea to have the jars placed on HDFS or any other
> DFS, with yarn-site.xml indicating the location plus the class to load,
> similar to HBase co-processors. But I'll defer to Vinod on whether this
> would be a good thing to do.
>
> (I know the next thing people will ask for, given such an ability,
> is hot code upgrades...)
>
> On Fri, Aug 23, 2013 at 10:11 PM, John Lilley <john.lil...@redpoint.net>
> wrote:
>> Are there recommended conventions for adding additional code to a
>> stock Hadoop install?
>>
>> It would be nice if we could piggyback on whatever mechanisms are
>> used to distribute Hadoop itself around the cluster.
>> john
>>
>> From: Vinod Kumar Vavilapalli [mailto:vino...@hortonworks.com]
>> Sent: Thursday, August 22, 2013 6:25 PM
>> To: user@hadoop.apache.org
>> Subject: Re: yarn-site.xml and aux-services
>>
>> Auxiliary services are essentially administrator-configured services.
>> So, they have to be set up at install time - before the NM is started.
>>
>> +Vinod
>>
>> On Thu, Aug 22, 2013 at 1:38 PM, John Lilley <john.lil...@redpoint.net>
>> wrote:
>>
>> Following up on this, how exactly does one *install* the jar(s) for an
>> auxiliary service? Can they be shipped out with the LocalResources of an AM?
>> MapReduce's aux-service is presumably installed with Hadoop and is
>> just sitting there in the right place, but if one wanted to make a
>> whole new aux-service that belonged with an AM, how would one do it?
>>
>> John
>>
>> -----Original Message-----
>> From: John Lilley [mailto:john.lil...@redpoint.net]
>> Sent: Wednesday, June 05, 2013 11:41 AM
>> To: user@hadoop.apache.org
>> Subject: RE: yarn-site.xml and aux-services
>>
>> Wow, thanks. Is this documented anywhere other than the code? I
>> hate to waste y'all's time on things that can be RTFMed.
>> John
>>
>> -----Original Message-----
>> From: Harsh J [mailto:ha...@cloudera.com]
>> Sent: Wednesday, June 05, 2013 9:35 AM
>> To: <user@hadoop.apache.org>
>> Subject: Re: yarn-site.xml and aux-services
>>
>> John,
>>
>> The format is ID- and sub-config-based:
>>
>> First, you define an ID as a service, like the string "foo". This is
>> the ID the applications may look up in their container responses map
>> we discussed over another thread (around the shuffle handler).
>>
>> <property>
>>   <name>yarn.nodemanager.aux-services</name>
>>   <value>foo</value>
>> </property>
>>
>> Then you define an actual implementation class for that ID "foo", like so:
>>
>> <property>
>>   <name>yarn.nodemanager.aux-services.foo.class</name>
>>   <value>com.mypack.MyAuxServiceClassForFoo</value>
>> </property>
>>
>> If you have multiple services foo and bar, then it would appear like
>> the below (comma-separated IDs and individual configs):
>>
>> <property>
>>   <name>yarn.nodemanager.aux-services</name>
>>   <value>foo,bar</value>
>> </property>
>> <property>
>>   <name>yarn.nodemanager.aux-services.foo.class</name>
>>   <value>com.mypack.MyAuxServiceClassForFoo</value>
>> </property>
>> <property>
>>   <name>yarn.nodemanager.aux-services.bar.class</name>
>>   <value>com.mypack.MyAuxServiceClassForBar</value>
>> </property>
>>
>> On Wed, Jun 5, 2013 at 8:42 PM, John Lilley <john.lil...@redpoint.net>
>> wrote:
>>> Good, I was hoping that would be the case. But what are the
>>> mechanics of it? Do I just add another entry? And what exactly is
>>> "mapreduce.shuffle"? A scoped class name? Or a key string into some
>>> map elsewhere?
>>>
>>> e.g. like:
>>>
>>> <property>
>>>   <name>yarn.nodemanager.aux-services</name>
>>>   <value>mapreduce.shuffle</value>
>>> </property>
>>> <property>
>>>   <name>yarn.nodemanager.aux-services</name>
>>>   <value>myauxserviceclassname</value>
>>> </property>
>>>
>>> Concerning auxiliary services -- do they communicate with the
>>> NodeManager via RPC? Is there an interface to implement? How are
>>> they opened and closed with the NodeManager?
>>>
>>> Thanks
>>> John
>>>
>>> -----Original Message-----
>>> From: Harsh J [mailto:ha...@cloudera.com]
>>> Sent: Tuesday, June 04, 2013 11:58 PM
>>> To: <user@hadoop.apache.org>
>>> Subject: Re: yarn-site.xml and aux-services
>>>
>>> Yes, that's what this is for. You can implement, pass in and use your
>>> own AuxService. It needs to be on the NodeManager CLASSPATH to run
>>> (and the NM has to be restarted to apply it).
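[Editor's note: the "AuxService" Harsh refers to corresponds to the AuxiliaryService base class in Hadoop 2.x (the class ShuffleHandler extends). A rough, untested skeleton for the "foo" example config above might look like the following. The class name MyAuxServiceClassForFoo comes from the hypothetical config; the method names assume the Hadoop 2.x API, and the hadoop-yarn-server-nodemanager jar must be on the NM classpath for this to compile or run.]

```java
import java.nio.ByteBuffer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.server.api.ApplicationInitializationContext;
import org.apache.hadoop.yarn.server.api.ApplicationTerminationContext;
import org.apache.hadoop.yarn.server.api.AuxiliaryService;

public class MyAuxServiceClassForFoo extends AuxiliaryService {

    public MyAuxServiceClassForFoo() {
        // Must match the ID listed under yarn.nodemanager.aux-services
        super("foo");
    }

    @Override
    protected void serviceInit(Configuration conf) throws Exception {
        // One-time setup when the NM starts: open ports, start threads, etc.
        super.serviceInit(conf);
    }

    @Override
    public void initializeApplication(ApplicationInitializationContext ctx) {
        // Called when the first container of an application lands on this node
    }

    @Override
    public void stopApplication(ApplicationTerminationContext ctx) {
        // Called when the application finishes on this node; clean up here
    }

    @Override
    public ByteBuffer getMetaData() {
        // These bytes are what an AM reads back from the container-launch
        // response map under the "foo" key
        return ByteBuffer.allocate(0);
    }
}
```

The NM instantiates this class by reflection at startup, which is why it has to be on the CLASSPATH and why a restart is needed to pick it up.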
>>>
>>> On Wed, Jun 5, 2013 at 4:00 AM, John Lilley <john.lil...@redpoint.net>
>>> wrote:
>>>> I notice in yarn-site.xml:
>>>>
>>>> <property>
>>>>   <name>yarn.nodemanager.aux-services</name>
>>>>   <value>mapreduce.shuffle</value>
>>>>   <description>shuffle service that needs to be set for Map Reduce
>>>>   to run</description>
>>>> </property>
>>>>
>>>> Is this a general-purpose hook?
>>>> Can I tell YARN to run *my* per-node service?
>>>> Is there some other way (within the recommended Hadoop framework)
>>>> to run a per-node service that exists during the lifetime of the
>>>> NodeManager?
>>>>
>>>> John Lilley
>>>> Chief Architect, RedPoint Global Inc.
>>>> 1515 Walnut Street | Suite 200 | Boulder, CO 80302
>>>> T: +1 303 541 1516 | M: +1 720 938 5761 | F: +1 781 705 2077
>>>> Skype: jlilley.redpoint | john.lil...@redpoint.net | www.redpoint.net
>>>
>>> --
>>> Harsh J
>>
>> --
>> Harsh J
>>
>> --
>> +Vinod
>> Hortonworks Inc.
>> http://hortonworks.com/
>
> --
> Harsh J

--
Harsh J
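[Editor's note: setting aside the NM process-group and file-lifetime questions, the data-plane part of John's mapper-like/reducer-like scenario (steps 1-5 earlier in the thread) can be mocked up with plain sockets. Everything below is a hypothetical sketch: the class name, the "AM-id/task-id" key scheme, and the one-request-per-connection protocol are invented for illustration and are not part of any Hadoop API.]

```java
import java.io.*;
import java.net.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.concurrent.*;

// Mock of the per-node "persistent service": mapper-like tasks register
// output files under a key (step 3); reducer-like tasks connect over TCP
// and stream them back (step 5).
public class PerNodeFileService {
    private final ConcurrentMap<String, Path> registry = new ConcurrentHashMap<>();
    private final ServerSocket server;

    public PerNodeFileService(int port) throws IOException {
        server = new ServerSocket(port);   // port 0 = pick a free port
    }

    public int port() { return server.getLocalPort(); }

    // Step 3: a mapper-like task informs the service of an output file,
    // keyed by something like "<AM-id>/<task-id>".
    public void register(String key, Path file) { registry.put(key, file); }

    // Serve one request per connection; the request is a single key line.
    public void start() {
        Thread t = new Thread(() -> {
            while (!server.isClosed()) {
                try (Socket s = server.accept()) {
                    BufferedReader in = new BufferedReader(
                            new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
                    String key = in.readLine();
                    Path p = (key == null) ? null : registry.get(key);
                    if (p != null) s.getOutputStream().write(Files.readAllBytes(p));
                } catch (IOException e) {
                    if (server.isClosed()) return;   // stop() ends the loop
                }
            }
        });
        t.setDaemon(true);
        t.start();
    }

    // Cleanup hook: a real service would tie this to AM exit.
    public void stop() throws IOException { server.close(); }

    // Step 5: what a reducer-like task does to read a mapper's output.
    public static byte[] fetch(String host, int port, String key) throws IOException {
        try (Socket s = new Socket(host, port)) {
            s.getOutputStream().write((key + "\n").getBytes(StandardCharsets.UTF_8));
            s.shutdownOutput();
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            for (int n; (n = s.getInputStream().read(chunk)) != -1; ) buf.write(chunk, 0, n);
            return buf.toByteArray();
        }
    }

    public static void main(String[] args) throws Exception {
        PerNodeFileService svc = new PerNodeFileService(0);
        svc.start();
        Path tmp = Files.createTempFile("map-out", ".bin");
        Files.write(tmp, "part-00000 data".getBytes(StandardCharsets.UTF_8));
        svc.register("app_01/task_0", tmp);
        System.out.println(new String(fetch("127.0.0.1", svc.port(), "app_01/task_0"),
                StandardCharsets.UTF_8));
        svc.stop();
        Files.delete(tmp);
    }
}
```

A real version would of course ride on the NM's lifecycle and security (which is exactly what an auxiliary service provides) rather than a bare ServerSocket; this only illustrates the register-then-serve flow.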