https://issues.apache.org/jira/browse/YARN-1151 --john
-----Original Message-----
From: Harsh J [mailto:ha...@cloudera.com]
Sent: Thursday, September 05, 2013 12:14 PM
To: <user@hadoop.apache.org>
Subject: Re: yarn-site.xml and aux-services

Please log a JIRA on https://issues.apache.org/jira/browse/YARN (do let the
thread know the ID as well, in the spirit of http://xkcd.com/979/) :)

On Thu, Sep 5, 2013 at 11:41 PM, John Lilley <john.lil...@redpoint.net> wrote:
> Harsh,
>
> Thanks as usual for your sage advice. I was hoping to avoid actually
> installing anything on individual Hadoop nodes and finessing the service by
> spawning it from a task using LocalResources, but this is probably fraught
> with trouble.
>
> FWIW, I would vote to be able to load YARN services from HDFS. What is the
> appropriate forum to file a request like that?
>
> Thanks
> John
>
> -----Original Message-----
> From: Harsh J [mailto:ha...@cloudera.com]
> Sent: Wednesday, September 04, 2013 12:05 AM
> To: <user@hadoop.apache.org>
> Subject: Re: yarn-site.xml and aux-services
>
>> Thanks for the clarification. I would find it very convenient in this case
>> to have my custom jars available in HDFS, but I can see the added complexity
>> needed for YARN to cache those to local disk.
>
> We could class-load directly from HDFS, like HBase Co-Processors do.
>
>> Consider a scenario analogous to the MR shuffle, where the persistent
>> service serves up mapper output files to the reducers across the network:
>
> Isn't this more complex than just running a dedicated service all the time,
> and/or implementing a way to spawn/end a dedicated service temporarily? I'd
> pick trying to implement such a thing over having my containers implement
> more logic.
>
> On Fri, Aug 23, 2013 at 11:17 PM, John Lilley <john.lil...@redpoint.net>
> wrote:
>> Harsh,
>>
>> Thanks for the clarification.
>> I would find it very convenient in this case
>> to have my custom jars available in HDFS, but I can see the added complexity
>> needed for YARN to cache those to local disk.
>>
>> What about having the tasks themselves start the per-node service as a child
>> process? I've been told that the NM kills the process group, but won't
>> setpgrp() circumvent that?
>>
>> Even given that, would the child process of one task have proper environment
>> and permission to act on behalf of other tasks? Consider a scenario
>> analogous to the MR shuffle, where the persistent service serves up mapper
>> output files to the reducers across the network:
>> 1) AM spawns "mapper-like" tasks around the cluster.
>> 2) Each mapper-like task on a given node launches a "persistent service"
>>    child, but only if one is not already running.
>> 3) Each mapper-like task writes one or more output files, and informs the
>>    service of those files (along with AM-id, Task-id etc).
>> 4) AM spawns "reducer-like" tasks around the cluster.
>> 5) Each reducer-like task is told which nodes contain "mapper" result data,
>>    and connects to services on those nodes to read the data.
>>
>> There are some details missing, like how the lifetime of the temporary files
>> is controlled to extend beyond the mapper-like task lifetime but still be
>> cleaned up on AM exit, and how the reducer-like tasks are informed of which
>> nodes have data.
>>
>> John
>>
>> -----Original Message-----
>> From: Harsh J [mailto:ha...@cloudera.com]
>> Sent: Friday, August 23, 2013 11:00 AM
>> To: <user@hadoop.apache.org>
>> Subject: Re: yarn-site.xml and aux-services
>>
>> The general practice is to install your deps into a custom location such as
>> /opt/john-jars, and extend YARN_CLASSPATH to include the jars, while also
>> configuring the classes under the aux-services list. You need to take care
>> of deploying the jar versions in /opt/john-jars/ across the cluster
>> though.
>>
>> I think it may be a neat idea to have jars be placed on HDFS or any other
>> DFS, and the yarn-site.xml indicating the location plus class to load.
>> Similar to HBase co-processors. But I'll defer to Vinod on if this would be
>> a good thing to do.
>>
>> (I know the right next thing with such an ability people will ask for
>> is hot-code-upgrades...)
>>
>> On Fri, Aug 23, 2013 at 10:11 PM, John Lilley <john.lil...@redpoint.net>
>> wrote:
>>> Are there recommended conventions for adding additional code to a
>>> stock Hadoop install?
>>>
>>> It would be nice if we could piggyback on whatever mechanisms are
>>> used to distribute Hadoop itself around the cluster.
>>>
>>> john
>>>
>>> From: Vinod Kumar Vavilapalli [mailto:vino...@hortonworks.com]
>>> Sent: Thursday, August 22, 2013 6:25 PM
>>> To: user@hadoop.apache.org
>>> Subject: Re: yarn-site.xml and aux-services
>>>
>>> Auxiliary services are essentially administrator-configured services.
>>> So, they have to be set up at install time - before NM is started.
>>>
>>> +Vinod
>>>
>>> On Thu, Aug 22, 2013 at 1:38 PM, John Lilley
>>> <john.lil...@redpoint.net>
>>> wrote:
>>>
>>> Following up on this, how exactly does one *install* the jar(s) for
>>> an auxiliary service? Can it be shipped out with the LocalResources of an AM?
>>> MapReduce's aux-service is presumably installed with Hadoop and is
>>> just sitting there in the right place, but if one wanted to make a
>>> whole new aux-service that belonged with an AM, how would one do it?
>>>
>>> John
>>>
>>> -----Original Message-----
>>> From: John Lilley [mailto:john.lil...@redpoint.net]
>>> Sent: Wednesday, June 05, 2013 11:41 AM
>>> To: user@hadoop.apache.org
>>> Subject: RE: yarn-site.xml and aux-services
>>>
>>> Wow, thanks. Is this documented anywhere other than the code? I
>>> hate to waste y'all's time on things that can be RTFMed.
>>> John
>>>
>>> -----Original Message-----
>>> From: Harsh J [mailto:ha...@cloudera.com]
>>> Sent: Wednesday, June 05, 2013 9:35 AM
>>> To: <user@hadoop.apache.org>
>>> Subject: Re: yarn-site.xml and aux-services
>>>
>>> John,
>>>
>>> The format is ID and sub-config based:
>>>
>>> First, you define an ID as a service, like the string "foo". This is
>>> the ID the applications may look up in their container responses map
>>> we discussed over another thread (around shuffle handler).
>>>
>>> <property>
>>>   <name>yarn.nodemanager.aux-services</name>
>>>   <value>foo</value>
>>> </property>
>>>
>>> Then you define an actual implementation class for that ID "foo", like so:
>>>
>>> <property>
>>>   <name>yarn.nodemanager.aux-services.foo.class</name>
>>>   <value>com.mypack.MyAuxServiceClassForFoo</value>
>>> </property>
>>>
>>> If you have multiple services foo and bar, then it would appear like
>>> the below (comma separated IDs and individual configs):
>>>
>>> <property>
>>>   <name>yarn.nodemanager.aux-services</name>
>>>   <value>foo,bar</value>
>>> </property>
>>> <property>
>>>   <name>yarn.nodemanager.aux-services.foo.class</name>
>>>   <value>com.mypack.MyAuxServiceClassForFoo</value>
>>> </property>
>>> <property>
>>>   <name>yarn.nodemanager.aux-services.bar.class</name>
>>>   <value>com.mypack.MyAuxServiceClassForBar</value>
>>> </property>
>>>
>>> On Wed, Jun 5, 2013 at 8:42 PM, John Lilley
>>> <john.lil...@redpoint.net>
>>> wrote:
>>>> Good, I was hoping that would be the case. But what are the
>>>> mechanics of it? Do I just add another entry? And what exactly is
>>>> "mapreduce.shuffle"?
>>>> A scoped class name? Or a key string into some map elsewhere?
>>>>
>>>> e.g.
>>>> like:
>>>>
>>>> <property>
>>>>   <name>yarn.nodemanager.aux-services</name>
>>>>   <value>mapreduce.shuffle</value>
>>>> </property>
>>>> <property>
>>>>   <name>yarn.nodemanager.aux-services</name>
>>>>   <value>myauxserviceclassname</value>
>>>> </property>
>>>>
>>>> Concerning auxiliary services -- do they communicate with
>>>> NodeManager via RPC? Is there an interface to implement? How are
>>>> they opened and closed with NodeManager?
>>>>
>>>> Thanks
>>>> John
>>>>
>>>> -----Original Message-----
>>>> From: Harsh J [mailto:ha...@cloudera.com]
>>>> Sent: Tuesday, June 04, 2013 11:58 PM
>>>> To: <user@hadoop.apache.org>
>>>> Subject: Re: yarn-site.xml and aux-services
>>>>
>>>> Yes, that's what this is for. You can implement, pass in and use
>>>> your own AuxService. It needs to be on the NodeManager CLASSPATH to
>>>> run (and NM has to be restarted to apply).
>>>>
>>>> On Wed, Jun 5, 2013 at 4:00 AM, John Lilley
>>>> <john.lil...@redpoint.net>
>>>> wrote:
>>>>> I notice the yarn-site.xml
>>>>>
>>>>> <property>
>>>>>   <name>yarn.nodemanager.aux-services</name>
>>>>>   <value>mapreduce.shuffle</value>
>>>>>   <description>shuffle service that needs to be set for Map
>>>>>   Reduce to run </description>
>>>>> </property>
>>>>>
>>>>> Is this a general-purpose hook?
>>>>>
>>>>> Can I tell yarn to run *my* per-node service?
>>>>>
>>>>> Is there some other way (within the recommended Hadoop framework)
>>>>> to run a per-node service that exists during the lifetime of the
>>>>> NodeManager?
>>>>>
>>>>> John Lilley
>>>>> Chief Architect, RedPoint Global Inc.
>>>>> 1515 Walnut Street | Suite 200 | Boulder, CO 80302
>>>>> T: +1 303 541 1516 | M: +1 720 938 5761 | F: +1 781-705-2077
>>>>> Skype: jlilley.redpoint | john.lil...@redpoint.net |
>>>>> www.redpoint.net
>>>>>
>>>>
>>>> --
>>>> Harsh J
>>>
>>> --
>>> Harsh J
>>>
>>> --
>>> +Vinod
>>> Hortonworks Inc.
>>> http://hortonworks.com/
>>
>> --
>> Harsh J
>
> --
> Harsh J

--
Harsh J
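
For anyone landing on this thread later, here is a minimal sketch of the "ID and sub-config" layout Harsh describes: one comma-separated yarn.nodemanager.aux-services property listing the IDs, plus one .class property per ID. The helper names (aux_service_properties, to_xml) are hypothetical and not part of Hadoop; the script only generates the <property> fragments. Per the thread, the jars still have to be on the NodeManager classpath and the NM restarted.

```python
import xml.etree.ElementTree as ET

# Property key taken from the thread; everything else here is illustrative.
AUX_SERVICES_KEY = "yarn.nodemanager.aux-services"


def aux_service_properties(services):
    """Build (name, value) pairs from a mapping of service ID -> class name.

    Hypothetical helper: emits one comma-separated ID list, then one
    '<key>.<id>.class' entry per service, in insertion order.
    """
    if not services:
        raise ValueError("at least one aux-service ID is required")
    props = [(AUX_SERVICES_KEY, ",".join(services))]
    for service_id, class_name in services.items():
        props.append((f"{AUX_SERVICES_KEY}.{service_id}.class", class_name))
    return props


def to_xml(props):
    """Render the pairs as <property> elements under a <configuration> root."""
    root = ET.Element("configuration")
    for name, value in props:
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# The foo/bar IDs and class names are the placeholders used in the thread.
props = aux_service_properties({
    "foo": "com.mypack.MyAuxServiceClassForFoo",
    "bar": "com.mypack.MyAuxServiceClassForBar",
})
xml_text = to_xml(props)
print(xml_text)
```

Note that this produces a single aux-services entry with both IDs, rather than repeating the yarn.nodemanager.aux-services property once per service as in the original guess.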