Thanks Ravi. Well, in this case it's a no-effort :) Shouldn't a failure of AM init be considered a failure of the job? I looked at the code, and best-effort makes sense with respect to the retry logic etc. You make a good point that there would be no notification if the AM OOMs, but I do feel an AM init failure should send a notification by other means.
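To make the suggestion concrete, here is a minimal sketch of the behaviour I have in mind. It is plain Java with hypothetical stand-in types (`AppMaster`, `EndNotifier`) and a hypothetical `run` method. None of these are the actual MRAppMaster code or a Hadoop API, and the real change would have to hook into whatever currently sends the callback. The idea is only that a failure in init triggers one best-effort notification before the process exits, and that a failure of the notification itself never hides the original error.

```java
/** Sketch only: every type and method here is a hypothetical stand-in, not a Hadoop API. */
class InitFailureNotificationSketch {

    /** Stand-in for the application master lifecycle: init first, then start. */
    interface AppMaster {
        void init() throws Exception;
        void start() throws Exception;
    }

    /** Stand-in for whatever component sends the job-end HTTP callback. */
    interface EndNotifier {
        void notifyJobEnd(String status) throws Exception;
    }

    /**
     * Runs init and then start, and returns the exit code the process would use.
     * If init throws, one best-effort "FAILED" notification is attempted first.
     * Failures in start are assumed to be reported by the normal job-finish path,
     * so no extra notification is sent for them here.
     */
    static int run(AppMaster appMaster, EndNotifier notifier) {
        try {
            appMaster.init();
        } catch (Throwable initFailure) {
            try {
                notifier.notifyJobEnd("FAILED");
            } catch (Throwable notifyFailure) {
                // Best effort: a failed notification must not mask the init failure.
            }
            return 1;
        }
        try {
            appMaster.start();
        } catch (Throwable startFailure) {
            return 1;
        }
        return 0;
    }
}
```

Catching Throwable means an OOM during init would also reach the notification attempt, but that attempt may well fail too, which is why this stays best-effort rather than guaranteed.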

On Sat, Jun 22, 2013 at 2:38 PM, Ravi Prakash <ravi...@ymail.com> wrote:

> Hi Prashant,
>
> I would tend to agree with you. Although job-end notification is only a
> "best-effort" mechanism (i.e. we cannot always guarantee notification for
> example when the AM OOMs), I agree with you that we can do more. If you
> feel strongly about this, please create a JIRA and possibly upload a patch.
>
> Thanks
> Ravi
>
>
> ------------------------------
> *From:* Prashant Kommireddi <prash1...@gmail.com>
> *To:* "user@hadoop.apache.org" <user@hadoop.apache.org>
> *Sent:* Thursday, June 20, 2013 9:45 PM
> *Subject:* Job end notification does not always work (Hadoop 2.x)
>
> Hello,
>
> I came across an issue that occurs with the job notification callbacks in
> MR2. It works fine if the Application master has started, but does not send
> a callback if the initializing of AM fails.
>
> Here is the code from MRAppMaster.java
>
> .....
> .......
>
>       // set job classloader if configured
>       MRApps.setJobClassLoader(conf);
>       initAndStartAppMaster(appMaster, conf, jobUserName);
>     } catch (Throwable t) {
>       LOG.fatal("Error starting MRAppMaster", t);
>       System.exit(1);
>     }
>   }
>
>   protected static void initAndStartAppMaster(final MRAppMaster appMaster,
>       final YarnConfiguration conf, String jobUserName) throws IOException,
>       InterruptedException {
>     UserGroupInformation.setConfiguration(conf);
>     UserGroupInformation appMasterUgi = UserGroupInformation
>         .createRemoteUser(jobUserName);
>     appMasterUgi.doAs(new PrivilegedExceptionAction<Object>() {
>       @Override
>       public Object run() throws Exception {
>         appMaster.init(conf);
>         appMaster.start();
>         if(appMaster.errorHappenedShutDown) {
>           throw new IOException("Was asked to shut down.");
>         }
>         return null;
>       }
>     });
>   }
>
> appMaster.init(conf) does not dispatch JobFinishEventHandler which is
> responsible for sending a HTTP callback (via shutDownJob()). If there was
> an exception at this time, the process would simply terminate (via
> System.exit(1) )
>
> appMaster.start() however rightly uses the JobFinishEventHandler and
> things work fine.
>
> Shouldn't a failure on init(..) also send a callback suggesting the job
> failed?
>
> Thanks,
> Prashant