Hi Prashant,

I tend to agree with you. Although job-end notification is only a "best-effort" mechanism (i.e. we cannot always guarantee a notification, for example when the AM runs out of memory), I think we can do more here. If you feel strongly about this, please create a JIRA and, if possible, upload a patch.
Thanks,
Ravi

________________________________
From: Prashant Kommireddi <prash1...@gmail.com>
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Sent: Thursday, June 20, 2013 9:45 PM
Subject: Job end notification does not always work (Hadoop 2.x)

Hello,

I came across an issue with the job-end notification callbacks in MR2. They work fine once the ApplicationMaster has started, but no callback is sent if initialization of the AM fails. Here is the relevant code from MRAppMaster.java:

      .....
      .......
      // set job classloader if configured
      MRApps.setJobClassLoader(conf);
      initAndStartAppMaster(appMaster, conf, jobUserName);
    } catch (Throwable t) {
      LOG.fatal("Error starting MRAppMaster", t);
      System.exit(1);
    }
  }

  protected static void initAndStartAppMaster(final MRAppMaster appMaster,
      final YarnConfiguration conf, String jobUserName) throws IOException,
      InterruptedException {
    UserGroupInformation.setConfiguration(conf);
    UserGroupInformation appMasterUgi = UserGroupInformation
        .createRemoteUser(jobUserName);
    appMasterUgi.doAs(new PrivilegedExceptionAction<Object>() {
      @Override
      public Object run() throws Exception {
        appMaster.init(conf);
        appMaster.start();
        if (appMaster.errorHappenedShutDown) {
          throw new IOException("Was asked to shut down.");
        }
        return null;
      }
    });
  }

appMaster.init(conf) never reaches JobFinishEventHandler, which is responsible for sending the HTTP callback (via shutDownJob()). If an exception is thrown at this point, the process simply terminates (via System.exit(1)) and no callback is sent. appMaster.start(), however, correctly goes through JobFinishEventHandler, and notification works as expected.

Shouldn't a failure in init(..) also send a callback indicating that the job failed?

Thanks,
Prashant
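For illustration, here is a minimal, self-contained sketch of the behaviour Prashant proposes: if init fails, send a FAILED job-end callback on a best-effort basis before the error propagates to the code that calls System.exit(1). It does not use the real Hadoop classes. SketchAppMaster, JobEndCallback, initAndStart and expandUrl are hypothetical stand-ins, and the callback only records strings rather than making an HTTP request. The URL template mimics Hadoop's mapreduce.job.end-notification.url, which substitutes $jobId and $jobStatus. The sketch also assumes the job ID and callback URL are already known when init fails, which may not be true for every early failure.

```java
import java.util.ArrayList;
import java.util.List;

public class InitFailureNotificationSketch {

    /** Hypothetical hook standing in for whatever sends the HTTP job-end callback. */
    interface JobEndCallback {
        void onJobEnd(String jobId, String status);
    }

    /** Hypothetical stand-in for MRAppMaster's init/start lifecycle (not the real class). */
    static class SketchAppMaster {
        private final String jobId;
        private final boolean failInit;

        SketchAppMaster(String jobId, boolean failInit) {
            this.jobId = jobId;
            this.failInit = failInit;
        }

        String getJobId() {
            return jobId;
        }

        void init() throws Exception {
            if (failInit) {
                throw new IllegalStateException("simulated init failure");
            }
        }

        void start(JobEndCallback callback) {
            // Stand-in for the real start() path, where JobFinishEventHandler ends in
            // shutDownJob() and the HTTP callback; here success is reported directly.
            callback.onJobEnd(jobId, "SUCCEEDED");
        }
    }

    /** Proposed flow: a failed init still triggers a FAILED callback, then rethrows. */
    static void initAndStart(SketchAppMaster am, JobEndCallback callback) throws Exception {
        try {
            am.init();
        } catch (Exception e) {
            try {
                callback.onJobEnd(am.getJobId(), "FAILED");
            } catch (RuntimeException notifyError) {
                // Best effort: a failed notification must not hide the original error.
                e.addSuppressed(notifyError);
            }
            throw e; // the caller still logs and exits, as MRAppMaster does today
        }
        am.start(callback);
    }

    /** Literal (non-regex) substitution of the $jobId and $jobStatus placeholders. */
    static String expandUrl(String template, String jobId, String status) {
        return template.replace("$jobId", jobId).replace("$jobStatus", status);
    }

    public static void main(String[] args) {
        String template = "http://example.com/notify?id=$jobId&status=$jobStatus";
        List<String> sent = new ArrayList<>();
        JobEndCallback recorder = (id, status) -> sent.add(expandUrl(template, id, status));

        try {
            initAndStart(new SketchAppMaster("job_0001", false), recorder);
            initAndStart(new SketchAppMaster("job_0002", true), recorder);
        } catch (Exception e) {
            // The real MRAppMaster logs and calls System.exit(1) here; the sketch just reports it.
            System.out.println("AM failed: " + e.getMessage());
        }
        sent.forEach(System.out::println);
    }
}
```

Running main prints "AM failed: simulated init failure", followed by the two recorded URLs: one ending in status=SUCCEEDED for job_0001 and one ending in status=FAILED for job_0002. The notification is wrapped in its own try/catch so that, consistent with the "best-effort" nature of job-end notification, a failed callback never masks the init error that caused the exit.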