[ https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15610374#comment-15610374 ]
Bibin A Chundatt commented on YARN-5773:
----------------------------------------

IIUC, SchedulerApplicationAttempt#isRecovering is set only in the following case. When the app is in ACCEPTED state, I am not sure we will get isRecovery=true.
{code}
      // We will replay the final attempt only if last attempt is in final
      // state but application is not in final state.
      if (rmApp.getCurrentAppAttempt() == appAttempt
          && !RMAppImpl.isAppInFinalState(rmApp)) {
        // Add the previous finished attempt to scheduler synchronously so
        // that scheduler knows the previous attempt.
        appAttempt.scheduler.handle(new AppAttemptAddedSchedulerEvent(
            appAttempt.getAppAttemptId(), false, true));
        (new BaseFinalTransition(appAttempt.recoveredFinalState)).transition(
            appAttempt, event);
      }
{code}

{quote}
should we update AM diagnostics if we return right from beginning of activateApplications
{quote}
The cluster resource shown in the UI is self-explanatory, so is it required to add this?

{quote}
Also, in the patch LOG.debug statements should be guarded with LOG.isDebugEnabled check
{quote}
Will update this in the next patch.

> RM recovery too slow due to LeafQueue#activateApplication()
> -----------------------------------------------------------
>
>                 Key: YARN-5773
>                 URL: https://issues.apache.org/jira/browse/YARN-5773
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>            Priority: Critical
>         Attachments: YARN-5773.0001.patch, YARN-5773.0002.patch,
> YARN-5773.003.patch
>
>
> # Submit 10K applications to the default queue.
> # All applications are in ACCEPTED state.
> # Now restart the ResourceManager.
> For each application recovered, {{LeafQueue#activateApplications()}} is
> invoked, so the AM limit check is done even before the NodeManagers have
> registered.
> The total number of iterations for N applications is about {{N(N+1)/2}};
> for {{10K}} applications that is roughly {{50000000}} iterations, causing
> the RM to take more than 10 min to become active.
> Since NM resources are not yet added during recovery, we should skip
> {{activateApplications()}}.
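
The iteration count quoted in the issue description can be checked with a small standalone sketch. This is not the LeafQueue code or the attached patch: the class name RecoveryIterationSketch, the method countIterations, and the clusterMemoryMb and skipWhenNoClusterResource names are hypothetical, chosen only to illustrate the arithmetic. It assumes that each recovered application triggers one pass over every application still pending, and that nothing can be activated because no NodeManager has registered yet.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class RecoveryIterationSketch {

    /**
     * Simulates recovering applications one at a time. After each one is
     * added, a hypothetical activation pass walks all pending applications;
     * none can be activated because the cluster has no resources yet.
     * Returns the total number of per-application checks performed.
     */
    public static long countIterations(int apps, boolean skipWhenNoClusterResource) {
        Deque<Integer> pending = new ArrayDeque<>();
        long clusterMemoryMb = 0; // assumption: no NodeManager registered during recovery
        long iterations = 0;
        for (int app = 0; app < apps; app++) {
            pending.add(app);
            if (skipWhenNoClusterResource && clusterMemoryMb == 0) {
                continue; // proposed shortcut: skip the pass while the cluster is empty
            }
            for (Integer ignored : pending) {
                iterations++; // AM limit check fails, so the app stays pending
            }
        }
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println(countIterations(10_000, false)); // 50005000
        System.out.println(countIterations(10_000, true));  // 0
    }
}
```

Under these assumptions the unguarded case performs 1 + 2 + ... + N = N(N+1)/2 checks, which is 50,005,000 for N = 10,000, while the guarded case performs none until resources are available.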