[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121202#comment-16121202 ]
Yuqi Wang edited comment on YARN-6959 at 8/10/17 7:44 AM:
----------------------------------------------------------

I already added a comment on it in the patch:
// TODO: Rename it to getCurrentApplicationAttempt
I think it is clear. What do you think about it?


was (Author: yqwang):
I already add a comment on it:
// TODO: Rename it to getCurrentApplicationAttempt
I think it is clear. What do you think about it?

> RM may allocate wrong AM Container for new attempt
> --------------------------------------------------
>
>                 Key: YARN-6959
>                 URL: https://issues.apache.org/jira/browse/YARN-6959
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, fairscheduler, scheduler
>    Affects Versions: 2.7.1
>            Reporter: Yuqi Wang
>            Assignee: Yuqi Wang
>              Labels: patch
>             Fix For: 2.7.1, 3.0.0-alpha4
>
>         Attachments: YARN-6959.001.patch, YARN-6959.002.patch,
> YARN-6959.003.patch, YARN-6959.004.patch, YARN-6959.005.patch,
> YARN-6959-branch-2.7.001.patch, YARN-6959.yarn_nm.log.zip,
> YARN-6959.yarn_rm.log.zip
>
>
> *Issue Summary:*
> ResourceRequests from a previous attempt may be recorded into the current
> attempt's ResourceRequests. These mis-recorded ResourceRequests may confuse
> the AM Container request and allocation for the current attempt.
> *Issue Pipeline:*
> {code:java}
> // The precondition check for the incoming attempt id is executed here.
> ApplicationMasterService.allocate() ->
> scheduler.allocate(attemptId, ask, ...) ->
> // The earlier precondition check for the attempt id may be outdated here,
> // i.e. currentAttempt may not be the attempt that attemptId refers to,
> // for example when attemptId belongs to the previous attempt.
> currentAttempt = scheduler.getApplicationAttempt(attemptId) ->
> // ResourceRequests from the previous attempt may be recorded into the
> // current attempt's ResourceRequests.
> currentAttempt.updateResourceRequests(ask) ->
> // The RM may allocate the wrong AM Container for the current attempt,
> // because its ResourceRequests may come from the previous attempt (they can
> // be anything the previous AM asked for), and there is no logic that matches
> // the original AM Container ResourceRequest against the
> // amContainerAllocation returned below.
> AMContainerAllocatedTransition.transition(...) ->
> amContainerAllocation = scheduler.allocate(currentAttemptId, ...)
> {code}
> *Patch Correctness:*
> With this patch, the RM always records ResourceRequests from different
> attempts into different SchedulerApplicationAttempt.AppSchedulingInfo
> objects. So even if the RM still records ResourceRequests from an old attempt
> at some point, they are recorded in the old AppSchedulingInfo object and do
> not affect the current attempt's resource requests and allocation.
> *Concerns:*
> The getApplicationAttempt function in AbstractYarnScheduler is confusing; it
> would be better to rename it to getCurrentApplicationAttempt, and to
> reconsider whether there are any other bugs related to getApplicationAttempt.
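
To make the issue pipeline quoted above easier to follow, here is a minimal, self-contained Java sketch of the mis-recording and of the isolation described under "Patch Correctness". It is not the YARN code and not the attached patch. Every name in it (AttemptLookupSketch, Attempt, Scheduler, allocateUnsafe, allocateIsolated, getExactAttempt) is hypothetical, and ResourceRequests are modelled as plain strings. The only behaviour taken from the issue is that a lookup by attempt id may hand back the application's current attempt even when the id belongs to an older one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AttemptLookupSketch {

    /** Hypothetical stand-in for one attempt's scheduling state. */
    static final class Attempt {
        final String appId;
        final int attemptNo;
        final List<String> resourceRequests = new ArrayList<>();

        Attempt(String appId, int attemptNo) {
            this.appId = appId;
            this.attemptNo = attemptNo;
        }
    }

    /** Hypothetical scheduler that tracks attempts per application. */
    static final class Scheduler {
        private final Map<String, Attempt> currentByApp = new HashMap<>();
        private final Map<String, Attempt> byAttemptId = new HashMap<>();

        void startAttempt(String appId, int attemptNo) {
            Attempt attempt = new Attempt(appId, attemptNo);
            currentByApp.put(appId, attempt); // replaces the previous attempt
            byAttemptId.put(appId + "_" + attemptNo, attempt);
        }

        // Models the confusing lookup: the attempt number is ignored and the
        // application's current attempt is returned.
        Attempt getCurrentApplicationAttempt(String appId, int attemptNo) {
            return currentByApp.get(appId);
        }

        // Lookup that honours the full attempt id (assumption for this sketch).
        Attempt getExactAttempt(String appId, int attemptNo) {
            return byAttemptId.get(appId + "_" + attemptNo);
        }

        // A stale ask from an old attempt lands in the current attempt.
        void allocateUnsafe(String appId, int attemptNo, String ask) {
            getCurrentApplicationAttempt(appId, attemptNo).resourceRequests.add(ask);
        }

        // A stale ask is recorded only in the object of the attempt that sent it.
        void allocateIsolated(String appId, int attemptNo, String ask) {
            Attempt attempt = getExactAttempt(appId, attemptNo);
            if (attempt != null) {
                attempt.resourceRequests.add(ask);
            }
        }
    }

    public static void main(String[] args) {
        Scheduler unsafe = new Scheduler();
        unsafe.startAttempt("app_1", 1);
        unsafe.startAttempt("app_1", 2);
        unsafe.allocateUnsafe("app_1", 1, "stale ask from attempt 1");
        System.out.println("unsafe, attempt 2 requests: "
                + unsafe.getExactAttempt("app_1", 2).resourceRequests);

        Scheduler isolated = new Scheduler();
        isolated.startAttempt("app_1", 1);
        isolated.startAttempt("app_1", 2);
        isolated.allocateIsolated("app_1", 1, "stale ask from attempt 1");
        System.out.println("isolated, attempt 2 requests: "
                + isolated.getExactAttempt("app_1", 2).resourceRequests);
    }
}
```

In the unsafe variant the late ask from attempt 1 ends up in attempt 2's request list, which is the situation the issue summary describes. In the isolated variant the same ask stays in attempt 1's own object, so attempt 2's requests are unaffected.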