[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16133886#comment-16133886 ]

Yuqi Wang commented on YARN-6959:
---------------------------------

[~jianhe] Great! Thank you so much!

> RM may allocate wrong AM Container for new attempt
> ---------------------------------------------------
>
>                 Key: YARN-6959
>                 URL: https://issues.apache.org/jira/browse/YARN-6959
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, fairscheduler, scheduler
>    Affects Versions: 2.7.1
>            Reporter: Yuqi Wang
>            Assignee: Yuqi Wang
>              Labels: patch
>             Fix For: 2.8.0, 2.7.1, 3.0.0-alpha4
>
>         Attachments: YARN-6959.005.patch, YARN-6959-branch-2.7.005.patch, YARN-6959-branch-2.7.006.patch, YARN-6959-branch-2.8.001.patch, YARN-6959-branch-2.8.002.patch, YARN-6959.yarn_nm.log.zip, YARN-6959.yarn_rm.log.zip
>
> *Issue Summary:*
> A previous attempt's ResourceRequests may be recorded into the current attempt's ResourceRequests. These mis-recorded ResourceRequests may confuse the AM Container request and allocation for the current attempt.
>
> *Issue Pipeline:*
> {code:java}
> // Executing precondition check for the incoming attempt id.
> ApplicationMasterService.allocate() ->
> scheduler.allocate(attemptId, ask, ...) ->
>
> // The earlier precondition check for the attempt id may be outdated here,
> // i.e. currentAttempt may not be the attempt that attemptId refers to;
> // for example, attemptId may refer to the previous attempt.
> currentAttempt = scheduler.getApplicationAttempt(attemptId) ->
>
> // The previous attempt's ResourceRequests may be recorded into the current
> // attempt's ResourceRequests.
> currentAttempt.updateResourceRequests(ask) ->
>
> // RM may allocate the wrong AM Container for the current attempt, because its
> // ResourceRequests may come from the previous attempt, which can be any
> // ResourceRequests the previous AM asked for, and there is no matching logic
> // between the original AM Container ResourceRequest and the returned
> // amContainerAllocation below.
> AMContainerAllocatedTransition.transition(...) ->
> amContainerAllocation = scheduler.allocate(currentAttemptId, ...)
> {code}
>
> *Patch Correctness:*
> After this patch, the RM records ResourceRequests from different attempts into different SchedulerApplicationAttempt.AppSchedulingInfo objects.
> So even if the RM still records ResourceRequests from an old attempt at any time, they are recorded in the old attempt's AppSchedulingInfo object and do not affect the current attempt's resource requests and allocation.
>
> *Concerns:*
> The getApplicationAttempt method in AbstractYarnScheduler is confusing; it would be better to rename it to getCurrentApplicationAttempt, and to reconsider whether there are any other bugs related to getApplicationAttempt.
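To make the pipeline above concrete, the following is a minimal, self-contained sketch of the race. The class and method names are simplified stand-ins, not the real YARN types; it only illustrates the behavior described in the issue, namely that the scheduler resolves any attempt id to the *current* attempt, so a late ask that still carries the old attempt id is folded into the new attempt's requests.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StaleAskDemo {
  // Simplified stand-in for SchedulerApplicationAttempt / AppSchedulingInfo.
  static class Attempt {
    final int attemptId;
    final List<String> resourceRequests = new ArrayList<>();
    Attempt(int attemptId) { this.attemptId = attemptId; }
  }

  static class App {
    Attempt currentAttempt;
  }

  static final Map<String, App> apps = new HashMap<>();

  // Mirrors scheduler.allocate(attemptId, ask, ...): the lookup ignores the
  // attempt id and always returns the current attempt, so an ask from an old
  // attempt is recorded into the current attempt's requests.
  static void allocate(String appId, int attemptId, String ask) {
    Attempt current = apps.get(appId).currentAttempt; // getApplicationAttempt
    current.resourceRequests.add(ask);                // updateResourceRequests
  }

  public static void main(String[] args) {
    App app = new App();
    apps.put("app_1", app);

    app.currentAttempt = new Attempt(1);
    allocate("app_1", 1, "ask from attempt 1");

    app.currentAttempt = new Attempt(2);             // AM failover: new attempt
    allocate("app_1", 1, "late ask from attempt 1"); // stale attempt id

    // The stale ask now sits in attempt 2's request store and could drive the
    // AM container allocation for the new attempt.
    System.out.println(app.currentAttempt.resourceRequests);
  }
}
{code}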
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16133224#comment-16133224 ]

Yuqi Wang commented on YARN-6959:
---------------------------------

[~jianhe] It seems the UT failures are not caused by my patch; please check.
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16132162#comment-16132162 ]

Hadoop QA commented on YARN-6959:
---------------------------------

-1 overall (precommit run against branch-2.8)

|| Vote || Subsystem || Comment ||
| 0 | reexec (12m 43s) | Docker mode activated. |
| +1 | @author, test4tests | The patch does not contain any @author tags and appears to include 1 new or modified test file. |
| +1 | branch-2.8: mvninstall, compile, checkstyle, mvnsite, findbugs, javadoc | branch-2.8 passed with JDK v1.8.0_144 and v1.7.0_151. |
| +1 | patch: mvninstall, compile, javac, mvnsite, whitespace, findbugs, javadoc | The patch passed with JDK v1.8.0_144 and v1.7.0_151; no whitespace issues. |
| -0 | checkstyle | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: the patch generated 1 new + 299 unchanged - 0 fixed = 300 total (was 299). |
| -1 | unit (76m 5s) | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_151. |
| +1 | asflicense | The patch does not generate ASF License warnings. |

Total runtime: 182m 16s

Failed junit tests (JDK v1.8.0_144): hadoop.yarn.server.resourcemanager.TestAMAuthorization, hadoop.yarn.server.resourcemanager.TestClientRMTokens
Failed junit tests (JDK v1.7.0_151): hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerLazyPreemption, hadoop.yarn.server.resourcemanager.TestAMAuthorization, hadoop.yarn.server.resourcemanager.TestClientRMTokens

Docker image: yetus/hadoop:d946387
JIRA issue: YARN-6959
JIRA patch URL: https://issues.apache.org/jira/secure/attachment/12882550/YARN-6959-branch-2.8.002.patch
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16130969#comment-16130969 ]

Jian He commented on YARN-6959:
-------------------------------

[~yqwang], thanks for the patch, I've committed the branch-2.7 patch. Could you upload a patch for branch-2.8 too? branch-2.8 also has some conflicts.
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129744#comment-16129744 ]

Yuqi Wang commented on YARN-6959:
---------------------------------

[~jianhe] It seems the UT failures are not caused by my patch; please check.
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129040#comment-16129040 ]

Hadoop QA commented on YARN-6959:
---------------------------------

-1 overall (precommit run against branch-2.7)

|| Vote || Subsystem || Comment ||
| 0 | reexec (11m 6s) | Docker mode activated. |
| +1 | @author, test4tests | The patch does not contain any @author tags and appears to include 1 new or modified test file. |
| +1 | branch-2.7: mvninstall, compile, checkstyle, mvnsite, findbugs, javadoc | branch-2.7 passed with JDK v1.8.0_144 and v1.7.0_131. |
| +1 | patch: mvninstall, compile, javac, mvnsite, whitespace, findbugs, javadoc | The patch passed with JDK v1.8.0_144 and v1.7.0_131; no whitespace issues. |
| -0 | checkstyle | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: the patch generated 7 new + 1298 unchanged - 3 fixed = 1305 total (was 1301). |
| -1 | unit (51m 27s) | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_131. |
| +1 | asflicense | The patch does not generate ASF License warnings. |

Total runtime: 129m 50s

Failed junit tests (JDK v1.8.0_144): hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart, hadoop.yarn.server.resourcemanager.TestAMAuthorization, hadoop.yarn.server.resourcemanager.TestClientRMTokens
Failed junit tests (JDK v1.7.0_131): hadoop.yarn.server.resourcemanager.TestAMAuthorization, hadoop.yarn.server.resourcemanager.TestClientRMTokens

Docker image: yetus/hadoop:67e87c9
JIRA issue: YARN-6959
JIRA patch URL: https://issues.apache.org/jira/secure/attachment/12882141/YARN-6959-branch-2.7.006.patch
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128845#comment-16128845 ]

Hadoop QA commented on YARN-6959:
---------------------------------

-1 overall (precommit run against branch-2.7)

|| Vote || Subsystem || Comment ||
| 0 | reexec (11m 23s) | Docker mode activated. |
| +1 | @author, test4tests | The patch does not contain any @author tags and appears to include 1 new or modified test file. |
| +1 | branch-2.7: mvninstall, compile, checkstyle, mvnsite, findbugs, javadoc | branch-2.7 passed with JDK v1.8.0_144 and v1.7.0_131. |
| +1 | patch: mvninstall, compile, javac, mvnsite, whitespace, findbugs, javadoc | The patch passed with JDK v1.8.0_144 and v1.7.0_131; no whitespace issues. |
| -0 | checkstyle | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: the patch generated 7 new + 1298 unchanged - 3 fixed = 1305 total (was 1301). |
| -1 | unit (52m 6s) | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_131. |
| +1 | asflicense | The patch does not generate ASF License warnings. |

Total runtime: 131m 25s

Failed junit tests (JDK v1.8.0_144): hadoop.yarn.server.resourcemanager.TestClientRMTokens, hadoop.yarn.server.resourcemanager.TestAMAuthorization
Failed junit tests (JDK v1.7.0_131): hadoop.yarn.server.resourcemanager.TestClientRMTokens, hadoop.yarn.server.resourcemanager.TestAMAuthorization, hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

Docker image: yetus/hadoop:67e87c9
JIRA issue: YARN-6959
JIRA patch URL: https://issues.apache.org/jira/secure/attachment/12882122/YARN-6959-branch-2.7.005.patch
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128542#comment-16128542 ]

Hadoop QA commented on YARN-6959:
---------------------------------

-1 overall (precommit run against branch-2.7)

|| Vote || Subsystem || Comment ||
| 0 | reexec (15m 47s) | Docker mode activated. |
| +1 | @author, test4tests | The patch does not contain any @author tags and appears to include 1 new or modified test file. |
| +1 | branch-2.7: mvninstall, compile, checkstyle, mvnsite, findbugs, javadoc | branch-2.7 passed with JDK v1.8.0_144 and v1.7.0_131. |
| +1 | patch: mvninstall, compile, javac, mvnsite, whitespace, findbugs, javadoc | The patch passed with JDK v1.8.0_144 and v1.7.0_131; no whitespace issues. |
| -0 | checkstyle | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: the patch generated 7 new + 1298 unchanged - 3 fixed = 1305 total (was 1301). |
| -1 | unit (54m 17s) | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_131. |
| +1 | asflicense | The patch does not generate ASF License warnings. |

Total runtime: 145m 10s

Failed junit tests (JDK v1.8.0_144): hadoop.yarn.server.resourcemanager.TestClientRMTokens, hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore, hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler, hadoop.yarn.server.resourcemanager.TestAMAuthorization
Failed junit tests (JDK v1.7.0_131): hadoop.yarn.server.resourcemanager.TestClientRMTokens, hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler, hadoop.yarn.server.resourcemanager.TestAMAuthorization, hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart, hadoop.yarn.s
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128402#comment-16128402 ]

Hadoop QA commented on YARN-6959:
---------------------------------

-1 overall (precommit run against branch-2.7)

|| Vote || Subsystem || Comment ||
| 0 | reexec (12m 20s) | Docker mode activated. |
| +1 | @author, test4tests | The patch does not contain any @author tags and appears to include 1 new or modified test file. |
| +1 | branch-2.7: mvninstall, compile, checkstyle, mvnsite, findbugs, javadoc | branch-2.7 passed with JDK v1.8.0_144 and v1.7.0_131. |
| +1 | patch: mvninstall, compile, javac, mvnsite, whitespace, findbugs, javadoc | The patch passed with JDK v1.8.0_144 and v1.7.0_131; no whitespace issues. |
| -0 | checkstyle | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: the patch generated 7 new + 1299 unchanged - 3 fixed = 1306 total (was 1302). |
| -1 | unit (51m 6s) | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_131. |
| +1 | asflicense | The patch does not generate ASF License warnings. |

Total runtime: 134m 6s

Failed junit tests (JDK v1.8.0_144): hadoop.yarn.server.resourcemanager.TestAMAuthorization, hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler, hadoop.yarn.server.resourcemanager.TestClientRMTokens
Failed junit tests (JDK v1.7.0_131): hadoop.yarn.server.resourcemanager.TestAMAuthorization, hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler, hadoop.yarn.server.resourcemanager.TestClientRMTokens

Docker image: yetus/hadoop:67e87c9
JIRA issue: YARN-6959
JIRA patch URL: https://issues.apache.org/jira/secure/att
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128348#comment-16128348 ]

Yuqi Wang commented on YARN-6959:
---------------------------------

[~jianhe] I updated the new patch for branch-2.7; do you know how to trigger Jenkins against branch-2.7?
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127696#comment-16127696 ]

Jian He commented on YARN-6959:
-------------------------------

[~yqwang], TestFairScheduler is failing with the patch, can you take a look?
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127436#comment-16127436 ]

Hadoop QA commented on YARN-6959:
---------------------------------

-1 overall (precommit run against branch-2.7)

|| Vote || Subsystem || Comment ||
| 0 | reexec (11m 31s) | Docker mode activated. |
| +1 | @author, test4tests | The patch does not contain any @author tags and appears to include 1 new or modified test file. |
| +1 | branch-2.7: mvninstall, compile, checkstyle, mvnsite, findbugs, javadoc | branch-2.7 passed with JDK v1.8.0_144 and v1.7.0_131. |
| +1 | patch: mvninstall, compile, javac, mvnsite, whitespace, findbugs, javadoc | The patch passed with JDK v1.8.0_144 and v1.7.0_131; no whitespace issues. |
| -0 | checkstyle | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: the patch generated 7 new + 1298 unchanged - 3 fixed = 1305 total (was 1301). |
| -1 | unit (50m 57s) | hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_131. |
| +1 | asflicense | The patch does not generate ASF License warnings. |

Total runtime: 129m 31s

Failed junit tests (JDK v1.8.0_144): hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler, hadoop.yarn.server.resourcemanager.TestClientRMTokens, hadoop.yarn.server.resourcemanager.TestAMAuthorization
Failed junit tests (JDK v1.7.0_131): hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler, hadoop.yarn.server.resourcemanager.TestClientRMTokens, hadoop.yarn.server.resourcemanager.TestAMAuthorization

Docker image: yetus/hadoop:67e87c9
JIRA issue: YARN-6959
JIRA patch URL: https://issues.apache.org/jira/secure/att
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126183#comment-16126183 ]

Hudson commented on YARN-6959:
------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #12178 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/12178/])
YARN-6959. RM may allocate wrong AM Container for new attempt. (jianhe: rev e2f6299f6f580d7a03f2377d19ac85f55fd4e73b)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairScheduler.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
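The commit above touches FifoScheduler, CapacityScheduler, FairScheduler, AbstractYarnScheduler and TestFairScheduler. The issue description only states the effect of the change (each attempt keeps its own AppSchedulingInfo, so asks are recorded against the attempt they name); the sketch below illustrates that separation with simplified, hypothetical names and is not the committed code.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PerAttemptRecordingDemo {
  // Simplified stand-in for an attempt's own AppSchedulingInfo.
  static class Attempt {
    final int attemptId;
    final List<String> resourceRequests = new ArrayList<>();
    Attempt(int attemptId) { this.attemptId = attemptId; }
  }

  // attemptId -> its own Attempt object; old attempts keep their own store.
  static final Map<Integer, Attempt> attempts = new HashMap<>();

  // Asks are recorded against the attempt id they arrived with, so a stale
  // ask from an old attempt can no longer leak into the new attempt's store.
  static void allocate(int attemptId, String ask) {
    Attempt target = attempts.get(attemptId);
    if (target == null) {
      return; // unknown attempt: nothing to record
    }
    target.resourceRequests.add(ask);
  }

  public static void main(String[] args) {
    attempts.put(1, new Attempt(1));
    allocate(1, "ask from attempt 1");

    attempts.put(2, new Attempt(2));        // new attempt after AM failure
    allocate(1, "late ask from attempt 1"); // stale ask stays with attempt 1

    System.out.println("attempt 1: " + attempts.get(1).resourceRequests);
    System.out.println("attempt 2: " + attempts.get(2).resourceRequests);
  }
}
{code}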
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126093#comment-16126093 ]

Jian He commented on YARN-6959:
-------------------------------

[~yqwang], yep, I've committed to trunk and branch-2. The patch doesn't apply to branch-2.8; could you provide a patch for branch-2.8?
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16122967#comment-16122967 ]

Yuqi Wang commented on YARN-6959:
---------------------------------

[~jianhe] Is this patch ready to accept?
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121202#comment-16121202 ] Yuqi Wang commented on YARN-6959: - I already added a comment on it: // TODO: Rename it to getCurrentApplicationAttempt. I think that makes the behavior clear. What do you think?
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121198#comment-16121198 ] Jian He commented on YARN-6959: - bq. And if I just change getApplicationAttempt to getCurrentApplicationAttempt, it is more likely to hide the bugs. I don't follow; it's just a rename refactoring, so how would it add or hide bugs? Anyway, it looks like there are a bunch of callers, so better not to do it here, as it would affect other work going on. Would you mind adding a comment on the getApplicationAttempt method to explain its behavior?
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121196#comment-16121196 ] Yuqi Wang commented on YARN-6959: - The renaming can be done in the next Hadoop version.
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121193#comment-16121193 ] Yuqi Wang commented on YARN-6959: - As in this issue, other places that call getApplicationAttempt may also expect to get the attempt specified in the argument rather than the current attempt. And if I just change getApplicationAttempt to getCurrentApplicationAttempt, it is more likely to hide the bugs. So, for this fix only, I will not touch getApplicationAttempt until we have confirmed that every place that uses getApplicationAttempt is safe.
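A minimal sketch of the distinction discussed above, using a toy attempt table rather than the real AbstractYarnScheduler (all class and method names here are illustrative): a plain rename keeps the "attempt id is ignored, the current attempt wins" behavior, whereas call sites that genuinely need the exact attempt would be better served by a stricter lookup that exposes a stale id instead of silently redirecting it.
{code:java}
// Illustrative only: two lookup flavours on a toy attempt table (not real YARN code).
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

class AttemptTable {
  private final Map<String, String> currentAttemptByApp = new ConcurrentHashMap<>();

  void setCurrentAttempt(String appId, String attemptId) {
    currentAttemptByApp.put(appId, attemptId);
  }

  /** Same semantics as today: the application id decides, the attempt id is ignored. */
  String getCurrentApplicationAttempt(String anyAttemptIdOfApp) {
    return currentAttemptByApp.get(appIdOf(anyAttemptIdOfApp));
  }

  /** Stricter variant: only succeeds if the given attempt is still the current one. */
  Optional<String> getAttemptIfCurrent(String attemptId) {
    String current = currentAttemptByApp.get(appIdOf(attemptId));
    return attemptId.equals(current) ? Optional.of(current) : Optional.empty();
  }

  private static String appIdOf(String attemptId) {
    return attemptId.substring(0, attemptId.lastIndexOf('_'));
  }

  public static void main(String[] args) {
    AttemptTable t = new AttemptTable();
    t.setCurrentAttempt("app_1", "app_1_000002");
    System.out.println(t.getCurrentApplicationAttempt("app_1_000001")); // app_1_000002
    System.out.println(t.getAttemptIfCurrent("app_1_000001"));          // Optional.empty
  }
}
{code}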
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121170#comment-16121170 ] Jian He commented on YARN-6959: - bq. we should better rename it to getCurrentApplicationAttempt Yep, would you like to rename it in this patch?
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121146#comment-16121146 ] Yuqi Wang commented on YARN-6959: - Yes, it is very rare; it is the first time I have seen it in our large cluster. The log was from our production cluster. We have a very large cluster (>50k nodes) which serves daily batch jobs and long-running services for our customers in Microsoft. Our customer complained that their job simply failed without any effective retry attempts, because, as the log shows, the AM container size decreased from 20GB to 5GB, so the new attempt was bound to fail since the pmem limit is enabled. As I said in the Concerns section of this JIRA's description: the getApplicationAttempt function in AbstractYarnScheduler is so confusing that we should rename it to getCurrentApplicationAttempt, and reconsider whether there are any other bugs related to getApplicationAttempt.
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121119#comment-16121119 ] Jian He commented on YARN-6959: - Yes, I agree it is possible, but it may happen rarely, as NM and RM also have a heartbeat interval. The fix is fine; I'm just wondering whether there are other issues behind this, otherwise the fix will just hide them, if any. Btw, did this happen in a real cluster?
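A minimal sketch of the bookkeeping idea behind the fix as described in this issue, with toy classes standing in for SchedulerApplicationAttempt/AppSchedulingInfo (names and structure are illustrative, not the actual patch): asks are recorded under the attempt id they arrived with, so a stale update from an old attempt cannot leak into the new attempt's requests.
{code:java}
// Toy model of per-attempt request bookkeeping; illustrative names, not YARN code.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PerAttemptSchedulingInfo {
  /** Each attempt keeps its own ask list, keyed by the full attempt id. */
  private final Map<String, List<String>> asksByAttempt = new HashMap<>();
  private String currentAttemptId;

  void startNewAttempt(String attemptId) {
    currentAttemptId = attemptId;
    asksByAttempt.put(attemptId, new ArrayList<>());
  }

  /** Asks are recorded under the attempt id they arrived with, never the current one. */
  void updateResourceRequests(String attemptId, String ask) {
    asksByAttempt.computeIfAbsent(attemptId, k -> new ArrayList<>()).add(ask);
  }

  List<String> currentAttemptAsks() {
    return asksByAttempt.get(currentAttemptId);
  }

  public static void main(String[] args) {
    PerAttemptSchedulingInfo info = new PerAttemptSchedulingInfo();
    info.startNewAttempt("attempt_000001");
    info.startNewAttempt("attempt_000002");            // RM switches to attempt 2
    info.updateResourceRequests("attempt_000001", "5GB task containers"); // stale ask
    // The new attempt's view is unaffected by the stale update:
    System.out.println(info.currentAttemptAsks());     // []
  }
}
{code}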
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121095#comment-16121095 ] Yuqi Wang commented on YARN-6959: - I meant the heartbeat from Step0 is blocked between MARK1 and MARK3 (i.e. blocked until Step3, when RM switches to the new attempt). So it may be blocked at MARK2, or at some other place between MARK1 and MARK3. Also, the RPC time before MARK1 cannot be ignored, and it can run in parallel with the process (AM container completes -> NM reports to RM -> RM processes a series of events). I have not figured out which part accounts for the largest share of the time yet. In any case, there is a race condition.
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121079#comment-16121079 ] Jian He commented on YARN-6959: - Do you mean Step0 is blocked on MARK2 until this entire process (AM container completes -> NM reports to RM -> RM processes a series of events -> and finally a new attempt gets added in the scheduler) is completed? The question is why Step0 would be blocked for so long; there's no contention to grab the lock, if I understand correctly.
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121065#comment-16121065 ] Yuqi Wang commented on YARN-6959: - Basically, I meant that an allocate RPC call sent before the AM process exited caused this issue. [~jianhe], could you please reconsider it?
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121060#comment-16121060 ] Yuqi Wang commented on YARN-6959: - [~jianhe] The whole pipeline was: Step0. AM sent a heartbeat to RM. Step1. The AM process crashed with exit code 15 without unregistering from RM. Step2-a. NM told RM that the AM container had completed. Step2-b. The heartbeat sent in Step0 was being processed by RM between MARK1 and MARK3. Step3. RM switched to the new attempt. Step4. The heartbeat recorded the previous AM's requests into the current attempt. So, it is possible.
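A self-contained sketch that replays the steps above with toy classes (identifiers and sizes are illustrative, not RM code): because the lookup is keyed by application only, the in-flight heartbeat from attempt 1 lands in attempt 2's ask list.
{code:java}
// Toy replay of the race described above; identifiers and sizes are illustrative.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class StaleAllocateReplay {
  /** appId -> ask list of the CURRENT attempt; models a lookup keyed by application only. */
  static final Map<String, List<String>> currentAttemptAsks = new HashMap<>();

  /** Buggy behaviour: whichever attempt calls, the CURRENT attempt's asks are updated. */
  static void allocate(String appId, int callerAttempt, String ask) {
    currentAttemptAsks.get(appId).add("from attempt " + callerAttempt + ": " + ask);
  }

  /** RM switches the application to a fresh attempt with an empty ask list. */
  static void startAttempt(String appId) {
    currentAttemptAsks.put(appId, new ArrayList<>());
  }

  public static void main(String[] args) {
    String app = "application_1500967702061_2512";
    startAttempt(app);                        // Step0: attempt 1 running, heartbeat in flight
    // Step1/Step2: AM process exits; NM reports the completed AM container to RM.
    startAttempt(app);                        // Step3: RM switches to attempt 2
    allocate(app, 1, "5GB task containers");  // Step4: the in-flight heartbeat finally lands
    // Attempt 2's AM container request can now be confused by attempt 1's asks:
    System.out.println(currentAttemptAsks.get(app)); // [from attempt 1: 5GB task containers]
  }
}
{code}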
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121050#comment-16121050 ] Jian He commented on YARN-6959: - OK, so the first AM container process exited; then it's impossible for it to call allocate again. I guess the root cause is different.
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121030#comment-16121030 ] Yuqi Wang commented on YARN-6959: - {code:java} 2017-07-31 21:29:34,047 INFO [Container Monitor] org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree container_e71_1500967702061_2512_01_01 for container-id container_e71_1500967702061_2512_01_01: 7.1 GB of 20 GB physical memory used; 8.5 GB of 30 GB virtual memory used 2017-07-31 21:29:37,423 INFO [Container Monitor] org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree container_e71_1500967702061_2512_01_01 for container-id container_e71_1500967702061_2512_01_01: 7.1 GB of 20 GB physical memory used; 8.5 GB of 30 GB virtual memory used 2017-07-31 21:29:38,239 WARN [ContainersLauncher #60] org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_e71_1500967702061_2512_01_01 is : 15 2017-07-31 21:29:38,239 WARN [ContainersLauncher #60] org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_e71_1500967702061_2512_01_01 and exit code: 15 ExitCodeException exitCode=15: at org.apache.hadoop.util.Shell.runCommand(Shell.java:579) at org.apache.hadoop.util.Shell.run(Shell.java:490) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:756) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:329) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:86) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2017-07-31 21:29:38,239 INFO [ContainersLauncher #60] org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch. 
Container id: container_e71_1500967702061_2512_01_01 Exit code: 15 Stack trace: ExitCodeException exitCode=15: 2017-07-31 21:29:38,240 INFO [ContainersLauncher #60] org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell.runCommand(Shell.java:579) 2017-07-31 21:29:38,240 INFO [ContainersLauncher #60] org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell.run(Shell.java:490) 2017-07-31 21:29:38,240 INFO [ContainersLauncher #60] org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:756) 2017-07-31 21:29:38,240 INFO [ContainersLauncher #60] org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212) 2017-07-31 21:29:38,240 INFO [ContainersLauncher #60] org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:329) 2017-07-31 21:29:38,240 INFO [ContainersLauncher #60] org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:86) 2017-07-31 21:29:38,240 INFO [ContainersLauncher #60] org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.util.concurrent.FutureTask.run(FutureTask.java:266) 2017-07-31 21:29:38,240 INFO [ContainersLauncher #60] org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 2017-07-31 21:29:38,240 INFO [ContainersLauncher #60] org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 2017-07-31 21:29:38,240 INFO [ContainersLauncher #60] org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.lang.Thread.run(Thread.java:745) 2017-07-31 21:29:38,240 INFO [ContainersLauncher #60] org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: 2017-07-31 21:29:38,241 WARN [ContainersLauncher #60] org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container exited with a non-zero exit code 15 2017-07-31 21:29:38,241 INFO [AsyncDispatcher event handler] org.apache.hado
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120793#comment-16120793 ] Jian He commented on YARN-6959: - It's still unclear to me. For MARK2, once the lock is released, it can just proceed. {code} synchronized (lock) { // MARK2: The RPC call may be blocked here for a long time ... // MARK3: During MARK1 and here, RM may switch to the new attempt. So, previous // attempt ResourceRequest may be recorded into current attempt ResourceRequests scheduler.allocate(attemptId, ask, ...) -> scheduler.getApplicationAttempt(attemptId) ... } {code} From the log, I do see that the AM container size changed. Also, I see that the first AM container completed at {code} 2017-07-31 21:29:38,338 INFO [ResourceManager Event Processor] org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e71_1500967702061_2512_01_01 Container Transitioned from RUNNING to COMPLETED {code} If the AM container process had already exited, how is it possible for it to call allocate again? Can you check in the NodeManager log whether the first AM container indeed completed? Are you able to enable debug-level logging and reproduce this issue, or reproduce the issue with a UT?
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16119366#comment-16119366 ] Yuqi Wang commented on YARN-6959: - Anyway, as in YARN-5197, executing a double check to guard against potential race conditions, network issues, etc., should be a best practice.
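A minimal sketch of the kind of double check being suggested, under the assumption of a toy per-application lock (not the actual ApplicationMasterService or the committed patch): after any blocking, re-validate that the calling attempt is still the application's current attempt before recording its asks.
{code:java}
// Illustrative double-check guard around a toy per-application lock; not RM code.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class GuardedAllocate {
  private final Map<String, String> currentAttemptByApp = new ConcurrentHashMap<>();
  private final Map<String, Object> lockByApp = new ConcurrentHashMap<>();

  void registerAttempt(String appId, String attemptId) {
    currentAttemptByApp.put(appId, attemptId);
    lockByApp.putIfAbsent(appId, new Object());
  }

  boolean allocate(String appId, String callerAttemptId, String ask) {
    Object lock = lockByApp.get(appId);
    synchronized (lock) {
      // Double check: the caller may have been blocked while RM switched attempts.
      if (!callerAttemptId.equals(currentAttemptByApp.get(appId))) {
        return false; // reject the stale call instead of polluting the current attempt
      }
      // ... record `ask` for callerAttemptId here ...
      return true;
    }
  }

  public static void main(String[] args) {
    GuardedAllocate g = new GuardedAllocate();
    g.registerAttempt("app_1", "attempt_1");
    g.registerAttempt("app_1", "attempt_2"); // RM switched to a new attempt
    System.out.println(g.allocate("app_1", "attempt_1", "20GB AM container")); // false
    System.out.println(g.allocate("app_1", "attempt_2", "20GB AM container")); // true
  }
}
{code}
Rejecting the stale call here repeats the precondition check from MARK1, but performs it again inside the critical section, where the result can no longer be invalidated by an attempt switch.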
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16119334#comment-16119334 ] Yuqi Wang commented on YARN-6959: - [~jianhe] The race condition can be reproduced during the following segment of one AM-RM RPC call: {code:java} // One AM RM RPC call ApplicationMasterService.allocate() { AllocateResponseLock lock = responseMap.get(appAttemptId); if (lock == null) { // MARK1: At this time, the appAttemptId is still current attempt, so the RPC call continues. ... throw new ApplicationAttemptNotFoundException(); } synchronized (lock) { // MARK2: The RPC call may be blocked here for a long time ... // MARK3: During MARK1 and here, RM may switch to the new attempt. So, previous // attempt ResourceRequest may be recorded into current attempt ResourceRequests scheduler.allocate(attemptId, ask, ...) -> scheduler.getApplicationAttempt(attemptId) ... } } {code} I saw the log you mentioned. It shows that RM switched to the new attempt, and afterwards some allocate() calls from the previous attempt still came into the scheduler. For details, I have attached the full log; please check. {code:java} 2017-07-31 21:29:38,351 INFO [ResourceManager Event Processor] org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e71_1500967702061_2512_01_000361 Container Transitioned from RUNNING to COMPLETED 2017-07-31 21:29:38,351 INFO [ResourceManager Event Processor] org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: Completed container: container_e71_1500967702061_2512_01_000361 in state: COMPLETED event:FINISHED 2017-07-31 21:29:38,351 INFO [ResourceManager Event Processor] org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hadoop OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1500967702061_2512 CONTAINERID=container_e71_1500967702061_2512_01_000361 2017-07-31 21:29:38,351 INFO [ResourceManager Event Processor] org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: prod-new used= numContainers=9349 user=hadoop user-resources= 2017-07-31 21:29:38,351 INFO [ResourceManager Event Processor] org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: completedContainer container=Container: [ContainerId: container_e71_1500967702061_2512_01_000361, NodeId: BN1APS0A410B91:10025, NodeHttpAddress: Proxy5.Yarn-Prod-Bn2.BN2.ap.gbl:81/proxy/nodemanager/BN1APS0A410B91/8042, Resource: , Priority: 1, Token: Token { kind: ContainerToken, service: 10.65.11.145:10025 }, ] queue=prod-new: capacity=0.7, absoluteCapacity=0.7, usedResources=, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=6, numContainers=9349 cluster= 2017-07-31 21:29:38,351 INFO [ResourceManager Event Processor] org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: completedContainer queue=root usedCapacity=0.0 absoluteUsedCapacity=0.0 used= cluster= 2017-07-31 21:29:38,351 INFO [ResourceManager Event Processor] org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: Re-sorting completed queue: root.prod-new stats: prod-new: capacity=0.7, absoluteCapacity=0.7, usedResources=, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=6, numContainers=9349 2017-07-31 21:29:38,351 INFO [ResourceManager Event Processor] org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application attempt appattempt_1500967702061_2512_01 released container
container_e71_1500967702061_2512_01_000361 on node: host: BN1APS0A410B91:10025 #containers=3 available= used= with event: FINISHED 2017-07-31 21:29:38,353 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Unregistering app attempt : appattempt_1500967702061_2512_01 2017-07-31 21:29:38,353 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Application finished, removing password for appattempt_1500967702061_2512_01 2017-07-31 21:29:38,353 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1500967702061_2512_01 State change from FINAL_SAVING to FAILED 2017-07-31 21:29:38,353 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: The number of failed attempts is 1. The max attempts is 3 2017-07-31 21:29:38,354 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1500967702061_2512 State change from RUNNING to ACCEPTED 2017-07-31 21:29:38,354 INFO [ResourceManager Event Processor] org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application A
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16118836#comment-16118836 ] Jian He commented on YARN-6959: - [~yqwang], thanks for the patch. One question: I'm wondering under what scenario this can happen. For each failed attempt, we remove it from ApplicationMasterService#responseMap, and in ApplicationMasterService#allocate we check whether the attempt is in the responseMap; if not, that will block the allocate from reaching the scheduler. Do you see this log line in ApplicationMasterService for the 1st attempt? {{LOG.info("Unregistering app attempt : " + attemptId);}}
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16118030#comment-16118030 ] Yuqi Wang commented on YARN-6959: - [~templedf] [~jianhe] Please help to review this. > RM may allocate wrong AM Container for new attempt > -- > > Key: YARN-6959 > URL: https://issues.apache.org/jira/browse/YARN-6959 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, fairscheduler, scheduler >Affects Versions: 2.7.1 >Reporter: Yuqi Wang >Assignee: Yuqi Wang > Labels: patch > Fix For: 2.7.1, 3.0.0-alpha4 > > Attachments: YARN-6959.001.patch, YARN-6959.002.patch, > YARN-6959.003.patch, YARN-6959.004.patch, YARN-6959.005.patch, > YARN-6959-branch-2.7.001.patch > > > *Issue Summary:* > Previous attempt ResourceRequest may be recorded into current attempt > ResourceRequests. These mis-recorded ResourceRequests may confuse AM > Container Request and Allocation for current attempt. > *Issue Pipeline:* > {code:java} > // Executing precondition check for the incoming attempt id. > ApplicationMasterService.allocate() -> > scheduler.allocate(attemptId, ask, ...) -> > // Previous precondition check for the attempt id may be outdated here, > // i.e. the currentAttempt may not be the corresponding attempt of the > attemptId. > // Such as the attempt id is corresponding to the previous attempt. > currentAttempt = scheduler.getApplicationAttempt(attemptId) -> > // Previous attempt ResourceRequest may be recorded into current attempt > ResourceRequests > currentAttempt.updateResourceRequests(ask) -> > // RM may allocate wrong AM Container for the current attempt, because its > ResourceRequests > // may come from previous attempt which can be any ResourceRequests previous > AM asked > // and there is not matching logic for the original AM Container > ResourceRequest and > // the returned amContainerAllocation below. > AMContainerAllocatedTransition.transition(...) -> > amContainerAllocation = scheduler.allocate(currentAttemptId, ...) > {code} > *Patch Correctness:* > Because after this Patch, RM will definitely record ResourceRequests from > different attempt into different objects of > SchedulerApplicationAttempt.AppSchedulingInfo. > So, even if RM still record ResourceRequests from old attempt at any time, > these ResourceRequests will be recorded in old AppSchedulingInfo object which > will not impact current attempt's resource requests and allocation. > *Concerns:* > The getApplicationAttempt function in AbstractYarnScheduler is so confusing, > we should better rename it to getCurrentApplicationAttempt. And reconsider > whether there are any other bugs related to getApplicationAttempt. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
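The pipeline quoted above can also be made concrete with a small, self-contained sketch. This is not Hadoop code and every name in it is hypothetical; it only illustrates the lookup hazard: when an attempt id is resolved to the current attempt of the application (as getApplicationAttempt effectively does), a late ask from a failed first attempt can be recorded into the second attempt's request table, and the second AM container may then be sized from a request the new attempt never made.

{code:java}
// Hypothetical illustration only; none of these names come from Hadoop.
import java.util.HashMap;
import java.util.Map;

class StaleAttemptSketch {
  static class Attempt {
    final String attemptId;
    // priority -> requested memory (MB); stands in for an attempt's recorded requests
    final Map<String, Integer> askedMemMb = new HashMap<>();
    Attempt(String attemptId) { this.attemptId = attemptId; }
  }

  // Keyed by application id only, so a lookup always resolves to the current attempt,
  // which is the hazard described in the quoted pipeline.
  private final Map<String, Attempt> currentAttemptByApp = new HashMap<>();

  void startAttempt(String appId, String attemptId) {
    currentAttemptByApp.put(appId, new Attempt(attemptId));
  }

  // Mirrors the shape of scheduler.allocate(attemptId, ask, ...): the attempt id is
  // only used to find the application, so a stale caller still reaches the current attempt.
  void allocate(String attemptId, String priority, int memMb) {
    String appId = attemptId.substring(0, attemptId.lastIndexOf('_'));
    Attempt current = currentAttemptByApp.get(appId);
    current.askedMemMb.put(priority, memMb);   // a late ask from an old attempt lands here
  }

  public static void main(String[] args) {
    StaleAttemptSketch s = new StaleAttemptSketch();
    s.startAttempt("app_2512", "app_2512_02");      // attempt 2 is now the current attempt
    s.allocate("app_2512_01", "task", 5 * 1024);    // late 5 GB task ask from failed attempt 1
    // Attempt 2 now carries a request it never made; an AM container allocated from it
    // would be mis-sized, which is the symptom reported in this issue.
    System.out.println(s.currentAttemptByApp.get("app_2512").askedMemMb);
  }
}
{code}

The patch, as described in the quoted summary, keys this bookkeeping by attempt rather than by application, so a stale ask would land in the old attempt's record and leave the new attempt's requests untouched.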
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16118022#comment-16118022 ] Hadoop QA commented on YARN-6959: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s{color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 0 new + 390 unchanged - 4 fixed = 390 total (was 394) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 43m 33s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 66m 32s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:14b5c93 | | JIRA Issue | YARN-6959 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12880793/YARN-6959.005.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux d4cc7d8b7fe4 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 55a181f | | Default Java | 1.8.0_131 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/16768/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/16768/console | | Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > RM may allocate wrong AM Container for new attempt > -- > > Key: YARN-6959 > URL: https://issues.apache.org/jira/browse/YARN-6959 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, fairscheduler, scheduler >Affects Versions: 2.7.1 >Reporter: Yuqi Wang >
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117945#comment-16117945 ] Hadoop QA commented on YARN-6959: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 46s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s{color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 0 new + 391 unchanged - 4 fixed = 391 total (was 395) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 44m 30s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 72m 9s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.metrics.TestSystemMetricsPublisher | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppStarvation | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:14b5c93 | | JIRA Issue | YARN-6959 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12880790/YARN-6959.004.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 53b9e67b898a 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 55a181f | | Default Java | 1.8.0_131 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/16767/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/16767/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/16767/console | | Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > RM may allocate wrong
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117829#comment-16117829 ] Hadoop QA commented on YARN-6959: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 0 new + 390 unchanged - 4 fixed = 390 total (was 394) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 45m 52s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 68m 23s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppStarvation | | | hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:14b5c93 | | JIRA Issue | YARN-6959 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12880777/YARN-6959.004.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 916222ef4bd3 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 8d3fd81 | | Default Java | 1.8.0_131 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/16766/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/16766/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/16766/console | | Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > RM may allocate wrong AM Container f
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117796#comment-16117796 ] Hadoop QA commented on YARN-6959: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 28s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 393 unchanged - 1 fixed = 394 total (was 394) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 44m 28s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 69m 19s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:14b5c93 | | JIRA Issue | YARN-6959 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12880771/YARN-6959.003.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 493a53b5cebf 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 8d3fd81 | | Default Java | 1.8.0_131 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/16765/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/16765/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/16765/console | | Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > RM may allocate wrong AM Container for new attempt > -- > > Key: YARN-6959 > URL: https://issues.apache.org/jira/bro
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16116733#comment-16116733 ] Hadoop QA commented on YARN-6959: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 28s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 396 unchanged - 1 fixed = 397 total (was 397) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 57m 27s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 79m 51s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer | | Timed out junit tests | org.apache.hadoop.yarn.server.resourcemanager.TestLeaderElectorService | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:14b5c93 | | JIRA Issue | YARN-6959 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12880645/YARN-6959.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux f0163abaeeff 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 0b67436 | | Default Java | 1.8.0_131 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/16740/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/16740/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/16740/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16116390#comment-16116390 ] Hadoop QA commented on YARN-6959: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 32s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 2 new + 203 unchanged - 0 fixed = 205 total (was 203) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 42m 39s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 69m 27s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppStarvation | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:14b5c93 | | JIRA Issue | YARN-6959 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12880611/YARN-6959.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux fbb953134c75 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 46b7054 | | Default Java | 1.8.0_131 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/16734/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/16734/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/16734/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-res
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16116372#comment-16116372 ] Hadoop QA commented on YARN-6959: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 18m 43s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 12s{color} | {color:red} root in trunk failed. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s{color} | {color:green} trunk passed with JDK v1.8.0_144 {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 11s{color} | {color:red} hadoop-yarn-server-resourcemanager in trunk failed with JDK v1.7.0_131. {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 11s{color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 13s{color} | {color:red} hadoop-yarn-server-resourcemanager in trunk failed. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 11s{color} | {color:red} hadoop-yarn-server-resourcemanager in trunk failed. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} trunk passed with JDK v1.8.0_144 {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 11s{color} | {color:red} hadoop-yarn-server-resourcemanager in trunk failed with JDK v1.7.0_131. {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 10s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s{color} | {color:green} the patch passed with JDK v1.8.0_144 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 10s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_131. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 10s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_131. {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 12s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. 
{color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 9s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} the patch passed with JDK v1.8.0_144 {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 11s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_131. {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 11s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_131. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 74m 35s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_144 Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler | | JDK v1.8.0_144 Timed out junit tests | org.apache.hadoop.yarn.server.resourcemanager.TestRMStoreCommands | | | org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore | | | org.apache.hadoop.yarn.server.resourcemanager.TestSubmit
[jira] [Commented] (YARN-6959) RM may allocate wrong AM Container for new attempt
[ https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16116220#comment-16116220 ] Yuqi Wang commented on YARN-6959: - Here is the log for the issue: application_1500967702061_2512 asked for 20GB for AM Container and 5GB for its Task Container:
{code:java}
2017-07-31 20:58:49,532 INFO [Container Monitor] org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree container_e71_1500967702061_2512_01_01 for container-id container_e71_1500967702061_2512_01_01: 307.8 MB of 20 GB physical memory used; 1.2 GB of 30 GB virtual memory used
{code}
After its first attempt failed, the second attempt was submitted; however, NM mistakenly believed the AM Container was 5GB:
{code:java}
2017-07-31 21:29:46,219 INFO [Container Monitor] org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree container_e71_1500967702061_2512_02_01 for container-id container_e71_1500967702061_2512_02_01: 352.5 MB of 5 GB physical memory used; 1.4 GB of 7.5 GB virtual memory used
{code}
Here is the RM log, which also has the InvalidStateTransitonException: Invalid event: CONTAINER_ALLOCATED at ALLOCATED_SAVING:
{code:java}
2017-07-31 21:29:38,510 INFO [ResourceManager Event Processor] org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Application added - appId: application_1500967702061_2512 user: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue$User@57fbb4f5, leaf-queue: prod-new #user-pending-applications: 0 #user-active-applications: 6 #queue-pending-applications: 0 #queue-active-applications: 6
2017-07-31 21:29:38,510 INFO [ResourceManager Event Processor] org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Added Application Attempt appattempt_1500967702061_2512_02 to scheduler from user hadoop in queue prod-new
2017-07-31 21:29:38,514 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1500967702061_2512_02 State change from SUBMITTED to SCHEDULED
2017-07-31 21:29:38,517 INFO [Thread-13] org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e71_1500967702061_2512_02_01 Container Transitioned from NEW to ALLOCATED
2017-07-31 21:29:38,517 INFO [Thread-13] org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hadoop OPERATION=AM Allocated Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1500967702061_2512 CONTAINERID=container_e71_1500967702061_2512_02_01
2017-07-31 21:29:38,517 INFO [Thread-13] org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: assignedContainer application attempt=appattempt_1500967702061_2512_02 container=Container: [ContainerId: container_e71_1500967702061_2512_02_01, NodeId: BN2APS0A98AEA0:10025, NodeHttpAddress: Proxy5.Yarn-Prod-Bn2.BN2.ap.gbl:81/proxy/nodemanager/BN2APS0A98AEA0/8042, Resource: , Priority: 1, Token: null, ] queue=prod-new: capacity=0.7, absoluteCapacity=0.7, usedResources=, usedCapacity=0.0, absoluteUsedCapacity=0.0, numApps=6, numContainers=8016 clusterResource= type=OFF_SWITCH
2017-07-31 21:29:38,517 INFO [Thread-13] org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.0 absoluteUsedCapacity=0.0 used= cluster=
2017-07-31 21:29:38,517 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: Sending NMToken for nodeId : BN2APS0A98AEA0:10025 for container : container_e71_1500967702061_2512_02_01
2017-07-31 21:29:38,517 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e71_1500967702061_2512_02_01 Container Transitioned from ALLOCATED to ACQUIRED
2017-07-31 21:29:38,517 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: Clear node set for appattempt_1500967702061_2512_02
2017-07-31 21:29:38,517 LOP-998291496]-[download]-[0@1]-[application_1501027078051_3009],prod-new,null,null,-1," for attrs weka.core.FastVector@789038c6
2017-07-31 21:29:38,517 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Storing attempt: AppId: application_1500967702061_2512 AttemptId: appattempt_1500967702061_2512_02 MasterContainer: Container: [ContainerId: container_e71_1500967702061_2512_02_01, NodeId: BN2APS0A98AEA0:10025, NodeHttpAddress: Proxy5.Yarn-Prod-Bn2.BN2.ap.gbl:81/proxy/nodemanager/BN2APS0A98AEA0/8042, Resource: , Priority: 1, Token: Token { kind: ContainerToken, service: