[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15897673#comment-15897673 ] Sunil G commented on YARN-6207: --- +1 committing shortly. > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: YARN-6207.001.patch, YARN-6207.002.patch, > YARN-6207.003.patch, YARN-6207.004.patch, YARN-6207.005.patch, > YARN-6207.006.patch, YARN-6207.007.patch, YARN-6207.008.patch > > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15895773#comment-15895773 ] Bibin A Chundatt commented on YARN-6207: Test failure is not related to patch attached . > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: YARN-6207.001.patch, YARN-6207.002.patch, > YARN-6207.003.patch, YARN-6207.004.patch, YARN-6207.005.patch, > YARN-6207.006.patch, YARN-6207.007.patch, YARN-6207.008.patch > > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15895528#comment-15895528 ] Hadoop QA commented on YARN-6207: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 54s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 4s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 39m 51s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 61m 46s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMRestart | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6207 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12856036/YARN-6207.008.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 453514891c44 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 0f336ba | | Default Java | 1.8.0_121 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/15172/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/15172/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/15172/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/bro
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15894081#comment-15894081 ] Sunil G commented on YARN-6207: --- I kicked jenkins once again to see test cases since I saw a random local failure. > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: YARN-6207.001.patch, YARN-6207.002.patch, > YARN-6207.003.patch, YARN-6207.004.patch, YARN-6207.005.patch, > YARN-6207.006.patch, YARN-6207.007.patch > > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15894078#comment-15894078 ] Rohith Sharma K S commented on YARN-6207: - +1 LGTM.. [~bibinchundatt] would you mind checking checkstyle errors? > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: YARN-6207.001.patch, YARN-6207.002.patch, > YARN-6207.003.patch, YARN-6207.004.patch, YARN-6207.005.patch, > YARN-6207.006.patch, YARN-6207.007.patch > > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891635#comment-15891635 ] Hadoop QA commented on YARN-6207: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 26s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 10 new + 313 unchanged - 0 fixed = 323 total (was 313) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 39m 33s{color} | {color:green} hadoop-yarn-server-resourcemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 61m 36s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6207 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12855432/YARN-6207.007.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 789ce9691fa7 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 6f6dfe0 | | Default Java | 1.8.0_121 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/15131/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/15131/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/15131/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Move application can fail when attempt add event is delayed > > >
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890888#comment-15890888 ] Hadoop QA commented on YARN-6207: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 11s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 40s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 15 new + 313 unchanged - 0 fixed = 328 total (was 313) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 22s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 51s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 96m 11s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler | | | hadoop.yarn.server.resourcemanager.TestRMRestart | | | hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer | | | hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector | | Timed out junit tests | org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStorePerf | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6207 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12855422/YARN-6207.006.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux bb3ba1939ecc 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 82ef9ac | | Default Java | 1.8.0_121 | | findbugs | v3.0.0 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/15122/artifact/patchprocess/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/15122/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results |
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890802#comment-15890802 ] Hadoop QA commented on YARN-6207: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 8m 49s{color} | {color:red} Docker failed to build yetus/hadoop:a9ad5d6. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | YARN-6207 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12855432/YARN-6207.007.patch | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/15123/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: YARN-6207.001.patch, YARN-6207.002.patch, > YARN-6207.003.patch, YARN-6207.004.patch, YARN-6207.005.patch, > YARN-6207.006.patch, YARN-6207.007.patch > > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885509#comment-15885509 ] Rohith Sharma K S commented on YARN-6207: - Thanks Bibin for the patch, overall patch looks good to me. Few comments on test # Can you move these test cases to TestCapacityScheduler? Here, you can mock RMApp and directly trigger scheduler event. You can also control order of events like AppAdded->moveApplication->AppAttemptAdded and AppAdded->AppAttemptAdded->AppAttemptRemoved->moveApplication->AppAttemptAdded. Refer {{TestCapacityScheduler}}. Scheduler module remain same and can be test which ever the event order to be tested. # I personally feel NOT to use countdown latch to verify event order. The test code is bit confusing also since latch is created multiple times and using old references for countDown in some cases. Latch initialization is done at setup with 0 and using different references for countdown in test code. # Below test code hangs randomly. Countdown latch should be created before triggering attempt removal event. The test code hang because if event app-attempt-added is triggered before latch creating, then *eventAdd.await();* will never set to 0 which hangs forever. {code} // remove attempt app1Attempt.handle(new RMAppAttemptEvent(app1Attempt.getAppAttemptId(), RMAppAttemptEventType.FAIL, "Launch Fail")); eventAdd = new CountDownLatch(1); eventDone = new CountDownLatch(1); eventAdd.await(); {code} # Add a test timeout. > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: YARN-6207.001.patch, YARN-6207.002.patch, > YARN-6207.003.patch, YARN-6207.004.patch, YARN-6207.005.patch > > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15882347#comment-15882347 ] Hadoop QA commented on YARN-6207: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 55s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 39m 26s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 60m 39s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler | | | hadoop.yarn.server.resourcemanager.TestRMRestart | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6207 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12854409/YARN-6207.005.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 671f58b5720d 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 132f758 | | Default Java | 1.8.0_121 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/15076/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/15076/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/15076/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Move application can fail when attempt add event is delayed > > >
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15882210#comment-15882210 ] Bibin A Chundatt commented on YARN-6207: {quote} Incase if the comment is regarding Fair Scheduler currently we will handle only Capacity scheduler cases in this jira {quote} I had already mentioned will handle only CS .We have marked component also as Capacity scheduler. Fair Scheduler probably we handle in another jira > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: YARN-6207.001.patch, YARN-6207.002.patch, > YARN-6207.003.patch, YARN-6207.004.patch > > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15882114#comment-15882114 ] Sunil G commented on YARN-6207: --- On an other note, many of these cases are also applicable for FairScheduler too. Is it planned in same jira or a new one? > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: YARN-6207.001.patch, YARN-6207.002.patch, > YARN-6207.003.patch, YARN-6207.004.patch > > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15881940#comment-15881940 ] Sunil G commented on YARN-6207: --- Yes.. There were many corner cases we have discussed in this Jira. I feel we should have more test cases here to ensure all cases are handled well. Thanks for considering same. > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: YARN-6207.001.patch, YARN-6207.002.patch, > YARN-6207.003.patch, YARN-6207.004.patch > > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15881935#comment-15881935 ] Bibin A Chundatt commented on YARN-6207: Yes [~sunilg] i meant the same . > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: YARN-6207.001.patch, YARN-6207.002.patch, > YARN-6207.003.patch, YARN-6207.004.patch > > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15881854#comment-15881854 ] Sunil G commented on YARN-6207: --- Thanks [~bibinchundatt] for the clarification. I suppose you were mentioning something like below. Seems fine for me. {code} if (!app.isStopped()) { source.finishApplicationAttempt(app, sourceQueueName); // Submit to a new queue dest.submitApplicationAttempt(app, user); } // Finish app & update metrics app.move(dest); } source.appFinished(); source.getParent().finishApplication(appId, user); application.setQueue(dest); {code} > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: YARN-6207.001.patch, YARN-6207.002.patch, > YARN-6207.003.patch, YARN-6207.004.patch > > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15880193#comment-15880193 ] Bibin A Chundatt commented on YARN-6207: [~sunilg] Thank you for confirming the same.. {code} source.getParent().finishApplication(appId, user); {code} Should i outside {{app != null}} loop.. Consider scenarios of {{Application submit -> Attempt not added -> Move -> attempt added.}} Application count in parent queue will be wrong . Explaining in detail. # Submit application to leaf . Parent queue ++numApplications will increase by 1 . # Move to queue 1 . Parent queue ++numApplications will increase by 1 . # Application completed. Number of application in parent will be still 1. > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: YARN-6207.001.patch, YARN-6207.002.patch, > YARN-6207.003.patch, YARN-6207.004.patch > > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15880157#comment-15880157 ] Sunil G commented on YARN-6207: --- Thanks [~bibinchundatt] for pointing out {{keepContainers}} option. As in earlier patch, then u can keep out {{app.move}} from {{app.isStopped}} check provided {{appSchedulingInfo.move}} is protected. Seems fine for me for this point. > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: YARN-6207.001.patch, YARN-6207.002.patch, > YARN-6207.003.patch, YARN-6207.004.patch > > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15880094#comment-15880094 ] Sunil G commented on YARN-6207: --- bq.Appadded --> attempt added -> 2 * live containers. -> attempt remove with transfer -> move application. -> add attempt There are 2 scenarios rt # AppAdded -> AttemptAdded -> Some Live Containers -> Attempt fail -> doneApplicationAttempt -> Attempt Stopped -> move application. -> add attempt # AppAdded -> AttemptAdded -> Some Live Containers -> move application. -> add attempt App will be STOPPED in case of 1) and App will be running in case of 2). I was mentioning case when app was STOPPED which is scenario 1. In that case, we have called *doneApplicationAttempt*. {code} // Release all reserved containers for (RMContainer rmContainer : attempt.getLiveContainers()) { super.completedContainer(rmContainer, SchedulerUtils .createAbnormalContainerStatus(rmContainer.getContainerId(), "Application Complete"), RMContainerEventType.KILL); } ... // Release all reserved containers for (RMContainer rmContainer : attempt.getReservedContainers()) { super.completedContainer(rmContainer, SchedulerUtils .createAbnormalContainerStatus(rmContainer.getContainerId(), "Application Complete"), RMContainerEventType.KILL); } {code} So all containers are completed here. Since this is a part of completing an attempt, we must remove all metrics here rt. I am doubting its a different issue altogether or not. > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: YARN-6207.001.patch, YARN-6207.002.patch, > YARN-6207.003.patch, YARN-6207.004.patch > > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15880055#comment-15880055 ] Bibin A Chundatt commented on YARN-6207: [~sunilg] {quote} app.move need to be inside !app.isStopped() check. Because if app is stopped, we ensure that all running and reserved containers are invoked with completedContainer call. {quote} This could cause for {{liveContainers}} queue metrics not to be released and added to new. {code} allocatedContainers.decr(containers); aggregateContainersReleased.incr(containers); allocatedMB.decr(res.getMemorySize() * containers); allocatedVCores.decr(res.getVirtualCores() * containers); {code} Metrics could go wrong for the following scenarios. Appadded --> attempt added -> 2 * live containers. -> attempt remove with transfer -> move application. -> add attempt > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: YARN-6207.001.patch, YARN-6207.002.patch, > YARN-6207.003.patch, YARN-6207.004.patch > > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879964#comment-15879964 ] Sunil G commented on YARN-6207: --- [~bibinchundatt], thanks for the patch. Had an offline discussion with [~rohithsharma], we were expecting something like below. {code} FiCaSchedulerApp app = application.getCurrentAppAttempt(); if (app != null) { // Move all live containers even when stopped. // For transferStateFromPreviousAttempt required for (RMContainer rmContainer : app.getLiveContainers()) { source.detachContainer(getClusterResource(), app, rmContainer); // attach the Container to another queue dest.attachContainer(getClusterResource(), app, rmContainer); } if (!app.isStopped()) { source.finishApplicationAttempt(app, sourceQueueName); // Submit to a new queue dest.submitApplicationAttempt(app, user); // Finish app & update metrics app.move(dest); } source.appFinished(); source.getParent().finishApplication(appId, user); } application.setQueue(dest); LOG.info("App: " + appId + " successfully moved from " + sourceQueueName + " to: " + destQueueName); return targetQueueName; {code} Reasons behind this proposal. # {{source.finishApplication(appId, user);}} is not needed as {{AppSchedulingInfo.move}} is updating {{abstractUsersManager.deactivateApplication(user, applicationId);}}. So we jus need to invoke appFinished and inform parent. Hence those two lines are added. # {{app.move}} need to be inside {{!app.isStopped()}} check. Because if app is stopped, we ensure that all running and reserved containers are invoked with completedContainer call. Apart from this, {{app != null}} check need not have to throw exception. Any way app is done, so do we need to bomb to client? > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: YARN-6207.001.patch, YARN-6207.002.patch, > YARN-6207.003.patch, YARN-6207.004.patch > > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additi
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15878955#comment-15878955 ] Hadoop QA commented on YARN-6207: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 24s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 42m 21s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 65m 42s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6207 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12854003/YARN-6207.004.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux af756436b909 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 4f4250f | | Default Java | 1.8.0_121 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/15042/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/15042/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/15042/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: http
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15875548#comment-15875548 ] Naganarasimha G R commented on YARN-6207: - Thanks [~bibinchundatt] Overall approach seems to be fine, few nits # one corner case when ClientRMService validates app state is still running but when it reaches scheduler application might have got completed hence to be safe just we can check whether scheduler application is not null for appId. # Can we think of moving {{dest.submitApplication(appId, user, destQueueName);}} below {{if (null != app)}} block so that its better we finish handling all attempt related stuff and then updated the application related modifcations ? # ln 2058, i think we can directly get {{application.getCurrentAppAttempt}} # ln 2103, Was wondering the queue partition information needs to be checked even if the attempt doesn't exist, thoughts? # comment at ln 2067 can be moved to just before {{source.finishApplicationAttempt} > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: YARN-6207.001.patch, YARN-6207.002.patch, > YARN-6207.003.patch > > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15875523#comment-15875523 ] Hadoop QA commented on YARN-6207: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 8s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 3s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 41m 32s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 64m 45s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.TestRMRestart | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6207 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12853661/YARN-6207.003.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 2bd822c68f97 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 8035749 | | Default Java | 1.8.0_121 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/15018/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/15018/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/15018/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/bro
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15875475#comment-15875475 ] Rohith Sharma K S commented on YARN-6207: - Few more comments # I think !app.isStopped() can be done at upper level along with null check. {{if (null != app || !app.isStopped() )}} nit : change null check with java code style i.e app!=null. > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: YARN-6207.001.patch, YARN-6207.002.patch, > YARN-6207.003.patch > > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15875449#comment-15875449 ] Sunil G commented on YARN-6207: --- Few comments: 1. {{app.move(dest);}} is invoked event when app is STOPPED. Internally it updates queue metrics in source queue and also in appScheduling info (which also is stopped). I think if app is stopped, we can assume that all internal metrics of the app is released from source queue. Hence we may not need to do the same again in *move*. please check once . 2. {{abstractUsersManager.deactivateApplication(user, applicationId);}} this is invoked from *app.move()*. So do we need to call *LQ.finishApplication()* except the fact that queue may have to be moved to STOPPED if it was draining. > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: YARN-6207.001.patch, YARN-6207.002.patch, > YARN-6207.003.patch > > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15875036#comment-15875036 ] Varun Saxena commented on YARN-6207: bq. Submit is on dest queue and finish is done on parent. Right. Actually this was fine pre YARN-5756. Because LeafQueue#finishApplication was just informing the users manager about deactivating the application and then calling ParentQueue#finishApplication. As we were merely moving app across queues, informing users manager for app deactivation was not required. But now, we are maintaining queue state as well which may have to be updated for source queue (if queue is DRAINING) on move and hence needs to be handled by calling LeafQueue#appFinished. > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: YARN-6207.001.patch, YARN-6207.002.patch > > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15874728#comment-15874728 ] Bibin A Chundatt commented on YARN-6207: Thank you [~sunilg]/[~rohithsharma] finding already existing corner case and [~Naganarasimha] for summarizing. Will handle those scenarios also as part of this jira. > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: YARN-6207.001.patch, YARN-6207.002.patch > > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15874400#comment-15874400 ] Rohith Sharma K S commented on YARN-6207: - Thanks [~naganarasimha...@apache.org] for summarizing what we have discussed. Hope [~bibinchundatt] can take over from here. > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: YARN-6207.001.patch, YARN-6207.002.patch > > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15874336#comment-15874336 ] Naganarasimha G R commented on YARN-6207: - Thanks [~rohithsharma] & [~sunilg] bq. And also we discussed that move application should never-fail if application attempt is not added yet. Yes. This is what i was trying to signify by saying once the app is accepted by the scheduler then there should not be any reason to reject the move application. And roughly Me and [~bibinchundatt] were discussing the same as it was easily acheiveable by updating {{SchedulerApplication.setQueue}} at the same we need to ensure that the parent queue stats are updated accordingly. When AppAttempt gets added anyway its taking the queue information from {{SchedulerApplication}} so though attempt is not there when attempt gets added it works properly. bq. May be new flag is required to identify in SchedulerApplicationAttempt that done-app-attempt is called-off or not. Agree that there is a corner race case where in 2 Attempts can be added to the target queue when the events which are triggered are {{doneApplicationAttempt->moveApplication-> addApplicationAttempt (new attempt)}}. And for this we can have a flag in SchedulerApplication present in CS. Hope with above modifications all concerns will be solved, Thoughts? > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: YARN-6207.001.patch, YARN-6207.002.patch > > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15874223#comment-15874223 ] Rohith Sharma K S commented on YARN-6207: - Thanks [~sunilg] for summarizing the discussion. Adding more points what we discussed offline, we found many more corner cases that can create an issue in move application. It time to handle all those corner scenarios. And also we discussed that move application should *never-fail* if application attempt is not added yet. So, fix can be made from scheduler so that application can be moved even if attempt is NOT-added OR called done application attempt during attempt removal. May be new flag is required to identify in SchedulerApplicationAttempt that done-app-attempt is called-off or not. This will ensure move application need not fail if application is already in accepted state. > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: YARN-6207.001.patch, YARN-6207.002.patch > > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15874113#comment-15874113 ] Sunil G commented on YARN-6207: --- app null check may not fix the race condition correctly. It can still cause corner case. I would like to continue discussion on attempt states as well. Lets take [~bibinchundatt] scenario itself where App Attempt events are in Async Dispatcher itself (delayed). The same scenario will happen after first attempt failure as well (2nd attempt is delayed). +What will happen in scheduler:+ In any case, we assume that {{SchedulerApplication}} is created inside scheduler (this is happening because we check app state in CientRMService as ACCEPTED/RUNNING). CS and FS’s *moveApplication* is invoking {{getApplicationAttempt}} to get app attempt object. {{AbsScheduler#getApplicationAttempt}} could return null is 2 cases. a) when application itself is not there. b) when curr attempt is null. As mentioned earlier in first line, app could not be null. Still attempt may be null. In case of first attempt failure, {{SchedulerApplication.getCurrentAppAttempt}} could return old object till 2nd attempt is set via {{APP_ATTEMPT_ADDED}}. Hence the app null check will not help in both scheduler (FS has only a null check for app, not for attempt). Even attempt null check also won’t help in case of first AM failure as scheduler does have old stale object in stopped form. Also there could be another corner case. Assume move app has called when 1st attempt was failed and 2nd attempt was in init states. It could potentially push 2 attempts to target queue. Ideally if we fix in ClientRMServer, we need not have to worry changes across scheduler. If attempt state is ACCEPTED to RUNNING, we are sure that new attempt is added to scheduler. > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: YARN-6207.001.patch, YARN-6207.002.patch > > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoo
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15874013#comment-15874013 ] Naganarasimha G R commented on YARN-6207: - bq. Any scenario which causes attempt to be null will get handled with approach 2 Agree, this should be good enough. And also further checked the scenario which i mentioned, failed first attempt is not actually removed from SchedulerApplication but anyway this check will further ensure that there are no other cases where it can fail. +1 for the latest patch, if no other concerns will commit it. > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: YARN-6207.001.patch, YARN-6207.002.patch > > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15874011#comment-15874011 ] Bibin A Chundatt commented on YARN-6207: [~sunilg]/[~naganarasimha...@apache.org] Hope the second approach done is fine. Fair already handles this case of attempt can be null in scheduler side. Any scenario which causes attempt to be null will get handled with approach 2 > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: YARN-6207.001.patch, YARN-6207.002.patch > > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873969#comment-15873969 ] Sunil G commented on YARN-6207: --- *move-app-to-queue* is majorly a scheduler operation. It mandatory that we have app (fica/fs app) in a queue within scheduler. When an RMAttempt is failed, it ll be removed from scheduler (APP_ATTEMPT_REMOVED), and new attempt has to be added back to scheduler. In this mean time, there will not be any meaning for move_app operation. This new check added by the patch will now help in handling the same as we are checking state from current running app attempt. > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: YARN-6207.001.patch, YARN-6207.002.patch > > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873811#comment-15873811 ] Hadoop QA commented on YARN-6207: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 7s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 40m 46s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 63m 37s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppStarvation | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6207 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12853479/YARN-6207.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 474d8a83a4a3 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 172b23a | | Default Java | 1.8.0_121 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/15010/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/15010/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/15010/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issu
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873779#comment-15873779 ] Hadoop QA commented on YARN-6207: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 29s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 42m 18s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 68m 52s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMService | | | hadoop.yarn.server.resourcemanager.TestRMRestart | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSAppStarvation | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:a9ad5d6 | | JIRA Issue | YARN-6207 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12853477/YARN-6207.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux da079091ee81 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 172b23a | | Default Java | 1.8.0_121 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-YARN-Build/15009/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/15009/testReport/ | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/15009/console | | Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org | This message was automatically gene
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873763#comment-15873763 ] Bibin A Chundatt commented on YARN-6207: {quote} is it correct not to allow if the attempt state is not active {quote} Currently for move . Application state are checked in the same way. Since the move operation is not a frequent operation i think its fine to handle this way. Will handle the first comment. > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: YARN-6207.001.patch > > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873755#comment-15873755 ] Naganarasimha G R commented on YARN-6207: - Thanks for the patch [~bibinchundatt], At broader aspect, on second thoughts, is it correct not to allow if the attempt state is not active ? consider the case where 1st attempt has failed and its about to launch second attempt, is it not valid to move it another queue in this scenario ? few comments on the patch # {{RMAppAttemptState.NEW, RMAppAttemptState.SUBMITTED}} is actually not {{ACTIVE_ATTEMPT_STATES}} and seems quite the opposite . # Suppose if the first attempt is failed and second attempt is not yet launched then there can be a raise where in 2nd attempt can be in NEW/ SUBMITTED and similar issue can still happen. > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: YARN-6207.001.patch > > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871460#comment-15871460 ] Naganarasimha G R commented on YARN-6207: - bq. So if we could check active attempt's state here, we could avoid this scenario rt? Agree ... > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871457#comment-15871457 ] Varun Saxena commented on YARN-6207: bq. In ClientRMService.moveApplicationAcrossQueues, we already have this check. So if we could check active attempt's state here, we could avoid this scenario rt? Agree. > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871456#comment-15871456 ] Bibin A Chundatt commented on YARN-6207: {quote} we could check active attempt's state here, we could avoid this scenario rt? {quote} +1 > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871440#comment-15871440 ] Varun Saxena commented on YARN-6207: bq. RMAppAttempImpl will be moved to RMAppAttemptState.SCHEDULED once RMAppAttemptEventType.ATTEMPT_ADDED Yeah we can do that. But we still need to decide if we wait or just fail the move. > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871439#comment-15871439 ] Sunil G commented on YARN-6207: --- {code} if (!ACTIVE_APP_STATES.contains(application.getState())) { String msg = "App in " + application.getState() + " state cannot be moved."; RMAuditLogger.logFailure(callerUGI.getShortUserName(), AuditConstants.MOVE_APP_REQUEST, "UNKNOWN", "ClientRMService", msg); throw new YarnException(msg); } {code} In {{ClientRMService.moveApplicationAcrossQueues}}, we already have this check. So if we could check active attempt's state here, we could avoid this scenario rt? > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871432#comment-15871432 ] Varun Saxena commented on YARN-6207: bq. Other approach what me and Bibin A Chundatt were thinking of was to send SchedulerEvent for handling this move If we do that, how are we going to indicate to Client that move was unsuccessful? > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871425#comment-15871425 ] Sunil G commented on YARN-6207: --- RMAppAttempImpl will be moved to RMAppAttemptState.SCHEDULED once RMAppAttemptEventType.ATTEMPT_ADDED fired from scheduler. So from ClientRMService, move app can block any operation is the currentAppAttempt's state not yet reached RMAppAttemptState.SCHEDULED. (ie; in effect NOT of SUBMITTED or NEW). > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871424#comment-15871424 ] Varun Saxena commented on YARN-6207: I think this is because event was sitting in scheduler queue. > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871423#comment-15871423 ] Naganarasimha G R commented on YARN-6207: - [~sunilg], IIRC the RMAppAttempImpl was already in SUBMITTED state just that the recovery events were queued up in the SchedulerEventDispatcher, hence we were able to see the exception. > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871394#comment-15871394 ] Sunil G commented on YARN-6207: --- I feel you can check RMAppAttempImpl state. If state is SUBMITTED or NEW, its assured that scheduler is not having any information about the app. Similar checks for app attempt states are done in ApplicationMasterService also. This can avoid such a bomb to occur within scheduler and we can avoid it. > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871308#comment-15871308 ] Naganarasimha G R commented on YARN-6207: - bq. Considering this is unlikely to happen unless we are doing mass movement of apps on a busy cluster, this may be fine. Need not be, suppose during RM transition "Move app" is done then user might not be aware that RM transition was happening in the backend. Other approach what me and [~bibinchundatt] were thinking of was to send SchedulerEvent for handling this move, so that it gets queued up in the SchedulerEventDispatcher and not get immediately invoked. We will further check the pro's and cons of the same ! > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6207) Move application can fail when attempt add event is delayed
[ https://issues.apache.org/jira/browse/YARN-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871257#comment-15871257 ] Varun Saxena commented on YARN-6207: We have 2 options here. Either send a more descriptive message and related exception back to the client. Considering this is unlikely to happen unless we are doing mass movement of apps on a busy cluster, this may be fine. Or, wait for a while to let the attempt be added to be added to scheduler and then throw an exception. > Move application can fail when attempt add event is delayed > > > Key: YARN-6207 > URL: https://issues.apache.org/jira/browse/YARN-6207 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > > *Steps to reproduce* > 1.Submit application and delay attempt add to Scheduler > (Simulate using debug at EventDispatcher for SchedulerEventDispatcher) > 2. Call move application to destination queue. > {noformat} > Caused by: > org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.preValidateMoveApplication(CapacityScheduler.java:2086) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.moveApplicationAcrossQueue(RMAppManager.java:669) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.moveApplicationAcrossQueues(ClientRMService.java:1231) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBServiceImpl.java:388) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:537) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:522) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:867) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1892) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659) > at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1483) > at org.apache.hadoop.ipc.Client.call(Client.java:1429) > at org.apache.hadoop.ipc.Client.call(Client.java:1339) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115) > at com.sun.proxy.$Proxy7.moveApplicationAcrossQueues(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.moveApplicationAcrossQueues(ApplicationClientProtocolPBClientImpl.java:398) > ... 16 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org