[ https://issues.apache.org/jira/browse/YARN-9238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
lujie updated YARN-9238:
------------------------
    Description: 
See org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService.OpportunisticAMSProcessor.allocate
{code:java}
// Allocate OPPORTUNISTIC containers.
171.  SchedulerApplicationAttempt appAttempt =
172.      ((AbstractYarnScheduler)rmContext.getScheduler())
173.          .getApplicationAttempt(appAttemptId);
174.
175.  OpportunisticContainerContext oppCtx =
176.      appAttempt.getOpportunisticContainerContext();
177.  oppCtx.updateNodeList(getLeastLoadedNodes());
{code}
If the MRAppMaster crashes before allocate reaches line 171, the ResourceManager will start a new appAttempt and call
{code:java}
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplication.setCurrentAppAttempt(T currentAttempt){
  this.currentAttempt = currentAttempt;
}{code}
Hence the lookup at line 171 returns the new appAttempt, whose OpportunisticContainerContext field has not been initialized yet.

As a result, oppCtx == null and a NullPointerException is thrown at line 177:
{code:java}
java.lang.NullPointerException
  at org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService$OpportunisticAMSProcessor.allocate(OpportunisticContainerAllocatorAMService.java:177)
  at org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
  at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:424)
  at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
  at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:530)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:943)
  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:878)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2830)
{code}

  was:
See org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService.OpportunisticAMSProcessor.allocate
{code:java}
// Allocate OPPORTUNISTIC containers.
171.  SchedulerApplicationAttempt appAttempt =
172.      ((AbstractYarnScheduler)rmContext.getScheduler())
173.          .getApplicationAttempt(appAttemptId);
174.
175.  OpportunisticContainerContext oppCtx =
176.      appAttempt.getOpportunisticContainerContext();
177.  oppCtx.updateNodeList(getLeastLoadedNodes());
{code}
If "allocate" arrives at line 171 and the MRAppMaster crashes, the ResourceManager will start a new appAttempt and call
{code:java}
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplication.setCurrentAppAttempt(T currentAttempt){
  this.currentAttempt = currentAttempt;
}{code}
The new appAttempt has not initialized its OpportunisticContainerContext field, hence oppCtx == null and a NullPointerException is thrown at line 177:
{code:java}
java.lang.NullPointerException
  at org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService$OpportunisticAMSProcessor.allocate(OpportunisticContainerAllocatorAMService.java:177)
  at org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
  at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:424)
  at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
  at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:530)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:943)
  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:878)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2830)
{code}


> Allocate on previous or removed or non existent application attempt
> -------------------------------------------------------------------
>
>                 Key: YARN-9238
>                 URL: https://issues.apache.org/jira/browse/YARN-9238
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: lujie
>            Assignee: lujie
>            Priority: Critical
>         Attachments: YARN-9238_1.patch, YARN-9238_2.patch, YARN-9238_3.patch, hadoop-test-resourcemanager-hadoop11.log
>
>
> See org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService.OpportunisticAMSProcessor.allocate
> {code:java}
> // Allocate OPPORTUNISTIC containers.
> 171.  SchedulerApplicationAttempt appAttempt =
> 172.      ((AbstractYarnScheduler)rmContext.getScheduler())
> 173.          .getApplicationAttempt(appAttemptId);
> 174.
> 175.  OpportunisticContainerContext oppCtx =
> 176.      appAttempt.getOpportunisticContainerContext();
> 177.  oppCtx.updateNodeList(getLeastLoadedNodes());
> {code}
> If the MRAppMaster crashes before allocate reaches line 171, the ResourceManager will start
> a new appAttempt and call
> {code:java}
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplication.setCurrentAppAttempt(T currentAttempt){
>   this.currentAttempt = currentAttempt;
> }{code}
> Hence the lookup at line 171 returns the new appAttempt, whose
> OpportunisticContainerContext field has not been initialized yet.
> As a result, oppCtx == null and a NullPointerException is thrown at line 177:
> {code:java}
> java.lang.NullPointerException
>   at org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService$OpportunisticAMSProcessor.allocate(OpportunisticContainerAllocatorAMService.java:177)
>   at org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
>   at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:424)
>   at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>   at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:530)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:943)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:878)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2830)
> {code}
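
The failure mode described in this issue can be reproduced in miniature. The sketch below is not taken from the attached patches and does not use the real Hadoop classes: Attempt, OppContext and the boolean-returning allocate are hypothetical stand-ins, chosen only to show how a freshly created attempt with an uninitialized context leads to the null dereference, and how a defensive null check before updateNodeList would avoid it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OppContextGuardSketch {

  /** Hypothetical stand-in for OpportunisticContainerContext. */
  static class OppContext {
    final List<String> nodes = new ArrayList<>();

    void updateNodeList(List<String> leastLoadedNodes) {
      nodes.clear();
      nodes.addAll(leastLoadedNodes);
    }
  }

  /** Hypothetical stand-in for SchedulerApplicationAttempt. */
  static class Attempt {
    // Stays null until the attempt has been registered (assumption for this sketch).
    private OppContext oppCtx;

    OppContext getOpportunisticContainerContext() {
      return oppCtx;
    }

    void setOpportunisticContainerContext(OppContext ctx) {
      this.oppCtx = ctx;
    }
  }

  /**
   * Guarded variant of the lookup at lines 171-177: returns false instead of
   * dereferencing a missing attempt or an uninitialized context.
   */
  static boolean allocate(Attempt appAttempt, List<String> leastLoadedNodes) {
    if (appAttempt == null) {
      return false; // attempt removed or never existed
    }
    OppContext oppCtx = appAttempt.getOpportunisticContainerContext();
    if (oppCtx == null) {
      return false; // new attempt whose context is not initialized yet
    }
    oppCtx.updateNodeList(leastLoadedNodes);
    return true;
  }

  public static void main(String[] args) {
    List<String> nodes = Arrays.asList("node-1", "node-2");

    Attempt fresh = new Attempt(); // simulates the attempt started after the AM crash
    System.out.println("fresh attempt accepted: " + allocate(fresh, nodes));

    Attempt registered = new Attempt();
    registered.setOpportunisticContainerContext(new OppContext());
    System.out.println("registered attempt accepted: " + allocate(registered, nodes));
  }
}
```

Returning false is only a convenience for the sketch; how a rejected allocate call should be reported to the ApplicationMaster in the real ResourceManager is left open here.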