[ https://issues.apache.org/jira/browse/YARN-9238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Weiwei Yang updated YARN-9238:
------------------------------
    Summary: Avoid allocating opportunistic containers to previous/removed/non-exist application attempt  (was: Avoid to allocate opportunistic containers to previous/removed/non-exist application attempt)

> Avoid allocating opportunistic containers to previous/removed/non-exist application attempt
> -------------------------------------------------------------------------------------------
>
>                 Key: YARN-9238
>                 URL: https://issues.apache.org/jira/browse/YARN-9238
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: lujie
>            Assignee: lujie
>            Priority: Critical
>         Attachments: YARN-9238_1.patch, YARN-9238_2.patch, YARN-9238_3.patch, hadoop-test-resourcemanager-hadoop11.log
>
>
> See org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService.OpportunisticAMSProcessor.allocate
> {code:java}
> // Allocate OPPORTUNISTIC containers.
> 171.  SchedulerApplicationAttempt appAttempt =
> 172.      ((AbstractYarnScheduler)rmContext.getScheduler())
> 173.          .getApplicationAttempt(appAttemptId);
> 174.
> 175.  OpportunisticContainerContext oppCtx =
> 176.      appAttempt.getOpportunisticContainerContext();
> 177.  oppCtx.updateNodeList(getLeastLoadedNodes());
> {code}
> If the MRAppMaster crashes before allocate#171, the ResourceManager starts a new appAttempt and calls
> {code:java}
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplication.setCurrentAppAttempt(T currentAttempt){
>   this.currentAttempt = currentAttempt;
> }{code}
> Hence allocate#171 gets the new appAttempt, whose OpportunisticContainerContext field hasn't been initialized.
> So oppCtx == null, and a NullPointerException is thrown at line 177:
> {code:java}
> java.lang.NullPointerException
>     at org.apache.hadoop.yarn.server.resourcemanager.OpportunisticContainerAllocatorAMService$OpportunisticAMSProcessor.allocate(OpportunisticContainerAllocatorAMService.java:177)
>     at org.apache.hadoop.yarn.server.resourcemanager.AMSProcessingChain.allocate(AMSProcessingChain.java:92)
>     at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:424)
>     at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
>     at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:530)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:943)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:878)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2830)
> {code}
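
The race described in this report can be illustrated with a small, self-contained sketch. It is not the attached patch and does not use any Hadoop classes: App, Attempt, OppContext and the allocate helper below are hypothetical stand-ins, and the guard shown (reject the call when the looked-up attempt is missing, belongs to a different attempt id, or has no opportunistic context yet) is only one plausible way to avoid the NullPointerException, under the assumption that the lookup returns the application's current attempt as the report describes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the race; none of these types are real Hadoop classes.
final class OppAllocateGuardSketch {

  // Stand-in for OpportunisticContainerContext.
  static class OppContext {
    final List<String> nodes = new ArrayList<>();

    void updateNodeList(List<String> leastLoaded) {
      nodes.clear();
      nodes.addAll(leastLoaded);
    }
  }

  // Stand-in for SchedulerApplicationAttempt; the context stays null
  // until something initializes it.
  static class Attempt {
    final int attemptId;
    private OppContext oppCtx;

    Attempt(int attemptId) {
      this.attemptId = attemptId;
    }

    OppContext getOpportunisticContainerContext() {
      return oppCtx;
    }

    void setOpportunisticContainerContext(OppContext ctx) {
      this.oppCtx = ctx;
    }
  }

  // Stand-in for the scheduler-side application: as in the report, a lookup
  // returns the *current* attempt, whatever attempt id the caller asked for.
  static class App {
    private volatile Attempt currentAttempt;

    void setCurrentAppAttempt(Attempt attempt) {
      this.currentAttempt = attempt;
    }

    Attempt getApplicationAttempt(int requestedAttemptId) {
      return currentAttempt;
    }
  }

  // Returns true if the node list was updated, false if the request was
  // rejected because it targets a previous, removed, or uninitialized attempt.
  static boolean allocate(App app, int requestedAttemptId, List<String> leastLoadedNodes) {
    Attempt attempt = app.getApplicationAttempt(requestedAttemptId);
    if (attempt == null || attempt.attemptId != requestedAttemptId) {
      return false; // stale or unknown attempt
    }
    OppContext ctx = attempt.getOpportunisticContainerContext();
    if (ctx == null) {
      return false; // context not initialized yet; unguarded code would throw here
    }
    ctx.updateNodeList(leastLoadedNodes);
    return true;
  }

  public static void main(String[] args) {
    App app = new App();
    Attempt first = new Attempt(1);
    first.setOpportunisticContainerContext(new OppContext());
    app.setCurrentAppAttempt(first);
    System.out.println(allocate(app, 1, Arrays.asList("node-a"))); // prints "true"

    // The first attempt's AM crashed; a new attempt becomes current
    // before the old allocate call reaches the lookup.
    app.setCurrentAppAttempt(new Attempt(2));
    System.out.println(allocate(app, 1, Arrays.asList("node-a"))); // prints "false"
  }
}
```

In this model the second call is the one that would hit the null context without the guard. How a rejected call should be reported back to the caller is left open here.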