Tao Yang created YARN-7511: ------------------------------ Summary: NPE in ContainerLocalizer when localization failed for running container Key: YARN-7511 URL: https://issues.apache.org/jira/browse/YARN-7511 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 3.0.0-alpha4, 2.9.1 Reporter: Tao Yang Assignee: Tao Yang
Error log: {noformat} 2017-09-30 20:14:32,839 FATAL [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread java.lang.NullPointerException at java.util.concurrent.ConcurrentHashMap.replaceNode(ConcurrentHashMap.java:1106) at java.util.concurrent.ConcurrentHashMap.remove(ConcurrentHashMap.java:1097) at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceSet.resourceLocalizationFailed(ResourceSet.java:151) at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$ResourceLocalizationFailedWhileRunningTransition.transition(ContainerImpl.java:821) at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$ResourceLocalizationFailedWhileRunningTransition.transition(ContainerImpl.java:813) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:1335) at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:95) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1372) at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1365) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110) at java.lang.Thread.run(Thread.java:834) 2017-09-30 20:14:32,842 INFO [AsyncDispatcher ShutDown handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. {noformat} Reproduce this problem: 1. Container was running and ContainerManagerImpl#localize was called for this container 2. Localization failed in ResourceLocalizationService$LocalizerRunner#run and sent out ContainerResourceFailedEvent with null LocalResourceRequest. 3. NPE when ResourceLocalizationFailedWhileRunningTransition#transition --> container.resourceSet.resourceLocalizationFailed(null) I think we can fix this problem through ensuring that request is not null before remove it. -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org