[ https://issues.apache.org/jira/browse/YARN-7511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tao Yang updated YARN-7511: --------------------------- Attachment: YARN-7511.001.patch Attaching v1 patch for review. > NPE in ContainerLocalizer when localization failed for running container > ------------------------------------------------------------------------ > > Key: YARN-7511 > URL: https://issues.apache.org/jira/browse/YARN-7511 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager > Affects Versions: 3.0.0-alpha4, 2.9.1 > Reporter: Tao Yang > Assignee: Tao Yang > Attachments: YARN-7511.001.patch > > > Error log: > {noformat} > 2017-09-30 20:14:32,839 FATAL [AsyncDispatcher event handler] > org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread > java.lang.NullPointerException > at > java.util.concurrent.ConcurrentHashMap.replaceNode(ConcurrentHashMap.java:1106) > at > java.util.concurrent.ConcurrentHashMap.remove(ConcurrentHashMap.java:1097) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceSet.resourceLocalizationFailed(ResourceSet.java:151) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$ResourceLocalizationFailedWhileRunningTransition.transition(ContainerImpl.java:821) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$ResourceLocalizationFailedWhileRunningTransition.transition(ContainerImpl.java:813) > at > org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:1335) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:95) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1372) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1365) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110) > at java.lang.Thread.run(Thread.java:834) > 2017-09-30 20:14:32,842 INFO [AsyncDispatcher ShutDown handler] > org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. > {noformat} > Reproduce this problem: > 1. Container was running and ContainerManagerImpl#localize was called for > this container > 2. Localization failed in ResourceLocalizationService$LocalizerRunner#run and > sent out ContainerResourceFailedEvent with null LocalResourceRequest. > 3. NPE when ResourceLocalizationFailedWhileRunningTransition#transition --> > container.resourceSet.resourceLocalizationFailed(null) > I think we can fix this problem through ensuring that request is not null > before remove it. -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org