Haibo Chen created YARN-9111:
--------------------------------

             Summary: NM crashes because Fair scheduler promotes a container that has not been pulled by AM
                 Key: YARN-9111
                 URL: https://issues.apache.org/jira/browse/YARN-9111
             Project: Hadoop YARN
          Issue Type: Sub-task
          Components: fairscheduler, nodemanager
    Affects Versions: YARN-1011
            Reporter: Haibo Chen

{code:java}
2018-10-19 22:34:35,052 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
java.lang.NullPointerException
        at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerTokenIdentifier(BuilderUtils.java:323)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.handle(ContainerManagerImpl.java:1649)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.handle(ContainerManagerImpl.java:185)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
        at java.lang.Thread.run(Thread.java:748)
2018-10-19 22:34:35,054 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
2018-10-19 22:34:35,059 DEBUG org.apache.hadoop.service.AbstractService: Service: NodeManager entered state STOPPED
{code}

When the RM allocates a container to an application, the container token is not generated until the AM pulls that container from the RM. However, if the scheduler decides to promote the container before the AM has pulled it, there is no container token to work with. The current code does not generate or update the container token in that case, so when the container promotion is sent to the NM for processing, the NM crashes with an NPE.
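
The failure mode described above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the YARN code and not necessarily the fix adopted for this issue: `SketchContainer`, `PromotionSketch`, `pullByAm`, `promoteUnguarded` and `promoteGuarded` are hypothetical names invented for illustration, and the token is modeled as a plain string. It shows a token that stays null until the AM pulls the container, a promotion path that dereferences it unconditionally, and one possible guard that generates the token before the promotion is sent.

```java
// Hypothetical stand-in for an allocated container; NOT a YARN class.
final class SketchContainer {
    final String id;
    private String token; // stays null until the token is generated

    SketchContainer(String id) {
        this.id = id;
    }

    // Models the AM pulling the container: the token is generated lazily here.
    void pullByAm() {
        if (token == null) {
            token = "token-for-" + id;
        }
    }

    String getToken() {
        return token;
    }
}

final class PromotionSketch {
    // Mirrors the reported failure: the promotion path assumes a token exists
    // and dereferences it, so an unpulled container triggers an NPE.
    static String promoteUnguarded(SketchContainer c) {
        int tokenLength = c.getToken().length(); // NPE when token is null
        return "promote " + c.id + " (token length " + tokenLength + ")";
    }

    // One possible guard (an assumption, not the committed fix): make sure
    // the token has been generated before building the promotion message.
    static String promoteGuarded(SketchContainer c) {
        if (c.getToken() == null) {
            c.pullByAm(); // reuse the same lazy generation path
        }
        return "promote " + c.id + " with " + c.getToken();
    }

    public static void main(String[] args) {
        SketchContainer unpulled = new SketchContainer("container_1");
        try {
            promoteUnguarded(unpulled);
        } catch (NullPointerException e) {
            System.out.println("unguarded promotion failed with NPE");
        }
        System.out.println(promoteGuarded(unpulled));
    }
}
```

An alternative design would be to defer promotion until the AM has pulled the container; the sketch picks token generation only because the description points at the missing token as the gap.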