[ https://issues.apache.org/jira/browse/YARN-6695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736803#comment-16736803 ]
Wangda Tan commented on YARN-6695:
----------------------------------

Thanks [~eyang]/[~rohithsharma], I'm going to update the target version to the next release and unblock 3.1.2 and 3.2.0.

> Race condition in RM for publishing container events vs appFinished events causes NPE
> --------------------------------------------------------------------------------------
>
>                 Key: YARN-6695
>                 URL: https://issues.apache.org/jira/browse/YARN-6695
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Rohith Sharma K S
>            Priority: Critical
>         Attachments: YARN-6695.001.patch
>
>
> When the RM publishes container events, i.e., when
> *yarn.rm.system-metrics-publisher.emit-container-events* is enabled, there is a race
> condition between processing those events and the appFinished event. The appFinished
> event removes the appId from the collector list, and a container event processed
> after that removal hits an NPE.
> See the trace below, where the appId is removed from the collectors first and the
> corresponding events are processed afterwards.
> {noformat}
> 2017-06-06 19:28:48,896 INFO  capacity.ParentQueue (ParentQueue.java:removeApplication(472)) - Application removed - appId: application_1496758895643_0005 user: root leaf-queue of parent: root #applications: 0
> 2017-06-06 19:28:48,921 INFO  collector.TimelineCollectorManager (TimelineCollectorManager.java:remove(190)) - The collector service for application_1496758895643_0005 was removed
> 2017-06-06 19:28:48,922 ERROR metrics.TimelineServiceV2Publisher (TimelineServiceV2Publisher.java:putEntity(451)) - Error when publishing entity TimelineEntity[type='YARN_CONTAINER', id='container_e01_1496758895643_0005_01_000002']
> java.lang.NullPointerException
>       at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.putEntity(TimelineServiceV2Publisher.java:448)
>       at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.access$100(TimelineServiceV2Publisher.java:72)
>       at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:480)
>       at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:469)
>       at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:201)
>       at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:127)
>       at java.lang.Thread.run(Thread.java:745)
> {noformat}
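
For readers who want to see the failure mode in isolation, here is a minimal, self-contained sketch of the ordering described above: the per-application collector is removed first, and a container event for the same appId is published afterwards. It is not the code in YARN-6695.001.patch and does not use the real Hadoop classes. `CollectorRaceSketch`, `Collector`, `publishUnguarded`, and `publishGuarded` are hypothetical names made up for illustration, and the null check shown is only one possible mitigation, assumed here rather than taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the RM-side publisher; not the real Hadoop classes.
final class CollectorRaceSketch {

    // Hypothetical per-application collector that just records entity ids.
    static final class Collector {
        private final List<String> entities = new ArrayList<>();

        synchronized void putEntity(String entityId) {
            entities.add(entityId);
        }

        synchronized int size() {
            return entities.size();
        }
    }

    // appId -> collector, analogous to the collector list mentioned in the issue.
    private final Map<String, Collector> collectors = new ConcurrentHashMap<>();

    void register(String appId) {
        collectors.putIfAbsent(appId, new Collector());
    }

    // Analogous to the appFinished handling that removes the appId.
    void remove(String appId) {
        collectors.remove(appId);
    }

    // Unguarded lookup: throws NullPointerException if the appId was already removed.
    void publishUnguarded(String appId, String entityId) {
        collectors.get(appId).putEntity(entityId);
    }

    // Guarded lookup (assumed mitigation): a late event is dropped and reported as false.
    boolean publishGuarded(String appId, String entityId) {
        Collector collector = collectors.get(appId);
        if (collector == null) {
            return false;
        }
        collector.putEntity(entityId);
        return true;
    }

    int published(String appId) {
        Collector collector = collectors.get(appId);
        return collector == null ? 0 : collector.size();
    }

    public static void main(String[] args) {
        CollectorRaceSketch rm = new CollectorRaceSketch();
        String appId = "application_1496758895643_0005";
        String containerId = "container_e01_1496758895643_0005_01_000002";

        rm.register(appId);
        rm.remove(appId); // the removal wins the race

        System.out.println("guarded publish accepted: "
            + rm.publishGuarded(appId, containerId));
        try {
            rm.publishUnguarded(appId, containerId);
        } catch (NullPointerException e) {
            System.out.println("unguarded publish threw NPE");
        }
    }
}
```

The guard only avoids the exception; container events that arrive after removal are silently dropped. Whether that is acceptable, or whether removal should instead wait until pending events are drained, is a design question this sketch does not settle.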