[ 
https://issues.apache.org/jira/browse/YARN-6695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736704#comment-16736704
 ] 

Rohith Sharma K S commented on YARN-6695:
-----------------------------------------

[~eyang] Publishing container events from RM is disabled by default, i.e. 
*yarn.rm.system-metrics-publisher.emit-container-events* is set to *false*. 
Have you enabled this configuration? We don't recommend enabling it, since it 
overloads the RM with a lot of events. If you can attach the stack trace, that 
would be helpful. 

Regarding the patch, I am not a fan of catching NPE! Instead, let's do an 
explicit null check and log an appropriate message, similar to 
NMTimelinePublisher#putEntity. 
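
To illustrate the idea, here is a minimal sketch of an explicit null check in 
place of catching NPE. It is not the actual patch: PublisherSketch, register, 
remove, and the StringBuilder "collector" are hypothetical stand-ins, and the 
real TimelineServiceV2Publisher and NMTimelinePublisher code may look different. 

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a publisher; NOT the real TimelineServiceV2Publisher.
final class PublisherSketch {
    // appId -> collector. A StringBuilder stands in for a real collector object.
    private final Map<String, StringBuilder> collectors = new ConcurrentHashMap<>();

    void register(String appId) {
        collectors.put(appId, new StringBuilder());
    }

    // Models the appFinished path, which drops the collector for the app.
    void remove(String appId) {
        collectors.remove(appId);
    }

    // Returns true if the entity was published, false if it was skipped
    // because the collector had already been removed.
    boolean putEntity(String appId, String entity) {
        // Look the collector up once and keep the reference in a local,
        // so a concurrent remove() cannot null it out between check and use.
        StringBuilder collector = collectors.get(appId);
        if (collector == null) {
            // Explicit check with a clear message instead of catching NPE.
            System.err.println("Cannot publish " + entity + ": collector for "
                + appId + " is not found, it may have been removed already");
            return false;
        }
        synchronized (collector) {
            collector.append(entity).append('\n');
        }
        return true;
    }
}
```

With this shape, an event that arrives after the collector is removed is 
skipped with a descriptive log line rather than surfacing as a 
NullPointerException in the dispatcher thread. 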

> Race condition in RM for publishing container events vs appFinished events 
> causes NPE 
> --------------------------------------------------------------------------------------
>
>                 Key: YARN-6695
>                 URL: https://issues.apache.org/jira/browse/YARN-6695
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Rohith Sharma K S
>            Priority: Critical
>         Attachments: YARN-6695.001.patch
>
>
> When RM publishes container events, i.e. with 
> *yarn.rm.system-metrics-publisher.emit-container-events* enabled, there is a 
> race condition between processing container events and the appFinished 
> event, which removes the appId from the collector list; this causes an NPE. 
> Look at the trace below, where the appId is removed from the collectors first 
> and then the corresponding events are processed. 
> {noformat}
> 2017-06-06 19:28:48,896 INFO  capacity.ParentQueue (ParentQueue.java:removeApplication(472)) - Application removed - appId: application_1496758895643_0005 user: root leaf-queue of parent: root #applications: 0
> 2017-06-06 19:28:48,921 INFO  collector.TimelineCollectorManager (TimelineCollectorManager.java:remove(190)) - The collector service for application_1496758895643_0005 was removed
> 2017-06-06 19:28:48,922 ERROR metrics.TimelineServiceV2Publisher (TimelineServiceV2Publisher.java:putEntity(451)) - Error when publishing entity TimelineEntity[type='YARN_CONTAINER', id='container_e01_1496758895643_0005_01_000002']
> java.lang.NullPointerException
>       at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.putEntity(TimelineServiceV2Publisher.java:448)
>       at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher.access$100(TimelineServiceV2Publisher.java:72)
>       at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:480)
>       at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler.handle(TimelineServiceV2Publisher.java:469)
>       at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:201)
>       at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:127)
>       at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
