[ https://issues.apache.org/jira/browse/YARN-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15206979#comment-15206979 ]
Naganarasimha G R commented on YARN-4711:
-----------------------------------------

Thanks for sharing your views, [~sjlee0]. The main reason for going with this approach is that the entities would otherwise be placed in two queues unnecessarily. Since the timeline client's own queue gives the same benefit anyway, I thought of placing them in only one queue. I have avoided the NPE in multiple ways, but we need to decide here whether it is OK to accept the code-level complexity and pass async entities directly to the timeline client, or to have a common approach for handling sync and async entity publishing. IMO it would be better to avoid the unnecessary placement in the dispatcher queue.

> NM is going down with NPE's due to single thread processing of events by
> Timeline client
> ----------------------------------------------------------------------------------------
>
>                 Key: YARN-4711
>                 URL: https://issues.apache.org/jira/browse/YARN-4711
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Naganarasimha G R
>            Assignee: Naganarasimha G R
>            Priority: Critical
>              Labels: yarn-2928-1st-milestone
>         Attachments: 4711Analysis.txt, YARN-4711-YARN-2928.v1.001.patch
>
>
> After YARN-3367, while testing the latest 2928 branch, came across a few NPEs
> due to which the NM is shutting down.
> {code}
> 2016-02-21 23:19:54,078 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:306)
>         at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ContainerEventHandler.handle(NMTimelinePublisher.java:296)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> {code}
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.putEntity(NMTimelinePublisher.java:213)
>         at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.publishContainerFinishedEvent(NMTimelinePublisher.java:192)
>         at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher.access$400(NMTimelinePublisher.java:63)
>         at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:289)
>         at org.apache.hadoop.yarn.server.nodemanager.timelineservice.NMTimelinePublisher$ApplicationEventHandler.handle(NMTimelinePublisher.java:280)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> On analysis, found that there was a delay in the processing of events, as after
> YARN-3367 all the events were getting processed by a single thread inside the
> timeline client.
> Additionally found one scenario where there is a possibility of an NPE:
> * TimelineEntity.toString() when {{real}} is not null

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)