[ https://issues.apache.org/jira/browse/YARN-7835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16352238#comment-16352238 ]
Rohith Sharma K S commented on YARN-7835:
-----------------------------------------

bq. An alternative would be to only clean up the collector when the application finishes instead of when an AM container finishes

It is doable and should be fine! One concern, from a very rare scenario, is that this makes the collector map retain its entry until the application_stop event is triggered. Take an example where the 1st attempt is running on Node-1 and is killed. The 2nd attempt is started on a different node, but Node-1 doesn't get the application_stop event since the application is still running, which causes Node-1 to keep its entry in the map. Once the application finishes, the entry will be removed, but for a long-running application the map will be retained on two NodeManagers. This would become a gradual leak in the case of long-running applications.

> [Atsv2] Race condition in NM while publishing events if second attempt launched on same node
> --------------------------------------------------------------------------------------------
>
>                 Key: YARN-7835
>                 URL: https://issues.apache.org/jira/browse/YARN-7835
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>            Priority: Critical
>         Attachments: YARN-7835.001.patch
>
>
> A race condition is observed: if the master container is killed for some
> reason and relaunched on the same node, NMTimelinePublisher doesn't add the
> timelineClient. But once the completed container for the 1st attempt arrives,
> NMTimelinePublisher removes the timelineClient.
> This causes all subsequent event publishing from a different client to fail
> with the exception "Application is not found".
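
To illustrate the retention concern raised in the comment, here is a toy model in Python. It is not Hadoop code: the class `NodeManagerModel`, its methods, and the `simulate` helper are all hypothetical names invented for this sketch, and it assumes the entry is dropped on every node once the application stops. It only compares which nodes still hold a collector entry while the application is running under the two cleanup policies; it does not model the same-node race from the issue description.

```python
class NodeManagerModel:
    """Hypothetical stand-in for one NodeManager's per-application collector map."""

    def __init__(self, name):
        self.name = name
        self.collectors = {}  # app_id -> placeholder for a collector/timeline client

    def on_am_container_start(self, app_id):
        # Register a collector entry when an AM container starts on this node.
        self.collectors.setdefault(app_id, "collector-for-" + app_id)

    def on_am_container_finish(self, app_id, cleanup_on_container_finish):
        # Policy A removes the entry here; policy B (the alternative) keeps it.
        if cleanup_on_container_finish:
            self.collectors.pop(app_id, None)

    def on_application_stop(self, app_id):
        # Both policies drop the entry once the application has stopped.
        self.collectors.pop(app_id, None)


def simulate(cleanup_on_container_finish):
    """Return (nodes holding an entry while the app runs, nodes holding one after stop)."""
    node1 = NodeManagerModel("Node-1")
    node2 = NodeManagerModel("Node-2")
    nodes = (node1, node2)
    app = "application_1"  # made-up application id

    node1.on_am_container_start(app)  # 1st attempt runs on Node-1
    node1.on_am_container_finish(app, cleanup_on_container_finish)  # 1st attempt killed
    node2.on_am_container_start(app)  # 2nd attempt starts on a different node

    while_running = sorted(n.name for n in nodes if app in n.collectors)

    for n in nodes:  # application finishes; assumed to reach every node
        n.on_application_stop(app)
    after_stop = sorted(n.name for n in nodes if app in n.collectors)
    return while_running, after_stop


if __name__ == "__main__":
    print("cleanup on AM container finish:", simulate(True))
    print("cleanup on application stop:   ", simulate(False))
```

Under these assumptions, the second policy leaves an entry on both nodes for as long as the application runs, which is the gradual leak described for long-running applications.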