[ https://issues.apache.org/jira/browse/YARN-8130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474608#comment-16474608 ]
Hudson commented on YARN-8130:
------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14195 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14195/])
YARN-8130 Race condition when container events are published for KILLED (haibochen: rev 2d00a0c71b5dde31e2cf8fcb96d9d541d41fb879)
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/timelineservice/NMTimelinePublisher.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/timelineservice/NMTimelineEvent.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/timelineservice/NMTimelineEventType.java
* (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/timelineservice/TestNMTimelinePublisher.java

> Race condition when container events are published for KILLED applications
> --------------------------------------------------------------------------
>
> Key: YARN-8130
> URL: https://issues.apache.org/jira/browse/YARN-8130
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: ATSv2
> Reporter: Charan Hebri
> Assignee: Rohith Sharma K S
> Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-8130.01.patch, YARN-8130.02.patch, YARN-8130.03.patch
>
>
> There seems to be a race condition happening when an application is KILLED
> and the corresponding container event information is being published. For
> completed containers, a YARN_CONTAINER_FINISHED event is generated but for
> some containers in a KILLED application this information is missing.
> Below is a node manager log snippet,
> {code:java}
> 2018-04-09 08:44:54,474 INFO shuffle.ExternalShuffleBlockResolver (ExternalShuffleBlockResolver.java:applicationRemoved(186)) - Application application_1523259757659_0003 removed, cleanupLocalDirs = false
> 2018-04-09 08:44:54,478 INFO application.ApplicationImpl (ApplicationImpl.java:handle(632)) - Application application_1523259757659_0003 transitioned from APPLICATION_RESOURCES_CLEANINGUP to FINISHED
> 2018-04-09 08:44:54,478 ERROR timelineservice.NMTimelinePublisher (NMTimelinePublisher.java:putEntity(298)) - Seems like client has been removed before the entity could be published for TimelineEntity[type='YARN_CONTAINER', id='container_1523259757659_0003_01_000002']
> 2018-04-09 08:44:54,478 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(520)) - Application just finished : application_1523259757659_0003
> 2018-04-09 08:44:54,488 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs for container container_1523259757659_0003_01_000001. Current good log dirs are /grid/0/hadoop/yarn/log
> 2018-04-09 08:44:54,492 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doContainerLogAggregation(576)) - Uploading logs for container container_1523259757659_0003_01_000002. Current good log dirs are /grid/0/hadoop/yarn/log
> 2018-04-09 08:44:55,470 INFO collector.TimelineCollectorManager (TimelineCollectorManager.java:remove(192)) - The collector service for application_1523259757659_0003 was removed
> 2018-04-09 08:44:55,472 INFO containermanager.ContainerManagerImpl (ContainerManagerImpl.java:handle(1572)) - couldn't find application application_1523259757659_0003 while processing FINISH_APPS event. The ResourceManager allocated resources for this application to the NodeManager but no active containers were found to process
> {code}
> The container id specified in the log,
> *container_1523259757659_0003_01_000002* is the one that has the finished
> event missing.
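
In the log, the publisher reports that the client was removed before the entity for container_1523259757659_0003_01_000002 could be published, at the same timestamp at which the application moved to FINISHED. The following is a minimal sketch of that kind of ordering problem and of one common way to avoid it, namely routing the client removal through the same FIFO queue as the publish events. It is not the NMTimelinePublisher code and not the committed patch; the class, its methods, and the single-queue model are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

// Illustrative only: every name here is hypothetical, not taken from Hadoop.
class PublisherOrderingSketch {
    // Applications that still have a timeline client registered.
    private final Set<String> clients = new HashSet<>();
    // Work queued for a single dispatcher, processed in FIFO order.
    private final Queue<Runnable> dispatcherQueue = new ArrayDeque<>();

    final List<String> published = new ArrayList<>();
    final List<String> dropped = new ArrayList<>();

    void appStarted(String appId) {
        clients.add(appId);
    }

    // Queue a container entity for publication. The client lookup happens
    // later, when the dispatcher reaches this event.
    void publishAsync(String appId, String containerId) {
        dispatcherQueue.add(() -> {
            if (clients.contains(appId)) {
                published.add(containerId);
            } else {
                dropped.add(containerId); // client was removed first
            }
        });
    }

    // Racy variant: removes the client at once, ahead of queued publishes.
    void appFinishedImmediate(String appId) {
        clients.remove(appId);
    }

    // Ordered variant: removal is itself an event, so it runs after every
    // publish event that was queued before it.
    void appFinishedQueued(String appId) {
        dispatcherQueue.add(() -> clients.remove(appId));
    }

    // Stand-in for the dispatcher thread draining its queue.
    void drain() {
        Runnable event;
        while ((event = dispatcherQueue.poll()) != null) {
            event.run();
        }
    }

    public static void main(String[] args) {
        String app = "application_1523259757659_0003";
        String container = "container_1523259757659_0003_01_000002";

        PublisherOrderingSketch immediate = new PublisherOrderingSketch();
        immediate.appStarted(app);
        immediate.publishAsync(app, container);
        immediate.appFinishedImmediate(app);
        immediate.drain();
        System.out.println("immediate removal, dropped: " + immediate.dropped);

        PublisherOrderingSketch queued = new PublisherOrderingSketch();
        queued.appStarted(app);
        queued.publishAsync(app, container);
        queued.appFinishedQueued(app);
        queued.drain();
        System.out.println("queued removal, published: " + queued.published);
    }
}
```

With immediate removal the queued container entity ends up in `dropped`; with queued removal it ends up in `published`, because the removal cannot overtake an event that was enqueued earlier. The sketch uses a plain in-memory queue drained on the calling thread so that the outcome is deterministic; a real dispatcher would run on its own thread.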