[ https://issues.apache.org/jira/browse/YARN-7835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16341991#comment-16341991 ]

Rohith Sharma K S commented on YARN-7835:
-----------------------------------------

The log trace below shows that the 2nd attempt's master container was launched on the 
same NodeManager, and no timeline client was added for it because one already 
existed. But when the completed-container event for the 1st attempt was received, 
NMTimelinePublisher removed that shared timeline client.
{code}
2018-01-27 04:55:35,193 INFO  application.ApplicationImpl (ApplicationImpl.java:transition(446)) - Adding container_e22_1516990344374_0007_02_000001 to application application_1516990344374_0007
2018-01-27 04:55:35,195 INFO  container.ContainerImpl (ContainerImpl.java:handle(2108)) - Container container_e22_1516990344374_0007_02_000001 transitioned from NEW to LOCALIZING
2018-01-27 04:55:35,195 INFO  containermanager.AuxServices (AuxServices.java:handle(220)) - Got event CONTAINER_INIT for appId application_1516990344374_0007
2018-01-27 04:55:35,196 INFO  collector.TimelineCollectorManager (TimelineCollectorManager.java:putIfAbsent(149)) - the collector for application_1516990344374_0007 already exists!
...
...
2018-01-27 04:55:36,109 INFO  nodemanager.NodeStatusUpdaterImpl (NodeStatusUpdaterImpl.java:removeOrTrackCompletedContainersFromContext(682)) - Removed completed containers from NM context: [container_e22_1516990344374_0007_01_000001]
2018-01-27 04:55:36,112 INFO  collector.TimelineCollectorManager (TimelineCollectorManager.java:remove(192)) - The collector service for application_1516990344374_0007 was removed
2018-01-27 04:55:36,430 ERROR collector.TimelineCollectorWebService (TimelineCollectorWebService.java:putEntities(165)) - Application: application_1516990344374_0007 is not found
2018-01-27 04:55:36,430 ERROR collector.TimelineCollectorWebService (TimelineCollectorWebService.java:putEntities(179)) - Error putting entities
org.apache.hadoop.yarn.webapp.NotFoundException
        at org.apache.hadoop.yarn.server.timelineservice.co
{code}
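To make the sequence in the trace concrete, here is a minimal, self-contained Java model of it. It assumes the per-node collector registry behaves like a map keyed by application ID; the class `CollectorRaceDemo` and its methods are hypothetical stand-ins named after the log messages (`putIfAbsent`, `remove`, `putEntities`), not the real `TimelineCollectorManager` API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical, simplified model of the per-node collector registry.
 * Names mirror the log messages but are NOT the real YARN API.
 */
public class CollectorRaceDemo {

    // appId -> id of the container whose start created the collector
    private final Map<String, String> collectors = new ConcurrentHashMap<>();

    /** Registers a collector for the app unless one already exists. */
    boolean putIfAbsent(String appId, String containerId) {
        String existing = collectors.putIfAbsent(appId, containerId);
        if (existing != null) {
            System.out.println("the collector for " + appId + " already exists!");
            return false;
        }
        return true;
    }

    /** Unconditionally removes the app's collector, as in the problematic path. */
    void remove(String appId) {
        if (collectors.remove(appId) != null) {
            System.out.println("The collector service for " + appId + " was removed");
        }
    }

    /** Publishing succeeds only while a collector is registered for the app. */
    boolean putEntities(String appId) {
        if (!collectors.containsKey(appId)) {
            System.out.println("Application: " + appId + " is not found");
            return false;
        }
        return true;
    }

    /** Replays the order of events from the trace; returns whether publishing still works. */
    static boolean replayTrace() {
        CollectorRaceDemo nm = new CollectorRaceDemo();
        String app = "application_1516990344374_0007";
        nm.putIfAbsent(app, "container_e22_1516990344374_0007_01_000001"); // attempt 1 AM starts
        nm.putIfAbsent(app, "container_e22_1516990344374_0007_02_000001"); // attempt 2 AM on same NM: no-op
        nm.remove(app);                // attempt 1 AM completion removes the shared collector
        return nm.putEntities(app);    // attempt 2 then tries to publish
    }

    public static void main(String[] args) {
        System.out.println("publish succeeded: " + replayTrace());
    }
}
```

Running `main` prints the same three messages in the same order as the trace and ends with `publish succeeded: false`, which corresponds to the `NotFoundException` above.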

> [Atsv2] Race condition in NM while publishing events if second attempt launched in same node
> --------------------------------------------------------------------------------------------
>
>                 Key: YARN-7835
>                 URL: https://issues.apache.org/jira/browse/YARN-7835
>             Project: Hadoop YARN
>          Issue Type: Bug
>         Environment: A race condition has been observed: if the master container 
> is killed for some reason and relaunched on the same node, NMTimelinePublisher 
> does not add a timelineClient, because one already exists for the application. 
> But once the completed-container event for the 1st attempt arrives, 
> NMTimelinePublisher removes that timelineClient. 
> As a result, all subsequent event publishing from other clients fails with the 
> exception "Application is not found".
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>            Priority: Critical
>
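One way to avoid the premature removal described above, sketched here only as an illustration and not as the fix committed for this issue, is to record which AM container currently owns an application's collector and to remove the collector only when that owner completes. `OwnedCollectorRegistry` and its methods are hypothetical; the one real API relied on is `ConcurrentHashMap.remove(key, value)`, which atomically removes an entry only if the key still maps to the given value.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical mitigation sketch: track the AM container that owns the
 * app's collector and ignore completion events from stale attempts.
 * Not the actual YARN-7835 patch.
 */
public class OwnedCollectorRegistry {

    // appId -> container id of the AM attempt that currently owns the collector
    private final Map<String, String> owners = new ConcurrentHashMap<>();

    /** AM container started: create the entry or hand ownership to the newer attempt. */
    void amContainerStarted(String appId, String containerId) {
        owners.put(appId, containerId);
    }

    /** AM container completed: remove only if this container is still the owner. */
    boolean amContainerCompleted(String appId, String containerId) {
        // remove(key, value) succeeds only when appId is mapped to exactly containerId
        return owners.remove(appId, containerId);
    }

    boolean hasCollector(String appId) {
        return owners.containsKey(appId);
    }

    public static void main(String[] args) {
        OwnedCollectorRegistry nm = new OwnedCollectorRegistry();
        String app = "application_1516990344374_0007";
        String am1 = "container_e22_1516990344374_0007_01_000001";
        String am2 = "container_e22_1516990344374_0007_02_000001";

        nm.amContainerStarted(app, am1);
        nm.amContainerStarted(app, am2);                     // 2nd attempt on the same NM
        boolean removed = nm.amContainerCompleted(app, am1); // late completion of attempt 1
        System.out.println("removed on attempt 1 completion: " + removed);     // false
        System.out.println("collector still present: " + nm.hasCollector(app)); // true
    }
}
```

Because the second attempt takes over ownership with `put`, the late completion event for attempt 1 no longer matches and is ignored, while attempt 2's own completion still cleans the entry up.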




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
