[ 
https://issues.apache.org/jira/browse/YARN-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14322683#comment-14322683
 ] 

Devaraj K commented on YARN-3194:
---------------------------------

Thanks [~rohithsharma] for reporting and [~jianhe] for your inputs.

I am also able to reproduce this issue. After an NM restart, the RM does not 
receive information about the completed containers as part of the node 
heartbeat request, and the AM does not ask the RM to release these 
completed containers either. As a result, the RM assumes these containers are 
still running and does not release their resources.
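
To make the symptom concrete, here is a minimal, self-contained sketch of the 
bookkeeping problem. It is a simplified model and not the actual RM code: the 
class RegistrationSketch, the State enum, the Status type, and the 
handleCompleted flag are all hypothetical names introduced for illustration 
only. It shows that if only RUNNING reports are processed at registration, a 
container that completed while the NM was down stays counted as live.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class RegistrationSketch {
    // Hypothetical, simplified stand-in for a container state.
    enum State { RUNNING, COMPLETE }

    // Hypothetical stand-in for one container status reported at NM registration.
    static final class Status {
        final String containerId;
        final State state;

        Status(String containerId, State state) {
            this.containerId = containerId;
            this.state = state;
        }
    }

    /**
     * Returns the container IDs that the (modelled) RM still counts as holding
     * resources after processing the statuses sent at NM registration.
     *
     * With handleCompleted == false, only RUNNING reports are processed, which
     * models the behaviour described in this issue. With handleCompleted == true,
     * COMPLETE reports also release the container. This is an assumption about
     * what a fix could look like, not the committed patch.
     */
    static Set<String> liveAfterRegistration(Set<String> trackedByRm,
                                             List<Status> reports,
                                             boolean handleCompleted) {
        Set<String> live = new TreeSet<>(trackedByRm);
        for (Status s : reports) {
            if (s.state == State.RUNNING) {
                live.add(s.containerId);
            } else if (handleCompleted && s.state == State.COMPLETE) {
                live.remove(s.containerId);
            }
            // Otherwise the COMPLETE report is ignored and the container
            // stays in the live set, so its resources are never released.
        }
        return live;
    }

    public static void main(String[] args) {
        Set<String> tracked = new TreeSet<>(Arrays.asList("c1", "c2"));
        List<Status> reports = Arrays.asList(
                new Status("c1", State.RUNNING),
                new Status("c2", State.COMPLETE)); // finished while NM was down

        System.out.println("only RUNNING processed: "
                + liveAfterRegistration(tracked, reports, false));
        System.out.println("COMPLETE also processed: "
                + liveAfterRegistration(tracked, reports, true));
    }
}
```

In the first call, c2 remains in the live set even though it has completed, 
which matches the hang seen here. The second call only illustrates the 
expected bookkeeping once completed statuses are taken into account.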

> After NM restart,completed containers are not released which are sent during 
> NM registration
> --------------------------------------------------------------------------------------------
>
>                 Key: YARN-3194
>                 URL: https://issues.apache.org/jira/browse/YARN-3194
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>         Environment: NM restart is enabled
>            Reporter: Rohith
>            Assignee: Rohith
>
> On NM restart, the NM sends all the outstanding NMContainerStatus to the RM. But the RM 
> processes only ContainerState.RUNNING. If a container completed while the NM was 
> down, that container's resources won't be released, which results in 
> applications hanging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
