[ 
https://issues.apache.org/jira/browse/YARN-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith Sharma K S updated YARN-4862:
------------------------------------
    Attachment: 0003-YARN-4862.patch

Updating with the rebased patch, which addresses the review comments from Jason.

This JIRA depends on YARN-5279 to avoid a leak in the new set 
completedContainers.

I rechecked the 2 scenarios mainly discussed in the earlier comments:
# RM (scheduler) forgets the container details (YARN-5279). In this case, for 
any unknown completed container reported by the NodeManager, the scheduler 
notifies RMNodeImpl that it no longer maintains these containers, so that 
RMNodeImpl informs the NodeManager to remove them from NMContext. This event 
avoids a leak in the new set completedContainers, because entries are cleared 
once the containers are acknowledged to the NM in the heartbeat response.
# NM forgets to send a completed container.
## The NM never sends the completed container, not even once. This is nothing 
but YARN-5197.
## The NM sends the completed container in one heartbeat and then forgets it in 
the next heartbeat. In this case, {{RMNodeImpl#completedContainers}} need not 
worry about a leak, because once a completed container has been sent, the RM 
keeps track of it. Even if the NM forgets to send it later, the entry in 
completedContainers is cleared when the RM acknowledges back to the NM in the 
heartbeat to remove the container from its context.
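
To make the bookkeeping in the scenarios above concrete, here is a minimal sketch of the idea, not the code in the attached patch. The class name {{CompletedContainerTracker}}, its method names, and the use of plain strings as container IDs are all assumptions made for illustration; the real RMNodeImpl works with ContainerId and ContainerStatus objects and is driven by node events.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical stand-in for the duplicate guard discussed above.
 * This is NOT the RMNodeImpl code; container IDs are plain strings here.
 */
class CompletedContainerTracker {
  // Completed containers already reported by the NM and not yet acknowledged.
  private final Set<String> completedContainers = new HashSet<>();

  /**
   * Returns only the completed container IDs that were not reported before.
   * Duplicates are dropped, so no further per-container state is created for them.
   */
  List<String> filterNewlyCompleted(Collection<String> reportedCompleted) {
    List<String> fresh = new ArrayList<>();
    for (String id : reportedCompleted) {
      // Set.add returns false when the ID is already tracked.
      if (completedContainers.add(id)) {
        fresh.add(id);
      }
    }
    return fresh;
  }

  /**
   * Called when containers are acknowledged back to the NM in a heartbeat
   * response (including the "unknown to scheduler" case), so the set does not leak.
   */
  void acknowledge(Collection<String> ackedContainers) {
    completedContainers.removeAll(ackedContainers);
  }

  /** Number of completed containers still awaiting acknowledgement. */
  int trackedCount() {
    return completedContainers.size();
  }
}
```

In this sketch, a container reported twice before the acknowledgement is returned by {{filterNewlyCompleted}} only the first time, and the set shrinks again only through {{acknowledge}}. That is why scenario 1 needs the event from YARN-5279: without it, containers unknown to the scheduler would never be acknowledged and would stay in the set.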

> Handle duplicate completed containers in RMNodeImpl
> ---------------------------------------------------
>
>                 Key: YARN-4862
>                 URL: https://issues.apache.org/jira/browse/YARN-4862
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>         Attachments: 0001-YARN-4862.patch, 0002-YARN-4862.patch, 
> 0003-YARN-4862.patch
>
>
> As per 
> [comment|https://issues.apache.org/jira/browse/YARN-4852?focusedCommentId=15209689&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15209689]
>  from [~sharadag], there should be a safeguard against duplicate container 
> statuses in RMNodeImpl before creating UpdatedContainerInfo. 
> Otherwise, in a heavily loaded cluster where event processing gradually slows 
> down, if any duplicate containers are sent to the RM (possibly due to a bug 
> in the NM), there is a significant impact because RMNodeImpl always creates 
> UpdatedContainerInfo for the duplicate containers. This results in increased 
> heap memory usage and causes problems like YARN-4852.
> This is an optimization for issues of the kind seen in YARN-4852.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
