[jira] [Commented] (YARN-6167) RM option to delegate NM loss container action to AM

Billie Rinaldi (JIRA) Mon, 22 Oct 2018 17:33:43 -0700


    [ https://issues.apache.org/jira/browse/YARN-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16659882#comment-16659882 ]


Billie Rinaldi commented on YARN-6167:
--------------------------------------

Thanks for taking a look, [~leftnoteasy]!

bq. Inside releaseContainers, why add the following if statement?
There are two cases in which the pending release needs to be remembered: when
the RMContainer is null and when it is not null. It has been non-null in every
case I've tested so far, but it might be null if the RM has been restarted.
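A minimal sketch of the bookkeeping described above, assuming the RM keeps a set
of container IDs whose release was requested but could not be carried out yet.
All class and method names here (`PendingReleaseSketch`, `releaseContainer`,
`onContainerReported`) are hypothetical and are not taken from
YARN-6167.01.patch or the YARN scheduler API; a plain `String` stands in for
`ContainerId`.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model, not the YARN-6167 patch: tracks releases the RM cannot
// act on yet, either because the container's NM is unreachable or because the
// container is not (yet) known to the RM.
public class PendingReleaseSketch {
    // Minimal stand-in for an RMContainer: its ID and the node it runs on.
    public static final class Container {
        final String id;
        final String nodeId;

        public Container(String id, String nodeId) {
            this.id = id;
            this.nodeId = nodeId;
        }
    }

    private final Map<String, Container> liveContainers = new HashMap<>();
    private final Set<String> lostNodes = new HashSet<>();
    private final Set<String> pendingRelease = new HashSet<>();

    public void addContainer(Container c) { liveContainers.put(c.id, c); }

    public void markNodeLost(String nodeId) { lostNodes.add(nodeId); }

    public void markNodeBack(String nodeId) { lostNodes.remove(nodeId); }

    // Returns true if the release happened now, false if it was deferred.
    public boolean releaseContainer(String containerId) {
        Container c = liveContainers.get(containerId);
        if (c == null) {
            // Case 1: no RMContainer (e.g. not yet recovered after an RM
            // restart). Remember the request in case the container reappears.
            pendingRelease.add(containerId);
            return false;
        }
        if (lostNodes.contains(c.nodeId)) {
            // Case 2: the RMContainer exists but its NM is unreachable, so the
            // kill cannot be delivered yet. Remember the request as well.
            pendingRelease.add(containerId);
            return false;
        }
        liveContainers.remove(containerId);
        return true;
    }

    // Called when a re-registered NM reports a container. Returns true if the
    // container should be killed because its release is still pending.
    public boolean onContainerReported(String containerId) {
        return pendingRelease.remove(containerId);
    }
}
```

Both branches end up in the same set, which mirrors the point that the pending
release has to be remembered whether or not the RMContainer is present.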

bq. Why are the changes to RMContainerImpl required?
In my testing, when an NM is killed and restarted some time later, the NM
registers and the RMNode is added before the NM has recovered the containers.
The containers are reported later in a node update event, where they look like
newly launched containers. So in this case the RM already has an RMContainer in
the RUNNING state when it receives a LAUNCHED event for that container.
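A heavily simplified sketch of the transition in question, assuming the fix
amounts to accepting a duplicate LAUNCHED event in the RUNNING state as a no-op
rather than rejecting it. The class and enum names are hypothetical; the real
RMContainerImpl is built on Hadoop's StateMachineFactory and has more states
and events than shown here.

```java
// Hypothetical, simplified model of the RMContainer lifecycle; not the real
// RMContainerImpl and not the YARN-6167 patch.
public class ContainerLifecycleSketch {
    public enum State { ALLOCATED, ACQUIRED, RUNNING, COMPLETED }
    public enum Event { ACQUIRED, LAUNCHED, FINISHED }

    private State state;

    public ContainerLifecycleSketch(State initial) { this.state = initial; }

    public State getState() { return state; }

    public State handle(Event event) {
        switch (state) {
            case ALLOCATED:
                if (event == Event.ACQUIRED) { state = State.ACQUIRED; return state; }
                break;
            case ACQUIRED:
                if (event == Event.LAUNCHED) { state = State.RUNNING; return state; }
                break;
            case RUNNING:
                if (event == Event.LAUNCHED) {
                    // A recovered NM re-reports a container the RM already
                    // tracks as RUNNING: treat the duplicate LAUNCHED as a
                    // no-op instead of an invalid transition.
                    return state;
                }
                if (event == Event.FINISHED) { state = State.COMPLETED; return state; }
                break;
            default:
                break;
        }
        throw new IllegalStateException("Invalid event " + event + " in state " + state);
    }
}
```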

bq. Instead of handling "pendingRelease" for such containers, can we let the AM
handle it once the NM comes back to a normal state? The AM should be notified after that.
The AM patch I've started working on to test this feature is already pretty 
complicated, but maybe it wouldn't be too difficult to have the AM remember 
which containers it should release. I'll look into it.
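The AM-side alternative could look roughly like this, assuming the AM is told
when an NM is lost and when it comes back. `AmReleaseTracker` and its methods
are hypothetical names, not part of the patch or of the AMRMClient API; in a
real AM the returned IDs would presumably be handed to something like
`AMRMClient#releaseAssignedContainer`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical AM-side bookkeeping: remember which containers on a lost NM the
// AM has decided to give up, and release them once the NM is back.
public class AmReleaseTracker {
    // nodeId -> containers on that node the AM wants released on recovery.
    private final Map<String, Set<String>> releaseOnRecovery = new HashMap<>();

    // Record the AM's decision to release a container whose NM is lost.
    public void deferRelease(String nodeId, String containerId) {
        releaseOnRecovery.computeIfAbsent(nodeId, k -> new HashSet<>()).add(containerId);
    }

    // When the AM learns the NM is back, return the containers it should now
    // ask the RM to release, and forget them.
    public List<String> onNodeRecovered(String nodeId) {
        Set<String> ids = releaseOnRecovery.remove(nodeId);
        return ids == null ? new ArrayList<>() : new ArrayList<>(ids);
    }
}
```

Keeping this state in the AM would let the RM drop its own "pendingRelease"
tracking, at the cost of the extra AM complexity mentioned above.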

> RM option to delegate NM loss container action to AM
> ----------------------------------------------------
>
>                 Key: YARN-6167
>                 URL: https://issues.apache.org/jira/browse/YARN-6167
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: scheduler
>            Reporter: Billie Rinaldi
>            Assignee: Billie Rinaldi
>            Priority: Major
>         Attachments: YARN-6167.01.patch
>
>
> Currently, if the RM times out an NM, the scheduler will kill all containers 
> that were running on the NM. For some applications, in the event of a 
> temporary NM outage, it might be better to delegate to the AM the decision 
> whether to kill the containers and request new containers from the RM.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
