[ 
https://issues.apache.org/jira/browse/YARN-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14515108#comment-14515108
 ] 

Jason Lowe commented on YARN-3212:
----------------------------------

bq. releasing an unlaunched container is pretty cheap, which could be better 
than waiting for the container to execute from the beginning

One could argue that the whole point of graceful decommission is to avoid 
container failures, and this would be a container failure from the perspective 
of the AM.  In that sense we should honor the container if we have already 
handed it out to the AM (i.e., the RMContainerImpl instance is in the ACQUIRED 
state).  We should be able to turn off scheduling for the node first, and then 
query the scheduler to see which containers are still active on that node.
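
To make the ordering concrete, here is a rough sketch of what I mean. It is a 
self-contained toy model, not the real RM code: DrainSketch, Node, Container, 
stopScheduling(), containersToHonor() and canDecommission() are all 
hypothetical names, and the enum is only a simplified subset of container 
states. It also assumes that a container which is allocated but not yet 
acquired by the AM does not need to be honored.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Toy model only; none of these types are the real YARN classes. */
final class DrainSketch {

    /** Simplified, hypothetical subset of container states. */
    enum ContainerState { ALLOCATED, ACQUIRED, RUNNING, COMPLETED }

    /** Assumption: containers in these states were already handed to the AM. */
    static final Set<ContainerState> HONORED =
            EnumSet.of(ContainerState.ACQUIRED, ContainerState.RUNNING);

    static final class Container {
        final String id;
        final ContainerState state;

        Container(String id, ContainerState state) {
            this.id = id;
            this.state = state;
        }
    }

    static final class Node {
        private boolean schedulable = true;
        private final List<Container> containers = new ArrayList<>();

        void add(Container c) {
            containers.add(c);
        }

        /** Step 1: stop handing out new containers on this node. */
        void stopScheduling() {
            schedulable = false;
        }

        /** Step 2: only after scheduling is off, ask what is still active. */
        List<Container> containersToHonor() {
            if (schedulable) {
                throw new IllegalStateException("stop scheduling before querying");
            }
            List<Container> result = new ArrayList<>();
            for (Container c : containers) {
                if (HONORED.contains(c.state)) {
                    result.add(c);
                }
            }
            return result;
        }

        /** The node can finish decommissioning once nothing is left to honor. */
        boolean canDecommission() {
            return containersToHonor().isEmpty();
        }
    }

    public static void main(String[] args) {
        Node node = new Node();
        node.add(new Container("c1", ContainerState.ALLOCATED));
        node.add(new Container("c2", ContainerState.ACQUIRED));
        node.stopScheduling();
        for (Container c : node.containersToHonor()) {
            System.out.println("still honoring " + c.id);
        }
        System.out.println("can decommission: " + node.canDecommission());
    }
}
```

Turning scheduling off before the query matters in this model because 
otherwise a new container could be handed out between the query and the 
decision to decommission.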

> RMNode State Transition Update with DECOMMISSIONING state
> ---------------------------------------------------------
>
>                 Key: YARN-3212
>                 URL: https://issues.apache.org/jira/browse/YARN-3212
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: RMNodeImpl - new.png, YARN-3212-v1.patch, 
> YARN-3212-v2.patch, YARN-3212-v3.patch
>
>
> As proposed in YARN-914, a new “DECOMMISSIONING” state will be added. A node 
> enters it from the “running” state when triggered by a new event, 
> “decommissioning”. 
> From this new state, a node can transition to “decommissioned” on 
> Resource_Update if there are no running apps on this NM, when the NM 
> reconnects after restart, or when it receives a DECOMMISSIONED event (after 
> timeout from CLI).
> In addition, it can go back to “running” if the user decides to cancel a 
> previous decommission by calling recommission on the same node. The reaction 
> to other events is similar to that of the RUNNING state.
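
The transitions described in the issue can be sketched as a small transition 
function. This is only an illustration under stated assumptions, not the 
RMNodeImpl state machine from the attached patches: NodeStateSketch, its Event 
names and the runningApps parameter are hypothetical, and events not mentioned 
in the description simply leave the state unchanged here.

```java
/** Toy transition table for the proposed DECOMMISSIONING state. */
final class NodeStateSketch {

    enum State { RUNNING, DECOMMISSIONING, DECOMMISSIONED }

    /** Hypothetical event names chosen to mirror the issue description. */
    enum Event { DECOMMISSIONING, RESOURCE_UPDATE, RECONNECTED, DECOMMISSIONED, RECOMMISSION }

    /**
     * Returns the next state for a node.
     * runningApps is only consulted for RESOURCE_UPDATE while DECOMMISSIONING.
     */
    static State next(State current, Event event, int runningApps) {
        switch (current) {
            case RUNNING:
                // Graceful decommission request moves the node into the new state.
                return event == Event.DECOMMISSIONING ? State.DECOMMISSIONING : current;
            case DECOMMISSIONING:
                switch (event) {
                    case RESOURCE_UPDATE:
                        // Finish only when no apps are left on the node.
                        return runningApps == 0 ? State.DECOMMISSIONED : current;
                    case RECONNECTED:     // NM reconnects after restart
                    case DECOMMISSIONED:  // timeout from CLI
                        return State.DECOMMISSIONED;
                    case RECOMMISSION:    // user cancels the decommission
                        return State.RUNNING;
                    default:
                        return current;
                }
            default:
                // DECOMMISSIONED is treated as terminal in this sketch.
                return current;
        }
    }

    public static void main(String[] args) {
        State s = State.RUNNING;
        s = next(s, Event.DECOMMISSIONING, 2);
        System.out.println(s);
        s = next(s, Event.RESOURCE_UPDATE, 2);
        System.out.println(s);
        s = next(s, Event.RESOURCE_UPDATE, 0);
        System.out.println(s);
    }
}
```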



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
