[ 
https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324390#comment-14324390
 ] 

Junping Du commented on YARN-914:
---------------------------------

bq. The main point I'm trying to make here is that we shouldn't be worrying too 
much about long-running services right now. 
Agreed, especially since the discussion above moved timeout tracking out of 
YARN core. The new CLI will track the time (configurable per operation) and 
send a forced decommission once the timeout expires. We could also notify the 
AM when the NM starts decommissioning (and when the timeout expires), although 
that could be more complicated. 
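
To make the client-side tracking concrete, here is a minimal sketch of the 
loop such a CLI might run. It is an illustration only, not the actual YARN 
CLI: the callbacks start_graceful, is_drained and force_decommission are 
hypothetical placeholders for whatever admin calls the real tool would make, 
and the polling interval is an assumed parameter.

```python
import time
from typing import Callable


def graceful_decommission(
    node: str,
    timeout_s: float,
    start_graceful: Callable[[str], None],      # hypothetical: mark node as decommissioning
    is_drained: Callable[[str], bool],          # hypothetical: True once no containers remain
    force_decommission: Callable[[str], None],  # hypothetical: decommission immediately
    poll_interval_s: float = 5.0,               # assumed polling interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Track the timeout on the client side, outside of YARN core.

    Returns "drained" if the node emptied before the deadline,
    or "forced" if the timeout expired and a forced decommission was sent.
    """
    start_graceful(node)
    deadline = clock() + timeout_s
    while clock() < deadline:
        if is_drained(node):
            return "drained"
        # Never sleep past the deadline.
        sleep(min(poll_interval_s, max(0.0, deadline - clock())))
    # Timeout expired: fall back to a forced decommission.
    force_decommission(node)
    return "forced"
```

Because the clock and sleep functions are injected, the timeout can be 
configured per operation and the loop can be exercised without real waiting.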

bq.  In the short-term I think we just go with a configurable decomm timeout 
and AM notification via strict preemption as the timeout expires. If we want to 
get a bit fancier, we can annotate the strict preemption with a timeout so the 
AM knows approximately when the preemption will occur.
OK. My understanding is that there are two steps: 1. notify the AM via strict 
preemption after the timeout expires; 2. notify the AM via flexible preemption, 
annotated with a tolerance timeout, when decommissioning starts. A quick 
question: what is the benefit of step 1 over decommissioning the nodes directly 
after the timeout? If there is a benefit, why don't we already do this when 
decommissioning nodes today?

bq. With that feature we would notify AMs as soon as the node is marked for 
decomm that their containers will be forcibly preempted (i.e.: killed) in X 
minutes, and it's up to each AM to decide whether to do anything about it or if 
their containers on that node will complete within that time naturally. With 
that setup we don't have to special-case LRS apps or anything like that, as 
we're telling the apps ASAP the decomm is happening and giving them time to 
deal with it, LRS or not.
Makes sense. It sounds like a sub-JIRA has already been created for this, and 
we can extend it to include a timeout.
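
As a rough sketch of the AM-side decision described in the quote above (not 
an existing YARN API), an AM that is told its containers will be killed in X 
seconds could compare that against its own estimate of remaining work. The 
ContainerInfo type, the est_remaining_s field and the safety margin are all 
assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContainerInfo:
    container_id: str
    est_remaining_s: float  # the AM's own estimate; float("inf") for long-running services


def containers_needing_action(
    containers: List[ContainerInfo],
    seconds_until_kill: float,
    safety_margin_s: float = 0.0,  # assumed slack for estimation error
) -> List[str]:
    """Return IDs of containers not expected to finish before the forced kill."""
    budget = seconds_until_kill - safety_margin_s
    return [c.container_id for c in containers if c.est_remaining_s > budget]


on_node = [
    ContainerInfo("c1", 30.0),          # short task, finishes naturally
    ContainerInfo("c2", 900.0),         # will not finish in time
    ContainerInfo("c3", float("inf")),  # long-running service
]
to_handle = containers_needing_action(on_node, seconds_until_kill=600.0, safety_margin_s=60.0)
```

Here to_handle contains c2 and c3. The long-running service needs no special 
case: an infinite remaining time simply never fits inside the window.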

  

> Support graceful decommission of nodemanager
> --------------------------------------------
>
>                 Key: YARN-914
>                 URL: https://issues.apache.org/jira/browse/YARN-914
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 2.0.4-alpha
>            Reporter: Luke Lu
>            Assignee: Junping Du
>         Attachments: Gracefully Decommission of NodeManager (v1).pdf, 
> Gracefully Decommission of NodeManager (v2).pdf
>
>
> When NMs are decommissioned for non-fault reasons (capacity change etc.), 
> it's desirable to minimize the impact to running applications.
> Currently, if an NM is decommissioned, all running containers on the NM need 
> to be rescheduled on other NMs. Furthermore, for finished map tasks, if their 
> map outputs have not been fetched by the reducers of the job, these map tasks 
> will need to be rerun as well.
> We propose to introduce a mechanism to optionally gracefully decommission a 
> node manager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
