[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324277#comment-14324277 ]

Jason Lowe commented on YARN-914:
---------------------------------

bq. I think prediction of expected runtime of containers could be hard in the YARN 
case. However, can we typically say long-running service containers are 
expected to run very long or indefinitely? If so, notifying the AM to preempt 
containers of LRS makes more sense here than waiting for the timeout. Isn't it? 

The main point I'm trying to make here is that we shouldn't be worrying too 
much about long-running services right now.  YARN doesn't even know which are 
which yet, and without any kind of container lifespan prediction there's no way 
to know whether a container will finish within the decomm timeout window or 
not.  YARN knowing which apps are LRS is a primitive form of container lifespan 
prediction (i.e.: LRS = containers run forever).  We will have the same 
problems with apps that aren't LRS but have containers that can run for a 
"long" time, where "long" is larger than the decomm timeout.  That's why I'm 
not convinced it makes sense to do anything special for LRS apps vs. other apps.

In the short term, I think we just go with a configurable decomm timeout and AM 
notification via strict preemption when the timeout expires.  If we want to get a 
bit fancier, we can annotate the strict preemption with a timeout so the AM 
knows approximately _when_ the preemption will occur.  With that feature we 
would notify AMs as soon as the node is marked for decomm that their containers 
will be forcibly preempted (i.e.: killed) in X minutes, and it's up to each AM 
to decide whether to do anything about it or if their containers on that node 
will complete within that time naturally.  With that setup we don't have to 
special-case LRS apps or anything like that, as we're telling the apps ASAP the 
decomm is happening and giving them time to deal with it, LRS or not.
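
For concreteness, here is a minimal AM-side sketch of how a strict preemption 
notice already surfaces through the allocate heartbeat (AllocateResponse, 
PreemptionMessage, StrictPreemptionContract, and PreemptionContainer are 
existing records; the class and method names below are illustrative, and the 
per-container deadline proposed above is not part of the current API, so the 
sketch only extracts which containers the RM intends to kill):

{code:java}
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.PreemptionContainer;
import org.apache.hadoop.yarn.api.records.PreemptionMessage;
import org.apache.hadoop.yarn.api.records.StrictPreemptionContract;

/**
 * Illustrative helper an AM could call on each heartbeat to find out which of
 * its containers the RM has decided to forcibly preempt (e.g. because the
 * node they run on is being decommissioned and the decomm timeout is up).
 */
public class StrictPreemptionInspector {

  public Set<ContainerId> containersToBeKilled(AllocateResponse response) {
    Set<ContainerId> doomed = new HashSet<>();
    PreemptionMessage preemption = response.getPreemptionMessage();
    if (preemption == null) {
      return doomed;                    // nothing is being preempted this round
    }
    StrictPreemptionContract strict = preemption.getStrictContract();
    if (strict != null && strict.getContainers() != null) {
      for (PreemptionContainer c : strict.getContainers()) {
        doomed.add(c.getId());          // will be killed regardless of AM action
      }
    }
    // With the deadline annotation proposed in this comment, the AM would also
    // see *when* each kill happens and could let short-lived containers on the
    // decommissioning node simply run to completion.
    return doomed;
  }
}
{code}

If the deadline annotation is added, an AM that tracks expected task durations 
could compare each listed container's remaining runtime against the deadline 
and only checkpoint or relocate the ones that would not finish in time.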

> Support graceful decommission of nodemanager
> --------------------------------------------
>
>                 Key: YARN-914
>                 URL: https://issues.apache.org/jira/browse/YARN-914
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 2.0.4-alpha
>            Reporter: Luke Lu
>            Assignee: Junping Du
>         Attachments: Gracefully Decommission of NodeManager (v1).pdf, 
> Gracefully Decommission of NodeManager (v2).pdf
>
>
> When NMs are decommissioned for non-fault reasons (capacity change etc.), 
> it's desirable to minimize the impact to running applications.
> Currently, if an NM is decommissioned, all running containers on the NM need to 
> be rescheduled on other NMs. Furthermore, for finished map tasks, if their 
> map outputs have not been fetched by the reducers of the job, these map tasks 
> will need to be rerun as well.
> We propose to introduce a mechanism to optionally gracefully decommission a 
> node manager.



