[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324277#comment-14324277 ]
Jason Lowe commented on YARN-914:
---------------------------------

bq. I think prediction of the expected runtime of containers could be hard in the YARN case. However, can we typically say long-running service containers are expected to run very long or forever? If so, notifying the AM to preempt containers of LRS makes more sense here than waiting for the timeout, doesn't it?

The main point I'm trying to make here is that we shouldn't be worrying too much about long-running services (LRS) right now. YARN doesn't even know which apps are which yet, and without any kind of container lifespan prediction there's no way to know whether a container will finish within the decomm timeout window or not. YARN knowing which apps are LRS is a primitive form of container lifespan prediction (i.e.: LRS = containers run forever). We will have the same problems with apps that aren't LRS but have containers that can run for a "long" time, where "long" means longer than the decomm timeout. That's why I'm not convinced it makes sense to do anything special for LRS apps vs. other apps.

In the short term, I think we just go with a configurable decomm timeout and AM notification via strict preemption as the timeout expires. If we want to get a bit fancier, we can annotate the strict preemption with a timeout so the AM knows approximately _when_ the preemption will occur. With that feature we would notify AMs, as soon as the node is marked for decomm, that their containers will be forcibly preempted (i.e.: killed) in X minutes, and it's up to each AM to decide whether to do anything about it or whether its containers on that node will complete naturally within that time. With that setup we don't have to special-case LRS apps or anything like that: we're telling the apps ASAP that the decomm is happening and giving them time to deal with it, LRS or not.
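To make the proposed flow concrete (mark node, notify AMs immediately with a kill deadline, kill whatever is left when the timeout expires), here is a minimal sketch in Java. It assumes a single RM-side tracker keyed by node ID and an AM that has some estimate of each container's finish time. All names (`DecommissionTracker`, `PreemptionNotice`, `needsAction`, etc.) are hypothetical and are not part of YARN's actual API; the code uses Java records, so it needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the "decomm timeout + preemption notice with deadline"
// proposal. None of these types are part of the real YARN API.
class GracefulDecommissionSketch {

    /** A running container and the node it lives on (hypothetical type). */
    record Container(String id, String appId, String nodeId) {}

    /** Tells an AM that a container will be strictly preempted (killed) at deadlineMs. */
    record PreemptionNotice(String appId, String containerId, long deadlineMs) {}

    /** RM-side bookkeeping for nodes that have been marked for decommission. */
    static class DecommissionTracker {
        private final long timeoutMs;                                // configurable decomm timeout
        private final Map<String, Long> deadlines = new HashMap<>(); // nodeId -> kill time

        DecommissionTracker(long timeoutMs) {
            this.timeoutMs = timeoutMs;
        }

        /**
         * Marks a node for decommission and returns the notices to send to AMs right away,
         * one per container on that node, all carrying the same kill deadline.
         */
        List<PreemptionNotice> markForDecommission(String nodeId, long nowMs, List<Container> running) {
            long deadline = nowMs + timeoutMs;
            deadlines.put(nodeId, deadline);
            List<PreemptionNotice> notices = new ArrayList<>();
            for (Container c : running) {
                if (c.nodeId().equals(nodeId)) {
                    notices.add(new PreemptionNotice(c.appId(), c.id(), deadline));
                }
            }
            return notices;
        }

        /** Of the given running containers, those on a decommissioning node whose deadline has passed. */
        List<Container> containersToKill(long nowMs, List<Container> running) {
            List<Container> toKill = new ArrayList<>();
            for (Container c : running) {
                Long deadline = deadlines.get(c.nodeId());
                if (deadline != null && nowMs >= deadline) {
                    toKill.add(c);
                }
            }
            return toKill;
        }
    }

    /**
     * AM-side decision: a container needs attention (checkpoint, migrate, ...) only if it is
     * not expected to finish before the deadline. An LRS container is just one whose
     * expected finish time is Long.MAX_VALUE, so no special case is required.
     */
    static boolean needsAction(PreemptionNotice notice, long expectedFinishMs) {
        return expectedFinishMs > notice.deadlineMs();
    }

    public static void main(String[] args) {
        long tenMinutes = 10 * 60_000L; // example timeout value
        DecommissionTracker rm = new DecommissionTracker(tenMinutes);
        List<Container> running = List.of(
                new Container("c1", "app1", "nodeA"),
                new Container("c2", "app2", "nodeA"),
                new Container("c3", "app1", "nodeB"));

        // Node A is marked for decommission at t=0: AMs are told immediately.
        for (PreemptionNotice n : rm.markForDecommission("nodeA", 0L, running)) {
            System.out.println(n.appId() + "/" + n.containerId() + " will be killed at t=" + n.deadlineMs());
        }
        // When the timeout expires, anything still running on node A is killed.
        System.out.println("to kill at deadline: " + rm.containersToKill(tenMinutes, running));
    }
}
```

The design choice mirrors the argument in the comment: `needsAction` never asks whether an app is an LRS. A long-running service simply reports an expected finish time beyond the deadline, exactly like a "long" batch container, and each AM decides for itself what to do with the advance notice.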
> Support graceful decommission of nodemanager
> --------------------------------------------
>
> Key: YARN-914
> URL: https://issues.apache.org/jira/browse/YARN-914
> Project: Hadoop YARN
> Issue Type: Improvement
> Affects Versions: 2.0.4-alpha
> Reporter: Luke Lu
> Assignee: Junping Du
> Attachments: Gracefully Decommission of NodeManager (v1).pdf, Gracefully Decommission of NodeManager (v2).pdf
>
>
> When NMs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact on running applications.
> Currently, if an NM is decommissioned, all running containers on the NM need to be rescheduled on other NMs. Furthermore, for finished map tasks, if their map output has not been fetched by the reducers of the job, these map tasks will need to be rerun as well.
> We propose to introduce a mechanism to optionally gracefully decommission a node manager.