[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289286#comment-14289286 ]
Jason Lowe commented on YARN-914:
---------------------------------

bq. The first step I was thinking to keep NM running in a low resource mode after graceful decommissioned

I think it could be useful to leave the NM process up after the graceful decommission completes. That allows automated decommissioning tools to know the process completed by querying the NM directly. If the NM exits then the tool may have difficulty distinguishing between the NM crashing just before decommissioning completed vs. successful completion. The RM will be tracking this state as well, so it may not be critical to do it one way or the other if the tool is querying the RM rather than the NM directly.

bq. However, I am not sure if they can handle state migration to new node ahead of predictable node lost here, or be stateless more or less make more sense here?

I agree with Ming that it would be nice if the graceful decommission process could give the AMs a "heads up" about what's going on. The simplest way to accomplish that is to leverage the already existing preemption framework to tell the AM that YARN is about to take the resources away. The StrictPreemptionContract portion of the PreemptionMessage can be used to list the exact resources that YARN will be reclaiming and give the AM a chance to react before the containers are reclaimed. It's then up to the AM whether it wants to do anything special or just let the containers get killed after a timeout.

bq. These notification may still be necessary, so AM won't add these nodes into blacklist if container get killed afterwards. Thoughts?

I thought we could leverage the updated nodes list of the AllocateResponse to let AMs know when nodes are entering the decommissioning state, or at least when the decommission completes (and containers are killed). Although if the AM adds the node to the blacklist, that's not such a bad thing either, since the RM should never allocate new containers on a decommissioning node anyway.
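To make the AM side of this concrete, here is a minimal Java sketch of how an AM might react to the two signals discussed above. It deliberately uses hypothetical stand-in types (NodeUpdate, AmState, and a local NodeState enum) rather than the real YARN records, so it runs without Hadoop on the classpath. In a real AM the inputs would come from the AllocateResponse: the container set from the StrictPreemptionContract of its PreemptionMessage, and the node updates from its updated nodes list. The DECOMMISSIONING state and the "blacklist a node on first lost container" policy are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DecommissionHeadsUp {

    // Hypothetical node states for this sketch; not the real YARN NodeState enum.
    enum NodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED }

    // Hypothetical stand-in for one entry of the updated nodes list.
    static final class NodeUpdate {
        final String host;
        final NodeState state;

        NodeUpdate(String host, NodeState state) {
            this.host = host;
            this.state = state;
        }
    }

    // Hypothetical bookkeeping an AM might keep between allocate calls.
    static final class AmState {
        final Map<String, String> containerHost = new HashMap<>(); // container id -> host
        final Set<String> leavingNodes = new HashSet<>();
        final Set<String> blacklist = new HashSet<>();

        void onContainerStarted(String containerId, String host) {
            containerHost.put(containerId, host);
        }

        // Given the container ids named in a strict contract, return the ones
        // this AM is running, i.e. the ones it may want to checkpoint or drain
        // before they are reclaimed.
        List<String> onStrictContract(Set<String> containersToReclaim) {
            List<String> toSave = new ArrayList<>();
            for (String id : containersToReclaim) {
                if (containerHost.containsKey(id)) {
                    toSave.add(id);
                }
            }
            Collections.sort(toSave);
            return toSave;
        }

        // Remember which nodes are being (or have been) decommissioned.
        void onUpdatedNodes(List<NodeUpdate> updates) {
            for (NodeUpdate u : updates) {
                if (u.state == NodeState.RUNNING) {
                    leavingNodes.remove(u.host);
                } else {
                    leavingNodes.add(u.host);
                }
            }
        }

        // Assumed policy: blacklist a host when a container is lost there,
        // unless the loss is explained by a known decommission.
        void onContainerLost(String containerId) {
            String host = containerHost.remove(containerId);
            if (host != null && !leavingNodes.contains(host)) {
                blacklist.add(host);
            }
        }
    }

    public static void main(String[] args) {
        AmState am = new AmState();
        am.onContainerStarted("c1", "nodeA");
        am.onContainerStarted("c2", "nodeB");

        Set<String> contract = new HashSet<>();
        contract.add("c1");
        System.out.println("containers to save: " + am.onStrictContract(contract));

        List<NodeUpdate> updates = new ArrayList<>();
        updates.add(new NodeUpdate("nodeA", NodeState.DECOMMISSIONING));
        am.onUpdatedNodes(updates);

        am.onContainerLost("c1"); // explained by the decommission of nodeA
        am.onContainerLost("c2"); // unexplained loss on nodeB
        System.out.println("blacklist: " + am.blacklist);
    }
}
```

With this bookkeeping, a container killed on a node the AM already knows is decommissioning does not push that node onto the blacklist, while an unexplained loss elsewhere still does. As noted above, blacklisting a decommissioning node anyway would be harmless, so this distinction is optional.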
> Support graceful decommission of nodemanager
> --------------------------------------------
>
>                 Key: YARN-914
>                 URL: https://issues.apache.org/jira/browse/YARN-914
>             Project: Hadoop YARN
>          Issue Type: Improvement
>   Affects Versions: 2.0.4-alpha
>            Reporter: Luke Lu
>            Assignee: Junping Du
>
> When NMs are decommissioned for non-fault reasons (capacity change etc.),
> it's desirable to minimize the impact to running applications.
> Currently if a NM is decommissioned, all running containers on the NM need to
> be rescheduled on other NMs. Further more, for finished map tasks, if their
> map output are not fetched by the reducers of the job, these map tasks will
> need to be rerun as well.
> We propose to introduce a mechanism to optionally gracefully decommission a
> node manager.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)