[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324390#comment-14324390 ]
Junping Du commented on YARN-914:
---------------------------------

bq. The main point I'm trying to make here is that we shouldn't be worrying too much about long-running services right now.

Agreed. In particular, the discussion above was already pushing timeout tracking out of YARN core. The new CLI will track the time (configurable per operation) and send a forced decommission after the timeout. We could also notify the AM of the NM's decommissioning (and its timeout), though that could be more complicated.

bq. In the short-term I think we just go with a configurable decomm timeout and AM notification via strict preemption as the timeout expires. If we want to get a bit fancier, we can annotate the strict preemption with a timeout so the AM knows approximately when the preemption will occur.

OK. My understanding is that there are two steps here:
1. notify the AM via strict preemption after the timeout;
2. notify the AM via flexible preemption, with a tolerant timeout, when decommissioning starts.

A quick question: what is the benefit of step 1 over decommissioning the nodes directly after the timeout? If there is a benefit, why don't we do this today when decommissioning nodes?

bq. With that feature we would notify AMs as soon as the node is marked for decomm that their containers will be forcibly preempted (i.e.: killed) in X minutes, and it's up to each AM to decide whether to do anything about it or if their containers on that node will complete within that time naturally. With that setup we don't have to special-case LRS apps or anything like that, as we're telling the apps ASAP the decomm is happening and giving them time to deal with it, LRS or not.

Makes sense. It sounds like a sub-JIRA has already been created, and we can extend it to include a timeout.
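To make the CLI-side timeout tracking discussed above concrete, here is a minimal Java sketch. It is only an illustration of the idea under stated assumptions, not the design from the attached documents and not a real YARN API: `NodeHandle`, `Outcome`, and `awaitDrainOrForce` are hypothetical names, and the node is polled through a stand-in interface rather than the ResourceManager.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class DecommTimeoutSketch {

    /** How the wait ended: the node emptied on its own, or the timeout forced it. */
    enum Outcome { DRAINED, FORCED }

    /** Hypothetical view of one decommissioning node; not a real YARN interface. */
    interface NodeHandle {
        int runningContainers();
        void forceDecommission();
    }

    /**
     * Polls the node until it has no running containers or the timeout expires.
     * On expiry the client, not YARN core, sends the forced decommission.
     */
    static Outcome awaitDrainOrForce(NodeHandle node, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (System.nanoTime() - deadline < 0) {
            if (node.runningContainers() == 0) {
                return Outcome.DRAINED;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                // Assumption of this sketch: an interrupt is treated like an expired timeout.
                Thread.currentThread().interrupt();
                break;
            }
        }
        node.forceDecommission();
        return Outcome.FORCED;
    }

    public static void main(String[] args) {
        AtomicInteger remaining = new AtomicInteger(3);
        AtomicBoolean forced = new AtomicBoolean(false);
        // Simulated node whose containers finish one per poll.
        NodeHandle node = new NodeHandle() {
            public int runningContainers() { return Math.max(0, remaining.getAndDecrement()); }
            public void forceDecommission() { forced.set(true); }
        };
        System.out.println(awaitDrainOrForce(node, 5_000, 1) + " forced=" + forced.get());
    }
}
```

Keeping the deadline in the client means the timeout can differ per operation, as suggested above, at the cost that nothing is forced if the client exits before the deadline.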
> Support graceful decommission of nodemanager
> --------------------------------------------
>
>                 Key: YARN-914
>                 URL: https://issues.apache.org/jira/browse/YARN-914
>             Project: Hadoop YARN
>          Issue Type: Improvement
>    Affects Versions: 2.0.4-alpha
>            Reporter: Luke Lu
>            Assignee: Junping Du
>         Attachments: Gracefully Decommission of NodeManager (v1).pdf, Gracefully Decommission of NodeManager (v2).pdf
>
>
> When NMs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact to running applications.
> Currently if an NM is decommissioned, all running containers on the NM need to be rescheduled on other NMs. Furthermore, for finished map tasks, if their map outputs have not been fetched by the reducers of the job, these map tasks will need to be rerun as well.
> We propose to introduce a mechanism to optionally gracefully decommission a node manager.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)