[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135138#comment-15135138 ]
Daniel Zhi commented on YARN-914:
---------------------------------

I have applied and merged my code changes on top of the latest Hadoop trunk branch (3.0.0-SNAPSHOT), launched a cluster, and verified that graceful decommission works as expected. As suggested, I created a sub-JIRA with a doc that describes the design and the patch on top of the latest trunk.

> (Umbrella) Support graceful decommission of nodemanager
> -------------------------------------------------------
>
>                 Key: YARN-914
>                 URL: https://issues.apache.org/jira/browse/YARN-914
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: graceful
>    Affects Versions: 2.0.4-alpha
>            Reporter: Luke Lu
>            Assignee: Junping Du
>         Attachments: Gracefully Decommission of NodeManager (v1).pdf,
> Gracefully Decommission of NodeManager (v2).pdf,
> GracefullyDecommissionofNodeManagerv3.pdf
>
>
> When NMs are decommissioned for non-fault reasons (capacity change etc.),
> it's desirable to minimize the impact on running applications.
> Currently, if an NM is decommissioned, all running containers on the NM need to
> be rescheduled on other NMs. Furthermore, for finished map tasks, if their
> map outputs have not been fetched by the reducers of the job, these map tasks will
> need to be rerun as well.
> We propose to introduce a mechanism to optionally gracefully decommission a
> node manager.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)