[ https://issues.apache.org/jira/browse/YARN-4726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15220217#comment-15220217 ]
Wangda Tan commented on YARN-4726:
----------------------------------

[~asuresh],

Thanks for raising these JIRAs; they are required by a couple of scheduling improvements. Before starting implementation, could you add a design doc so we can better understand the scope?

> [Umbrella] Allocation reuse for application upgrades
> ----------------------------------------------------
>
>                 Key: YARN-4726
>                 URL: https://issues.apache.org/jira/browse/YARN-4726
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Vinod Kumar Vavilapalli
>
> See overview doc at YARN-4692, copying the sub-section to track all related
> efforts.
> Once auto-restart of containers is taken care of (YARN-4725), we need to
> address what I believe is the second most important reason for service
> containers to restart: upgrades. Once a service is running on YARN, the way
> container allocation-lifecycle works, any time the container exits, YARN
> will reclaim the resources. During an upgrade, with a multitude of other
> applications running in the system, giving up and getting back resources
> allocated to the service is hard to manage. Things like NodeLabels in YARN
> help this cause but are not straightforward to use to address the
> app-specific use cases.
> We need a first-class way of letting an application reuse the same
> resource allocation for multiple launches of the processes inside the
> container. This is done by decoupling allocation lifecycle and the process
> lifecycle.
> The JIRA YARN-1040 initiated this conversation. We need two things here:
>  - (1) (Task) the ApplicationMaster should be able to use the same
> container-allocation and issue multiple startContainer requests to the
> NodeManager.
>  - (2) (Task) To support the upgrade of the ApplicationMaster itself,
> clients should be able to inform YARN to restart the AM within the same
> allocation but with new bits.
> The JIRAs YARN-3417 and YARN-4470 talk about the second task above ...

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)