[ https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632571#comment-13632571 ]
Arun C Murthy commented on YARN-45:
-----------------------------------

Sorry, I've been away for a couple of weeks due to family reasons and I'm just catching up.

The bare-minimum requirement seems:
# RM should notify the AM that a certain amount of resources will need to be reclaimed (ala SIGTERM).
# Thus, the AM gets an opportunity to *pick* which containers it will sacrifice to satisfy the RM's requirements.
# Iff the AM doesn't act, the RM will go ahead and terminate some containers (probably the most-recently allocated ones); ala SIGKILL.

Given the above, I feel that this is a set of changes we need to be conservative about - particularly since the really simple pre-emption i.e. SIGKILL alone on RM side is trivial (from an API perspective). Thus, I'm concerned about jumping into a complex preemption API (ResourceRequest etc.) without having sufficient experience i.e. doing this in the first iteration itself.

I like [~tucu00]'s initial suggestion of:
# Resource resourcesToReclaim
# Optionally, a Set<ContainerId> which the RM will preempt i.e. SIGKILL

In fact, for the first iteration, Set<ContainerId> is something we can avoid if the semantics are clear i.e. RM will preempt the most-recently allocated containers.

Once we have sufficient experience with this, we can then dive deeper to think about further enhancements to the API by adding features (in a compatible manner for 2.x or 3.x).

Thoughts?

> Scheduler feedback to AM to release containers
> ----------------------------------------------
>
> Key: YARN-45
> URL: https://issues.apache.org/jira/browse/YARN-45
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Reporter: Chris Douglas
> Assignee: Carlo Curino
> Attachments: YARN-45.patch, YARN-45.patch
>
>
> The ResourceManager strikes a balance between cluster utilization and strict
> enforcement of resource invariants in the cluster. Individual allocations of
> containers must be reclaimed- or reserved- to restore the global invariants
> when cluster load shifts. In some cases, the ApplicationMaster can respond to
> fluctuations in resource availability without losing the work already
> completed by that task (MAPREDUCE-4584). Supplying it with this information
> would be helpful for overall cluster utilization [1]. To this end, we want to
> establish a protocol for the RM to ask the AM to release containers.
> [1] http://research.yahoo.com/files/yl-2012-003.pdf

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira