[ 
https://issues.apache.org/jira/browse/YARN-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632571#comment-13632571
 ] 

Arun C Murthy commented on YARN-45:
-----------------------------------

Sorry, I've been away for a couple of weeks due to family reasons and I'm just 
catching up.

The bare-minimum requirement seems to be:
# The RM should notify the AM that a certain amount of resources will need to 
be reclaimed (à la SIGTERM).
# Thus, the AM gets an opportunity to *pick* which containers it will sacrifice 
to satisfy the RM's requirements.
# Iff the AM doesn't act, the RM will go ahead and terminate some containers 
(probably the most-recently allocated ones), à la SIGKILL.
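
To make the fallback step above concrete, here is a minimal sketch in Java of how the RM might choose what to kill when the AM has not released enough on its own. Everything in it is hypothetical: PreemptionFallback, Container and selectVictims are illustration names, not YARN APIs, and memory in MB stands in for a full Resource. It assumes a newest-first policy, which is only the "probably" suggested above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; none of these names are actual YARN APIs.
final class PreemptionFallback {

    // Minimal stand-in for a running container: id, memory held, allocation time.
    static final class Container {
        final String id;
        final int memoryMb;
        final long allocatedAt;

        Container(String id, int memoryMb, long allocatedAt) {
            this.id = id;
            this.memoryMb = memoryMb;
            this.allocatedAt = allocatedAt;
        }
    }

    // Picks containers to terminate, most recently allocated first, until the
    // memory still owed to the RM is covered. Returns the chosen ids in kill order.
    static List<String> selectVictims(List<Container> running, int memoryToReclaimMb) {
        List<Container> newestFirst = new ArrayList<>(running);
        newestFirst.sort(
            Comparator.comparingLong((Container c) -> c.allocatedAt).reversed());

        List<String> victims = new ArrayList<>();
        int reclaimedMb = 0;
        for (Container c : newestFirst) {
            if (reclaimedMb >= memoryToReclaimMb) {
                break; // target met; spare the older containers
            }
            victims.add(c.id);
            reclaimedMb += c.memoryMb;
        }
        return victims;
    }
}
```

For example, with containers c1 and c2 (1024 MB each) and c3 (2048 MB) allocated in that order, a 3000 MB shortfall selects c3 and then c2, leaving the oldest container untouched.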

Given the above, I feel that this is a set of changes we need to be 
conservative about, particularly since the really simple form of preemption, 
i.e. SIGKILL alone on the RM side, is trivial (from an API perspective).

Thus, I'm concerned about jumping into a complex preemption API 
(ResourceRequest etc.) in the very first iteration, before we have sufficient 
experience.

I like [~tucu00]'s initial suggestion of: 
# Resource resourcesToReclaim
# Optionally, a Set<ContainerId> which the RM will preempt i.e. SIGKILL 

In fact, for the first iteration, Set<ContainerId> is something we can avoid if 
the semantics are clear, i.e. the RM will preempt the most-recently allocated 
containers.
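
As a rough illustration of the message shape suggested above, a sketch in Java might look like the following. The class PreemptionNotice and its members are hypothetical names for this example only, not an existing YARN record; an int of memory in MB stands in for Resource, and String ids stand in for ContainerId.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical message shape; not an actual YARN record.
final class PreemptionNotice {
    // Stand-in for "Resource resourcesToReclaim": memory only, in MB.
    final int memoryToReclaimMb;
    // Stand-in for the optional Set<ContainerId>; empty when the RM names none.
    final Set<String> containersToPreempt;

    private PreemptionNotice(int memoryToReclaimMb, Set<String> containersToPreempt) {
        if (memoryToReclaimMb < 0) {
            throw new IllegalArgumentException("memoryToReclaimMb must be >= 0");
        }
        this.memoryToReclaimMb = memoryToReclaimMb;
        // Defensive copy so later changes by the caller do not alter the notice.
        this.containersToPreempt =
            Collections.unmodifiableSet(new LinkedHashSet<>(containersToPreempt));
    }

    // First-iteration form: only the amount to reclaim is sent.
    static PreemptionNotice amountOnly(int memoryToReclaimMb) {
        return new PreemptionNotice(memoryToReclaimMb, Collections.<String>emptySet());
    }

    // Optional form: the RM also names the containers it intends to kill.
    static PreemptionNotice withContainers(int memoryToReclaimMb, Set<String> ids) {
        return new PreemptionNotice(memoryToReclaimMb, ids);
    }

    boolean namesContainers() {
        return !containersToPreempt.isEmpty();
    }
}
```

Keeping the container set optional means the amount-only form can ship first, and the named-container form can be added later without changing the first field.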

Once we have sufficient experience with this, we can then dive deeper to think 
about further enhancements to the API by adding features (in a compatible 
manner for 2.x or 3.x).

Thoughts? 
                
> Scheduler feedback to AM to release containers
> ----------------------------------------------
>
>                 Key: YARN-45
>                 URL: https://issues.apache.org/jira/browse/YARN-45
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Chris Douglas
>            Assignee: Carlo Curino
>         Attachments: YARN-45.patch, YARN-45.patch
>
>
> The ResourceManager strikes a balance between cluster utilization and strict 
> enforcement of resource invariants in the cluster. Individual allocations of 
> containers must be reclaimed, or reserved, to restore the global invariants 
> when cluster load shifts. In some cases, the ApplicationMaster can respond to 
> fluctuations in resource availability without losing the work already 
> completed by that task (MAPREDUCE-4584). Supplying it with this information 
> would be helpful for overall cluster utilization [1]. To this end, we want to 
> establish a protocol for the RM to ask the AM to release containers.
> [1] http://research.yahoo.com/files/yl-2012-003.pdf

