[ 
https://issues.apache.org/jira/browse/YARN-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15754101#comment-15754101
 ] 

Carlo Curino commented on YARN-5864:
------------------------------------

[~wangda] I like the direction of specifying more clearly what happens. I think 
working on a design doc that spells this out would be very valuable, and I am happy 
to review and brainstorm with you if you think it is useful. (But FYI: I am on 
parental leave, and traveling abroad till mid-Jan.)

In writing the document, I think you should in particular address the semantics 
from all points of view: for example, which guarantees do I get as a user of any 
of the queues (not just the one we are preempting in favor of)? It is clear that 
if I am running over capacity I can be preempted, but what happens if I am 
(safely?) within my capacity? (This is related to the "abuses" I was describing 
before, e.g., one in which I ask for massive containers on the nodes I want 
and then resize them down after you have killed anyone in my way.)

Looking further ahead: ideally, the document you are starting, which captures 
the semantics of this feature, can be expanded to gradually cover all "tunables" 
of the scheduler and explore the many complex interactions among features and 
the semantics we can derive from them (I bet we might be able to get rid of some 
redundancies). This could become part of the YARN documentation. Even nicer 
would be to codify this with SLS-driven tests (so that no future feature can 
break the semantics you are capturing without us noticing).

> Capacity Scheduler preemption for fragmented cluster 
> -----------------------------------------------------
>
>                 Key: YARN-5864
>                 URL: https://issues.apache.org/jira/browse/YARN-5864
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: YARN-5864.poc-0.patch
>
>
> YARN-4390 added preemption for reserved containers. However, we found a case 
> where a large container cannot be allocated even if all queues are under their 
> limits.
> For example, we have:
> {code}
> Two queues, a and b, capacity 50:50 
> Two nodes: n1 and n2, each with resource=50 
> Now queue-a uses 10 on n1 and 10 on n2
> queue-b asks for one single container with resource=45. 
> {code} 
> The container could be reserved on either host, but no preemption will 
> happen because all queues are under their limits. 
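
The arithmetic behind the example in the issue description can be checked with a small sketch. This is a toy model, not YARN code: the dictionaries and helper names (`node_capacity`, `queue_guarantee`, `free_on`, `queue_usage`) are hypothetical and exist only to restate the numbers quoted above.

```python
# Toy model of the quoted example; all names here are hypothetical, not YARN APIs.
node_capacity = {"n1": 50, "n2": 50}           # two nodes, resource=50 each
queue_guarantee = {"a": 50, "b": 50}           # capacity 50:50 of 100 total
usage = {("a", "n1"): 10, ("a", "n2"): 10}     # queue-a uses 10 on each node
request = {"queue": "b", "size": 45}           # queue-b asks for one container


def free_on(node):
    """Resource still unallocated on a single node."""
    used = sum(amount for (_, n), amount in usage.items() if n == node)
    return node_capacity[node] - used


def queue_usage(queue):
    """Total resource a queue currently holds across all nodes."""
    return sum(amount for (q, _), amount in usage.items() if q == queue)


free = {node: free_on(node) for node in node_capacity}
fits_somewhere = any(f >= request["size"] for f in free.values())
over_limit = [q for q in queue_guarantee if queue_usage(q) > queue_guarantee[q]]

print(free)                # {'n1': 40, 'n2': 40}
print(sum(free.values()))  # 80
print(fits_somewhere)      # False
print(over_limit)          # []
```

The cluster has 80 free in total, yet no single node has 45 free, and no queue is over its limit. Under a rule that only preempts from over-limit queues, there is therefore no candidate to preempt, which is the fragmentation case the issue describes.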



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
