[ https://issues.apache.org/jira/browse/YARN-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15754101#comment-15754101 ]
Carlo Curino commented on YARN-5864:
------------------------------------

[~wangda] I like the direction of specifying more clearly what happens. I think working on a design doc that spells this out would be very valuable. I am happy to review and brainstorm with you if you think it is useful. (But FYI: I am on parental leave, and traveling abroad till mid-Jan.)

In writing the document, I think you should in particular address the semantics from all points of view, e.g., which guarantees do I get as a user of any of the queues (not just the one we are preempting in favor of)? It is clear that if I am running over-capacity I can be preempted, but what happens if I am (safely?) within my capacity? (This is related to the "abuses" I was describing before, e.g., one in which I ask for massive containers on the nodes I want, and then resize them down after you have killed anyone in my way.)

Looking further ahead: ideally, the document you are starting in order to capture the semantics of this feature can be expanded to slowly cover all "tunables" of the scheduler, and to explore the many complex interactions among features and the semantics we can derive from them (I bet we might be able to get rid of some redundancies). This could become part of the documentation of YARN. Even nicer would be to codify this with SLS-driven tests (so that no future feature can break the semantics you are capturing without us noticing).

> Capacity Scheduler preemption for fragmented cluster
> -----------------------------------------------------
>
>                 Key: YARN-5864
>                 URL: https://issues.apache.org/jira/browse/YARN-5864
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: YARN-5864.poc-0.patch
>
>
> YARN-4390 added preemption for reserved containers. However, we found one case
> in which a large container cannot be allocated even if all queues are under their
> limit.
> For example, we have:
> {code}
> Two queues, a and b, capacity 50:50
> Two nodes: n1 and n2, each of them has 50 resource
> Now queue-a uses 10 on n1 and 10 on n2
> queue-b asks for one single container with resource=45.
> {code}
> The container could be reserved on either host, but no preemption will
> happen because all queues are under their limits.
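
The arithmetic behind the example in the issue description can be checked with a small sketch. This is not YARN code: the `Node` class, the single integer resource dimension, and all variable names are assumptions made only for illustration. It shows that queue-b's request fits within the cluster's total free resource and within queue-b's own capacity, yet fits on neither node, while no queue is over its limit.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node model: one resource dimension, usage tracked per queue."""
    name: str
    total: int
    used_by_queue: dict = field(default_factory=dict)

    @property
    def free(self) -> int:
        return self.total - sum(self.used_by_queue.values())


# Setup from the issue description: two nodes of 50, queue-a uses 10 on each.
nodes = [Node("n1", 50, {"a": 10}), Node("n2", 50, {"a": 10})]
cluster_total = sum(n.total for n in nodes)

# Capacity 50:50 between queues a and b.
guaranteed = {"a": cluster_total // 2, "b": cluster_total // 2}
usage = {
    q: sum(n.used_by_queue.get(q, 0) for n in nodes) for q in guaranteed
}

request_queue, request_size = "b", 45

# Queues currently above their guaranteed share (the usual preemption trigger).
over_capacity = [q for q in guaranteed if usage[q] > guaranteed[q]]

# Does any single node have room for the container?
fits_on_some_node = any(n.free >= request_size for n in nodes)

# Would the request keep queue-b within its own share?
within_own_capacity = usage[request_queue] + request_size <= guaranteed[request_queue]

total_free = sum(n.free for n in nodes)
fragmented = total_free >= request_size and not fits_on_some_node

print(f"free per node: {[n.free for n in nodes]}, total free: {total_free}")
print(f"queues over capacity: {over_capacity}")
print(f"request fits on a single node: {fits_on_some_node}")
print(f"request within queue-{request_queue} capacity: {within_own_capacity}")
print(f"blocked by fragmentation only: {fragmented}")
```

Under these assumptions each node has 40 free, so the 45-unit container cannot be placed even though 80 units are free cluster-wide and the list of over-capacity queues is empty, which is why capacity-based preemption alone never fires in this scenario.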