[ 
https://issues.apache.org/jira/browse/YARN-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655514#comment-15655514
 ] 

Tan, Wangda commented on YARN-5864:
-----------------------------------

Thanks [~curino] for sharing these insightful suggestions.

The problem you mentioned is real: we have put a lot of effort into adding 
features for various resource constraints (such as limits, node partitions, 
priorities, etc.), but we have paid less attention to making the semantics 
simpler and more consistent.

I also agree that we need to spend some time thinking about what semantics the 
YARN scheduler should have. For example, the minimum guarantee of CS is that 
each queue should get at least its configured capacity, but a picky app can 
leave an under-utilized queue waiting forever for resources. And, as you 
mentioned above, a non-preemptable queue can invalidate the configured capacity 
as well.

However, I would argue that no scheduler can run perfectly without violating 
any of the constraints. It is not just a set of formulas that we define and 
hand to a solver to optimize; it also involves a lot of human emotions and 
preferences. For example, a user may not understand, or be glad to accept, why 
a picky request cannot be allocated even though the queue/cluster has 
available capacity. And it may not be acceptable for a production cluster that 
a long-running service for realtime queries cannot be launched because we don't 
want to kill some less important batch jobs. My point is: if we can have these 
rules defined in the docs, and users can find out what happened from the 
UI/log, we can add them.

To improve this, I think your suggestion (1) will be more helpful and 
achievable in the short term. We can definitely remove some parameters: for 
example, the existing user-limit definition is not good enough, and 
user-limit-factor can leave a queue unable to fully utilize its capacity. 
We can also define these semantics better in the docs and UI.

(2) looks beautiful, but it may not solve the root problem directly: the first 
priority is to make our users happy to accept the behavior, rather than to 
solve it beautifully in mathematics. For example, for the problem I put in the 
description of this JIRA, I don't think (2) can get an allocation without 
harming other applications. And from an implementation perspective, I'm not 
sure how a solver-based solution can handle both fast allocation (we want to 
allocate within milliseconds for interactive queries) and good placement 
(such as gang scheduling with other constraints like anti-affinity). It seems 
to me that option (2) would sacrifice low latency to get better placement 
quality.

bq. This opens up many abuses, one that comes to mind ...
Actually, this feature will only be used in a fairly controlled environment: 
important long-running services run in a separate queue, and the admin/user 
agrees that it can preempt other batch jobs to get new containers. ACLs will be 
set to keep normal users from running inside these queues; all apps running in 
the queue should be trusted apps such as YARN native services (Slider), Spark, 
etc. We can also make sure these apps try their best to respect other apps.
Please advise if you think we can improve the semantics of this feature.

Thanks,

> Capacity Scheduler preemption for fragmented cluster 
> -----------------------------------------------------
>
>                 Key: YARN-5864
>                 URL: https://issues.apache.org/jira/browse/YARN-5864
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: YARN-5864.poc-0.patch
>
>
> YARN-4390 added preemption for reserved containers. However, we found one case 
> in which a large container cannot be allocated even if all queues are under 
> their limits.
> For example, we have:
> {code}
> Two queues, a and b, capacity 50:50 
> Two nodes: n1 and n2, each of them have 50 resource 
> Now queue-a uses 10 on n1 and 10 on n2
> queue-b asks for one single container with resource=45. 
> {code} 
> The container could be reserved on either host, but no preemption will 
> happen because all queues are under their limits. 
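
The arithmetic behind the example in the description can be checked with a small sketch. This is only an illustrative model: the dictionaries and helper functions below are hypothetical names, not YARN or Capacity Scheduler APIs, and the only numbers used are the ones given in the description.

```python
# Hypothetical model of the fragmented-cluster example; not YARN code.
NODE_CAPACITY = {"n1": 50, "n2": 50}        # two nodes, 50 resource each
QUEUE_GUARANTEE = {"a": 50, "b": 50}        # capacity split 50:50 of 100 total
USED = {("a", "n1"): 10, ("a", "n2"): 10}   # queue-a uses 10 on each node
REQUEST = 45                                # queue-b asks for one container of 45


def free_on(node):
    """Resource still unallocated on a single node."""
    used = sum(amount for (_, n), amount in USED.items() if n == node)
    return NODE_CAPACITY[node] - used


def queue_usage(queue):
    """Total resource a queue currently holds across all nodes."""
    return sum(amount for (q, _), amount in USED.items() if q == queue)


free = {node: free_on(node) for node in NODE_CAPACITY}
cluster_free = sum(free.values())

# The cluster as a whole has room, but no single node does.
fits_on_some_node = any(f >= REQUEST for f in free.values())

# Queue-a is under its guarantee, so there is nothing over the limit to preempt.
queue_a_over_guarantee = queue_usage("a") > QUEUE_GUARANTEE["a"]

# Queue-b would still be within its guarantee after the allocation.
queue_b_within_guarantee = queue_usage("b") + REQUEST <= QUEUE_GUARANTEE["b"]

print(free, cluster_free, fits_on_some_node,
      queue_a_over_guarantee, queue_b_within_guarantee)
```

In this model the request is within queue-b's guarantee and within the cluster's total free resource, yet it fits on neither node, and queue-a is not over its guarantee, so a preemption policy that only looks at queue limits has no reason to act.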



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
