[ 
https://issues.apache.org/jira/browse/YARN-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243149#comment-17243149
 ] 

Peter Bacsko commented on YARN-10496:
-------------------------------------

_"From the design doc, one proposal is to define max capacity for weighted 
queues in terms of percentage of the cluster rather than percentage of the 
immediate parent. I would oppose this since max capacity in CS has always been 
relative to the immediate parent."_

This is a valid concern. Unfortunately for us, in Fair Scheduler the 
percentages are relative to the overall cluster capacity. We were not entirely 
sure about this when we put together the material, but I looked at the FS code 
more closely, and these are the main points of interest:
 * org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.allocation.AllocationFileQueueParser.loadQueue() - 
[https://github.com/apache/hadoop/blob/fd6be5898ad1a650e3bceacb8169a53520da57e5/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/allocation/AllocationFileQueueParser.java#L141-L145]
 * org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.ConfigurableResource - 
[https://github.com/apache/hadoop/blob/5cc7873a4723a6c8e8e001d008fcd522eec0433d/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/ConfigurableResource.java#L44-L47]
 * ConfigurableResource.getResource() - 
[https://github.com/apache/hadoop/blob/5cc7873a4723a6c8e8e001d008fcd522eec0433d/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/ConfigurableResource.java#L79-L95]

So when the maximum resource for a queue is calculated in FS, it's expressed 
as a percentage of the overall cluster capacity.
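
To make the difference concrete, here is a minimal sketch. It is not the FS or 
CS code: the class and method names are hypothetical, and a resource is 
reduced to a single memory value in MB. It only shows how the same "50%" 
yields different limits depending on what the percentage refers to.

```java
// Hypothetical illustration only; not taken from Fair Scheduler or
// Capacity Scheduler sources.
class MaxCapacitySketch {

    // FS-style: the percentage applies to the total cluster resource,
    // regardless of where the queue sits in the hierarchy.
    static long maxRelativeToCluster(long clusterMb, double percent) {
        return (long) (clusterMb * percent / 100.0);
    }

    // CS-style: the percentage applies to the parent's own maximum,
    // so limits multiply down the queue hierarchy.
    static long maxRelativeToParent(long parentMaxMb, double percent) {
        return (long) (parentMaxMb * percent / 100.0);
    }

    public static void main(String[] args) {
        long clusterMb = 100_000;

        // Assume a parent queue limited to 50% of the root (the cluster).
        long parentMaxMb = maxRelativeToParent(clusterMb, 50.0);

        // A child queue configured with "50%" under that parent:
        System.out.println("cluster-relative: "
            + maxRelativeToCluster(clusterMb, 50.0) + " MB");
        System.out.println("parent-relative:  "
            + maxRelativeToParent(parentMaxMb, 50.0) + " MB");
    }
}
```

With these assumed numbers, the child queue's limit is 50000 MB under the 
cluster-relative reading and 25000 MB under the parent-relative one.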

I think we can do multiple things:
 * We just ignore this and go the CS-way, meaning that the calculation will be 
based on the parent. This will be inconvenient for legacy FS users.
 * Add a feature flag to indicate which calculation you want.
 * Add a new format to "maximum-capacity", like "c:50%" or "cluster:[50%]", to 
indicate what the percentage refers to.
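
As a rough sketch of the third option, parsing such a value could look like 
the code below. Everything here is an assumption for discussion: the class, 
the enum and the exact syntax are hypothetical, nothing like this exists in CS 
today, and only the "c:50%" / "cluster:50%" prefix form is handled (not the 
bracketed variant). A value without a prefix keeps the existing 
parent-relative meaning.

```java
// Hypothetical parser for a prefixed "maximum-capacity" value.
// Not an existing Capacity Scheduler API.
class MaxCapacityFormatSketch {

    // What the percentage is measured against.
    enum Base { PARENT, CLUSTER }

    static final class MaxCapacity {
        final Base base;
        final float percent;

        MaxCapacity(Base base, float percent) {
            this.base = base;
            this.percent = percent;
        }
    }

    static MaxCapacity parse(String value) {
        String v = value.trim();
        Base base = Base.PARENT; // default: current CS semantics

        int colon = v.indexOf(':');
        if (colon >= 0) {
            String prefix = v.substring(0, colon).trim();
            if (prefix.equals("c") || prefix.equals("cluster")) {
                base = Base.CLUSTER;
            } else {
                throw new IllegalArgumentException(
                    "Unknown prefix: " + prefix);
            }
            v = v.substring(colon + 1).trim();
        }

        // The trailing percent sign is optional in this sketch.
        if (v.endsWith("%")) {
            v = v.substring(0, v.length() - 1).trim();
        }

        float percent = Float.parseFloat(v);
        // Written this way so that NaN is rejected as well.
        if (!(percent >= 0f && percent <= 100f)) {
            throw new IllegalArgumentException(
                "Percentage out of range: " + value);
        }
        return new MaxCapacity(base, percent);
    }

    public static void main(String[] args) {
        MaxCapacity a = parse("c:50%");
        MaxCapacity b = parse("75");
        System.out.println(a.base + " " + a.percent);
        System.out.println(b.base + " " + b.percent);
    }
}
```

The advantage of a prefix over a global feature flag is that the choice is 
visible per queue in the configuration, so converted FS queues and native CS 
queues could coexist.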

Thoughts?

> [Umbrella] Support Flexible Auto Queue Creation in Capacity Scheduler
> ---------------------------------------------------------------------
>
>                 Key: YARN-10496
>                 URL: https://issues.apache.org/jira/browse/YARN-10496
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: capacity scheduler
>            Reporter: Wangda Tan
>            Priority: Major
>
> CapacityScheduler today doesn’t support auto queue creation that is 
> flexible enough. The current constraints: 
>  * Only leaf queues can be auto-created
>  * A parent can only have either static queues or dynamic ones. This causes 
> multiple constraints. For example:
>  * It isn’t possible to have a VIP user like Alice with a static queue 
> root.user.alice with 50% capacity while the other user queues (under 
> root.user) are created dynamically and they share the remaining 50% of 
> resources.
>  
>  * In comparison, FairScheduler allows the following scenarios, which 
> Capacity Scheduler doesn’t:
>  ** This implies that there is no possibility to have both dynamically 
> created and static queues at the same time under root
>  * A new queue needs to be created under an existing parent, while the parent 
> already has static queues
>  * Nested queue mapping policy, like in the following example: 
> |<rule name="nestedUserQueue" create="true">
>         <rule name="primaryGroup" create="true" />
> </rule>|
>  * Here two levels of queues may need to be created 
> If an application belongs to user _alice_ (who has the primary_group of 
> _engineering_), the scheduler checks whether _root.engineering_ exists; if it 
> doesn’t, it’ll be created. Then the scheduler checks whether 
> _root.engineering.alice_ exists, and creates it if it doesn't.
>  
> When we try to move users from FairScheduler to CapacityScheduler, we face 
> feature gaps which block users from migrating from FS to CS.



