[jira] [Commented] (YARN-10496) [Umbrella] Support Flexible Auto Queue Creation in Capacity Scheduler

2021-01-13 Thread Benjamin Teke (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17264106#comment-17264106
 ] 

Benjamin Teke commented on YARN-10496:
--

[~pbacsko], regarding the max capacity: as of now, YARN-10504 disabled the 
validation for the absolute and absolute max capacity of a queue. I think we 
should allow some flexibility by introducing either a flag or a special format 
like you mentioned. A couple of concerns/questions:
 * Should we allow the max capacity to be lower than the capacity?
 ** In "relative to the cluster" mode this can be straightforward, especially 
with weight mode: I can set up a fairly large queue hierarchy with weights and 
not worry about any queue eating up a large part of the cluster resources.
 ** In "relative to the parent" mode this allows a setup where the weights 
are effectively disabled and the queues are governed by the max capacity. 
Not necessarily a problem, but it can lead to hard-to-read configurations.
 * If we keep/reintroduce the capacity < max capacity constraint in weight mode, 
the user might have to calculate the percentages from the weights manually. For 
example, child1 and child2 are the only child queues under a parent, with weights 
3 and 1. In this setup child1 must have a configured max capacity of 75% or more, 
while child2 can have anything above 25%. This is OK for a static parent, but 
if/when auto-create templates/wildcard configs are supported, the capacity 
can change greatly based on the number of dynamic queues. If I want to express 
the max capacity of any child as 33% of the parent's resources, I will need to 
define at least 3 static queues with the same weight; I can't allow these to be 
auto-created (because 1 queue with weight 1 will have a capacity of 100%, and 2 
queues with weight 1 will have 50% each). This is another reason to let this 
constraint go.
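
The arithmetic behind this example can be sketched as follows. This is a minimal illustration, not CapacityScheduler code: weights_to_shares is a hypothetical helper, and it assumes a queue's capacity is its weight divided by the sum of its siblings' weights.

```python
from fractions import Fraction


def weights_to_shares(weights):
    """Map sibling queue weights to each queue's fraction of the parent.

    Hypothetical helper: assumes share = weight / sum of sibling weights.
    """
    total = sum(weights.values())
    return {name: Fraction(w, total) for name, w in weights.items()}


# Static parent from the example: weights 3 and 1 give 3/4 and 1/4.
static = weights_to_shares({"child1": 3, "child2": 1})

# Dynamic parent: equal-weight queues appear one by one.
# With 1 queue the share is 1, with 2 queues 1/2, with 3 queues 1/3.
dynamic = {
    n: weights_to_shares({f"q{i}": 1 for i in range(n)})["q0"]
    for n in (1, 2, 3)
}

# A 33% cap is only compatible with "capacity <= max capacity"
# once at least three equal-weight queues exist.
max_capacity = Fraction(33, 100)
violations = [n for n, share in dynamic.items() if share > max_capacity]
```

Here violations lists every queue count whose per-queue share exceeds the 33% cap, which is why the constraint is hard to keep when the number of dynamic queues is not known in advance.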

> [Umbrella] Support Flexible Auto Queue Creation in Capacity Scheduler
> -
>
> Key: YARN-10496
> URL: https://issues.apache.org/jira/browse/YARN-10496
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: capacity scheduler
> Reporter: Wangda Tan
> Priority: Major
>
> CapacityScheduler today doesn’t support auto queue creation that is 
> flexible enough. The current constraints: 
>  * Only leaf queues can be auto-created
>  * A parent can only have either static queues or dynamic ones. This causes 
> multiple constraints. For example:
>  ** It isn’t possible to have a VIP user like Alice with a static queue 
> root.user.alice with 50% capacity while the other user queues (under 
> root.user) are created dynamically and share the remaining 50% of 
> resources.
>  ** This implies that there is no possibility to have both dynamically 
> created and static queues at the same time under root
>  
>  * In comparison, FairScheduler allows the following scenarios, which 
> CapacityScheduler doesn’t:
>  ** A new queue needs to be created under an existing parent, while the parent 
> already has static queues
>  ** Nested queue mapping policy, like in the following example: 
> |
> 
> |
>  * Here two levels of queues may need to be created. 
> If an application belongs to user _alice_ (who has the primary_group of 
> _engineering_), the scheduler checks whether _root.engineering_ exists and 
> creates it if it doesn’t. The scheduler then checks whether 
> _root.engineering.alice_ exists, and creates it if it doesn't.
>  
> When we try to move users from FairScheduler to CapacityScheduler, we face 
> feature gaps that block users from migrating from FS to CS.
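
The two-level creation described in the nested mapping example above can be sketched as follows. This is a hypothetical illustration that models the queue hierarchy as a plain set of dotted path names; ensure_queue_path is not a CapacityScheduler API.

```python
def ensure_queue_path(existing, path):
    """Create every missing queue along a dotted path, parents first.

    `existing` is a set of full queue paths (an assumption of this sketch).
    Returns the list of queues that had to be created, in creation order.
    """
    created = []
    parts = path.split(".")
    for depth in range(1, len(parts) + 1):
        name = ".".join(parts[:depth])
        if name not in existing:
            existing.add(name)
            created.append(name)
    return created


queues = {"root"}

# User alice with primary group engineering maps to root.engineering.alice.
first = ensure_queue_path(queues, "root.engineering.alice")

# A second application from the same user finds both levels already present.
second = ensure_queue_path(queues, "root.engineering.alice")
```

On the first call both root.engineering and root.engineering.alice are created; on the second call nothing is created.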



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10496) [Umbrella] Support Flexible Auto Queue Creation in Capacity Scheduler

2020-12-03 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243149#comment-17243149
 ] 

Peter Bacsko commented on YARN-10496:
-

_"From the design doc, one proposal is to define max capacity for weighted 
queues in terms of percentage of the cluster rather than percentage of the 
immediate parent. I would oppose this since max capacity in CS has always been 
relative to the immediate parent."_

This is a valid concern. Unfortunately for us, in Fair Scheduler the 
percentages are relative to the overall cluster capacity. We were not entirely 
sure about this when we put together the material, but I examined the FS part a 
bit more deeply, and these classes are the main points of interest:
 * org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.allocation.AllocationFileQueueParser.loadQueue() - [https://github.com/apache/hadoop/blob/fd6be5898ad1a650e3bceacb8169a53520da57e5/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/allocation/AllocationFileQueueParser.java#L141-L145]
 * org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.ConfigurableResource - [https://github.com/apache/hadoop/blob/5cc7873a4723a6c8e8e001d008fcd522eec0433d/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/ConfigurableResource.java#L44-L47]
 * ConfigurableResource.getResource() - [https://github.com/apache/hadoop/blob/5cc7873a4723a6c8e8e001d008fcd522eec0433d/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/ConfigurableResource.java#L79-L95]

So when we calculate the maximum resource for a queue in FS, it's expressed as 
a percentage of the overall cluster capacity.
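
The difference between the two interpretations can be sketched with a small calculation. This is an illustration only, not code from either scheduler: both function names are hypothetical, and the parent-relative variant assumes that a child's effective maximum is its percentage of the parent's effective maximum.

```python
def parent_relative_max(cluster_mb, max_percents_from_root):
    """CS-style reading (assumed): each percentage applies to the parent's maximum.

    `max_percents_from_root` lists the configured maximum percentage of every
    queue on the path from the first level below root down to the target queue.
    """
    limit = cluster_mb
    for pct in max_percents_from_root:
        limit = limit * pct / 100
    return limit


def cluster_relative_max(cluster_mb, max_percent):
    """FS-style reading: the percentage applies to the whole cluster."""
    return cluster_mb * max_percent / 100


cluster_mb = 100_000

# A queue two levels deep, where both the parent and the queue say "50%".
cs_limit = parent_relative_max(cluster_mb, [50, 50])   # 25000.0 MB
fs_limit = cluster_relative_max(cluster_mb, 50)        # 50000.0 MB
```

The same "50%" setting therefore yields different absolute limits depending on which interpretation is in effect, which is the inconvenience a migrating FS user would run into.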

I think we can do multiple things:
 * We just ignore this and go the CS-way, meaning that the calculation will be 
based on the parent. This will be inconvenient for legacy FS users.
 * Add a feature flag to indicate which calculation you want.
 * Add a new format for "maximum-capacity", like "c:50%" or "cluster:[50%]", to 
indicate what the percentage refers to.
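
To make the third option concrete, here is a minimal sketch of how such a value could be parsed. The syntax is only the one floated above and nothing here is an existing CapacityScheduler format; parse_max_capacity and the "parent"/"cluster" labels are hypothetical names chosen for illustration.

```python
import re

# Optional "c:" or "cluster:" prefix, optional square brackets, optional "%".
_MAX_CAPACITY = re.compile(
    r"^(?:(?P<scope>c|cluster):)?\[?(?P<pct>\d+(?:\.\d+)?)%?\]?$"
)


def parse_max_capacity(value):
    """Return (scope, percentage) for a maximum-capacity string.

    A value without a prefix is treated as relative to the parent,
    which matches the existing CS convention discussed in this thread.
    """
    match = _MAX_CAPACITY.match(value.strip())
    if match is None:
        raise ValueError(f"unsupported maximum-capacity value: {value!r}")
    scope = "cluster" if match.group("scope") else "parent"
    return scope, float(match.group("pct"))


plain = parse_max_capacity("50")
short = parse_max_capacity("c:50%")
bracketed = parse_max_capacity("cluster:[50%]")
```

Keeping the unprefixed form parent-relative means existing configurations would keep their meaning, while FS users could opt into the cluster-relative form per queue.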

Thoughts?




[jira] [Commented] (YARN-10496) [Umbrella] Support Flexible Auto Queue Creation in Capacity Scheduler

2020-12-02 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242887#comment-17242887
 ] 

zhuqi commented on YARN-10496:
--

Thanks [~wangda] for putting this proposal together.

As a long-time FS user, I think option #1 would be the way to go; I agree with 
what [~epayne] said.

We should discuss how to define max capacity in CS. In FS, the max is usually 
specified as an absolute resource. We could restrict the max capacity to two 
choices (or three):

1: Use absolute resources.

2: Use percentage of the immediate parent.

3 (optional): Use percentage of the cluster.

This will help FS users migrate, and CS users will be able to adapt to it.

Thanks a lot.




[jira] [Commented] (YARN-10496) [Umbrella] Support Flexible Auto Queue Creation in Capacity Scheduler

2020-12-02 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242686#comment-17242686
 ] 

Eric Payne commented on YARN-10496:
---

Thanks [~wangda] for putting this proposal together. I have a couple of 
comments.

First, I think option #1 would be the way to go. With option #1, it's clear 
whether you want percentages or weights, but with option #2, you lose the 
ability to check whether or not the percentages add up to 100%. For people 
coming from an FS perspective, this may not seem like a loss, but for admins 
used to CS, it is important for the CS bringup to check whether you 
misconfigured the properties.
Also, with option #1, my guess is that the code will be more straightforward 
because once the weights are mapped to relative percentages, the calculations 
for user headroom, AM limit, etc. should remain the same.
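
The two points above, the 100% check and the mapping from weights to relative percentages, can be sketched as follows. This is only an illustration with hypothetical helper names; it does not reproduce the options from the design doc or the actual CS validation code.

```python
def validate_percentages(capacities, tolerance=1e-6):
    """Percentage mode: sibling capacities must add up to 100."""
    total = sum(capacities.values())
    if abs(total - 100.0) > tolerance:
        raise ValueError(f"child capacities sum to {total}, expected 100")


def weights_to_percentages(weights):
    """Weight mode: normalize sibling weights to percentages of the parent."""
    total = sum(weights.values())
    return {name: 100.0 * w / total for name, w in weights.items()}


# A correct percentage configuration passes silently.
validate_percentages({"a": 60.0, "b": 40.0})

# A misconfigured one (sums to 90) is caught at startup.
try:
    validate_percentages({"a": 60.0, "b": 30.0})
    misconfiguration = None
except ValueError as err:
    misconfiguration = str(err)

# Weights always normalize to 100, so the result can feed the same
# percentage-based calculations downstream.
percentages = weights_to_percentages({"a": 3, "b": 1})
validate_percentages(percentages)
```

Because normalized weights always sum to 100, downstream code that works on percentages would not need to know which mode the admin configured.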

For design option #1, I have a couple of concerns:
- From the design doc, one proposal is to define max capacity for weighted 
queues in terms of percentage of the cluster rather than percentage of the 
immediate parent. I would oppose this since max capacity in CS has always been 
relative to the immediate parent.
- Proposal #1 recommends supporting a different percentage/weight/value for 
each resource type (memory/vcores/GPUs/etc.). I feel like that is a major 
change and could affect the way that the DRC works in the CS, so I feel that if 
we decide to implement that feature, we should separate it out into its own 
design, and possibly even separate it from this effort.





[jira] [Commented] (YARN-10496) [Umbrella] Support Flexible Auto Queue Creation in Capacity Scheduler

2020-12-02 Thread Peter Bacsko (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242282#comment-17242282
 ] 

Peter Bacsko commented on YARN-10496:
-

My take: clearly I think it's between #1 and #2.

Both require re-calculating the underlying percentages, but in the case of #1 
it's hidden from us because our fundamental unit is weight. However, current CS 
users might be perfectly happy with the renormalization to 100, so they might 
opt for #2.

But since this issue came up during FS-to-CS migration, it's probably more 
important to FS users. So I'd vote for approach #1, introducing weight mode.









[jira] [Commented] (YARN-10496) [Umbrella] Support Flexible Auto Queue Creation in Capacity Scheduler

2020-11-30 Thread Benjamin Teke (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17240870#comment-17240870
 ] 

Benjamin Teke commented on YARN-10496:
--

Added subtasks for approach #1.




[jira] [Commented] (YARN-10496) [Umbrella] Support Flexible Auto Queue Creation in Capacity Scheduler

2020-11-19 Thread Wangda Tan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235683#comment-17235683
 ] 

Wangda Tan commented on YARN-10496:
---

Worked with [~bteke] on a design doc; see the linked doc. We would like to see 
more comments from the community:

cc: [~epayne], [~jhung], [~tangzhankun], [~bilwa_st]
