[jira] [Commented] (YARN-10496) [Umbrella] Support Flexible Auto Queue Creation in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17264106#comment-17264106 ]

Benjamin Teke commented on YARN-10496:
--------------------------------------

[~pbacsko], regarding the max capacity: as of now, YARN-10504 has disabled the validation for the absolute and absolute max capacity of a queue. I think we should allow some flexibility by either introducing a flag or a special format like you mentioned. A couple of concerns/questions:
* Should we allow the max capacity to be lower than the capacity?
** In "relative to the cluster" mode this can be straightforward, especially with weight mode: I can set up a quite large queue hierarchy with weights and not worry about any queue eating up a large part of the cluster resources.
** In "relative to the parent" mode this can allow an option where the weights are basically disabled, and the queues are configured with the max capacity. Not necessarily a problem, but this can lead to hard-to-read configurations.
* If we keep/reintroduce the capacity < max capacity constraint in weight mode, the user might have to calculate the percentages from the weights manually. For example, child1 and child2 are the only child queues under a parent, with weights 3 and 1. In this setup child1 has to have a configured max capacity of at least 75%, while child2 can have anything above 25%. This is OK for a static parent, but if/when auto-create templates/wildcard configs are supported, the capacity can change greatly based on the number of dynamic queues. If I want to express the max capacity of any child as 33% of the parent's resources, I will need to define at least 3 static queues with the same weight; I can't allow these to be auto-created (because 1 queue with weight 1 will have a capacity of 100%, and 2 queues with weight 1 will have 50% each). This is another reason to let this constraint go.
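The weight arithmetic in the comment above (weights 3 and 1 giving 75% and 25%, and equal weights giving 100%, 50% or roughly 33% depending on how many queues exist) can be checked with a small sketch. This is only an illustration in Python, not CapacityScheduler code; the function name `weights_to_percentages` is hypothetical, and it assumes integer weights.

```python
from fractions import Fraction


def weights_to_percentages(weights):
    """Map each child queue's weight to its share of the parent, in percent.

    `weights` is a dict of queue name -> positive integer weight.
    (Hypothetical helper for illustration only.)
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("total weight must be positive")
    # Use exact fractions first so that 3/4 becomes exactly 75.0.
    return {name: float(Fraction(w, total) * 100) for name, w in weights.items()}


# Static parent with two children, weights 3 and 1.
print(weights_to_percentages({"child1": 3, "child2": 1}))

# The share of an equally weighted queue depends on how many siblings exist,
# which is why a fixed "33% of the parent" cannot be expressed with weights
# alone when queues are created dynamically.
for n in (1, 2, 3):
    queues = {f"q{i}": 1 for i in range(n)}
    print(n, weights_to_percentages(queues))
```

The point of the loop is that the derived capacity of a dynamic queue is not known in advance, so a "capacity <= max capacity" check would have to be re-evaluated each time a sibling appears.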
> [Umbrella] Support Flexible Auto Queue Creation in Capacity Scheduler
> ---------------------------------------------------------------------
>
> Key: YARN-10496
> URL: https://issues.apache.org/jira/browse/YARN-10496
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: capacity scheduler
> Reporter: Wangda Tan
> Priority: Major
>
> CapacityScheduler today doesn’t support an auto queue creation which is
> flexible enough. The current constraints:
> * Only leaf queues can be auto-created
> * A parent can only have either static queues or dynamic ones. This causes
> multiple constraints. For example:
> * It isn’t possible to have a VIP user like Alice with a static queue
> root.user.alice with 50% capacity while the other user queues (under
> root.user) are created dynamically and they share the remaining 50% of
> resources.
>
> * In comparison, FairScheduler allows the following scenarios, Capacity
> Scheduler doesn’t:
> ** This implies that there is no possibility to have both dynamically
> created and static queues at the same time under root
> * A new queue needs to be created under an existing parent, while the parent
> already has static queues
> * Nested queue mapping policy, like in the following example:
> |
>
> |
> * Here two levels of queues may need to be created
> If an application belongs to user _alice_ (who has the primary_group of
> _engineering_), the scheduler checks whether _root.engineering_ exists, if it
> doesn’t, it’ll be created. Then scheduler checks whether
> _root.engineering.alice_ exists, and creates it if it doesn't.
>
> When we try to move users from FairScheduler to CapacityScheduler, we face
> feature gaps which blocks users migrate from FS to CS.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
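The nested queue mapping described in the issue text (check whether root.engineering exists, create it if not, then do the same for root.engineering.alice) can be sketched as a walk over the dotted queue path. This is a minimal illustration in Python under the assumption that the set of existing queues is just a set of path strings; `ensure_queue_path` is a hypothetical name, not a scheduler API.

```python
def ensure_queue_path(existing, path):
    """Create every missing queue along a dotted path.

    `existing` is a mutable set of queue paths such as {"root"};
    the function adds missing ancestors in order and returns the
    list of paths it created. (Hypothetical helper for illustration.)
    """
    created = []
    parts = path.split(".")
    for depth in range(1, len(parts) + 1):
        prefix = ".".join(parts[:depth])
        if prefix not in existing:
            existing.add(prefix)
            created.append(prefix)
    return created


queues = {"root"}
# User alice with primary group engineering: two levels may be created.
print(ensure_queue_path(queues, "root.engineering.alice"))
# A second user in the same group only needs the leaf.
print(ensure_queue_path(queues, "root.engineering.bob"))
```

Parents are always created before their children here, which mirrors the order of checks in the description.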
[jira] [Commented] (YARN-10496) [Umbrella] Support Flexible Auto Queue Creation in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243149#comment-17243149 ]

Peter Bacsko commented on YARN-10496:
-------------------------------------

_"From the design doc, one proposal is to define max capacity for weighted queues in terms of percentage of the cluster rather than percentage of the immediate parent. I would oppose this since max capacity in CS has always been relative to the immediate parent."_

This is a valid concern. Unfortunately for us, in Fair Scheduler, the percentages are relative to the overall cluster capacity. We were not entirely sure about this when we put together the material, but I examined the FS part a bit more deeply and these classes are the main points of interest:
* org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.allocation.AllocationFileQueueParser.loadQueue() - [https://github.com/apache/hadoop/blob/fd6be5898ad1a650e3bceacb8169a53520da57e5/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/allocation/AllocationFileQueueParser.java#L141-L145]
* org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.ConfigurableResource - [https://github.com/apache/hadoop/blob/5cc7873a4723a6c8e8e001d008fcd522eec0433d/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/ConfigurableResource.java#L44-L47]
* ConfigurableResource.getResource() - [https://github.com/apache/hadoop/blob/5cc7873a4723a6c8e8e001d008fcd522eec0433d/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/ConfigurableResource.java#L79-L95]

So when we calculate the maximum resource for a queue, it's expressed as a percentage of the overall cluster capacity.
I think we can do multiple things:
* We just ignore this and go the CS-way, meaning that the calculation will be based on the parent. This will be inconvenient for legacy FS users.
* Add a feature flag to indicate which calculation you want.
* Add a new format to "maximum-capacity". Like "c:50%" or "cluster:[50%]" to indicate what the percentage refers to.

Thoughts?
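The "new format" idea from Peter Bacsko's comment (a prefix such as "c:" that marks a percentage as cluster-relative) could look roughly like the sketch below. Everything here is an assumption for illustration: the thread only floats "c:50%" and "cluster:[50%]" as possible spellings, and the function names, the choice of the "c:" prefix, and the use of a single number for a resource are all hypothetical. It is written in Python and is not CapacityScheduler code.

```python
def parse_max_capacity(value):
    """Parse a max-capacity string into (scope, percent).

    Assumed, hypothetical syntax:
      "50%"   -> percentage of the immediate parent (the existing CS meaning)
      "c:50%" -> percentage of the whole cluster (the proposed prefix)
    """
    scope = "parent"
    text = value.strip()
    if text.startswith("c:"):
        scope, text = "cluster", text[2:]
    if not text.endswith("%"):
        raise ValueError(f"expected a percentage, got {value!r}")
    percent = float(text[:-1])
    if not 0 <= percent <= 100:
        raise ValueError(f"percentage out of range: {percent}")
    return scope, percent


def max_resource(value, parent_resource, cluster_resource):
    """Resolve a max-capacity string against the parent or the cluster."""
    scope, percent = parse_max_capacity(value)
    base = cluster_resource if scope == "cluster" else parent_resource
    return base * percent / 100


# A parent that owns 400 units of a 1000-unit cluster.
print(max_resource("50%", parent_resource=400, cluster_resource=1000))
print(max_resource("c:50%", parent_resource=400, cluster_resource=1000))
```

With the same "50%" the two interpretations differ by the ratio of cluster size to parent size, which is the gap FS users would notice after migrating if only the parent-relative meaning were kept.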
[jira] [Commented] (YARN-10496) [Umbrella] Support Flexible Auto Queue Creation in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242887#comment-17242887 ]

zhuqi commented on YARN-10496:
------------------------------

Thanks [~wangda] for putting this proposal together. As a long-time FS user, I think option #1 would be the way to go; I agree with what [~epayne] said.

We should discuss how to define max capacity in CS; in FS, max is regularly given as an absolute resource. We could restrict the max capacity to two choices (or three):
1: Use absolute resources.
2: Use percentage of the immediate parent.
3 (optional): Use percentage of the cluster.

This will help FS users migrate, and CS users will be able to adapt to it as well.

Thanks a lot.
[jira] [Commented] (YARN-10496) [Umbrella] Support Flexible Auto Queue Creation in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242686#comment-17242686 ]

Eric Payne commented on YARN-10496:
-----------------------------------

Thanks [~wangda] for putting this proposal together. I have a couple of comments.

First, I think option #1 would be the way to go. With option #1, it's clear whether you want percentages or weights, but with option #2, you lose the ability to check whether or not the percentages add up to 100%. For people coming from an FS perspective, this may not seem like a loss, but for admins used to CS, it is important that CS checks at bringup whether you misconfigured the properties.

Also, with option #1, my guess is that the code will be more straightforward, because once the weights are mapped to relative percentages, the calculations for user headroom, AM limit, etc. should remain the same.

For design option #1, I have a couple of concerns:
- From the design doc, one proposal is to define max capacity for weighted queues in terms of percentage of the cluster rather than percentage of the immediate parent. I would oppose this since max capacity in CS has always been relative to the immediate parent.
- Proposal #1 recommends supporting a different percentage/weight/value for each resource type (memory/vcores/GPUs/etc.). I feel like that is a major change and could affect the way that the DRC works in the CS, so I feel that if we decide to implement that feature, we should separate it out into its own design, and possibly even separate it from this effort.
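The startup check Eric Payne refers to (sibling percentages must add up to 100%) is easy to state in code, and the sketch below shows why it has no counterpart once capacities are derived from weights. It is a Python illustration only; `validate_percentages` is a hypothetical name, and the tolerance value is an arbitrary assumption.

```python
import math


def validate_percentages(capacities):
    """Raise ValueError unless sibling capacities (in percent) sum to 100.

    `capacities` maps child queue name -> configured percentage.
    (Hypothetical helper; the tolerance is an arbitrary choice.)
    """
    total = sum(capacities.values())
    if not math.isclose(total, 100.0, abs_tol=1e-6):
        raise ValueError(f"child capacities sum to {total}, expected 100")


# A correct percentage-mode configuration passes silently.
validate_percentages({"a": 50, "b": 30, "c": 20})

# A typo (40 instead of 50) is caught at configuration time.
try:
    validate_percentages({"a": 40, "b": 30, "c": 20})
except ValueError as err:
    print("rejected:", err)
```

In weight mode there is nothing equivalent to reject, since any set of positive weights normalizes to 100%; keeping percentages and weights as separate modes preserves this check for percentage-mode users.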
[jira] [Commented] (YARN-10496) [Umbrella] Support Flexible Auto Queue Creation in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242282#comment-17242282 ]

Peter Bacsko commented on YARN-10496:
-------------------------------------

My take: clearly I think it's between #1 and #2. Both require re-calculating the underlying percentages, but in case of #1 it's hidden from us because our fundamental unit is weight. However, current CS users might be perfectly happy with the renormalization to 100 so they might opt for #2. But since this issue came up during FS-to-CS migration, it's probably more important to FS users. So I'd vote for approach #1, introducing weight mode.
[jira] [Commented] (YARN-10496) [Umbrella] Support Flexible Auto Queue Creation in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17240870#comment-17240870 ]

Benjamin Teke commented on YARN-10496:
--------------------------------------

Added subtasks for approach#1.
[jira] [Commented] (YARN-10496) [Umbrella] Support Flexible Auto Queue Creation in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235683#comment-17235683 ]

Wangda Tan commented on YARN-10496:
-----------------------------------

Worked with [~bteke] for a design doc, see the linked doc. Would like to see more comments from the community: cc: [~epayne], [~jhung], [~tangzhankun], [~bilwa_st]