[jira] [Comment Edited] (YARN-10506) Update queue creation logic to use weight mode and allow the flexible static/dynamic creation
[ https://issues.apache.org/jira/browse/YARN-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266081#comment-17266081 ] Gergely Pollak edited comment on YARN-10506 at 1/15/21, 2:46 PM: - Thank you [~gandras] for the fix, here are a few comments, most of them can wait for a followup JIRA: *CapacitySchedulerAutoQueueHandler class*: Why is it a separate class, this is the functionality of the actual queue, I think it would have been better to extend the parentQueue and add this functionality there, this is a class which automagicly changes the ParentQueue class and enhances it with new functionality, but I think it would be better to solve this via inheritance. Also that would make differentiating between ParentQueue and the DynamicParentQueue much easier. Probably we should fix it in a followup JIRA. *CapacitySchedulerAutoQueueHandler#getQueue* This method is unnecessary, since CSQueueStore.get method handles null paths, and returns null. See CapacitySchedulerQueueManager#getQueue and CSQueueStore#get *CapacitySchedulerAutoQueueHandler#autoCreateQueue* I might be wrong, but this is concerning: {code:java} CSQueue existingQueueCandidate = getQueue(queueCandidateContext.getQueue());{code} The getQueue will always return the leaf name of the queue, which might be ambiguous so this might return null even if the queue exists on the path we are checking, and might identify an already existing queue invalid. As a rule of thumb, wherever possible use the full path of the queues please! If we have the following structure: {code:java} root +--alice | +--test | +--dev +--bob | +--test | +--dev {code} Where both _dev_ and _test_ queues are dynamic parents, then when I try to submit something to root.bob.dev.something, this part of the code {code:java} ApplicationPlacementContext queueCandidateContext = parentContext; // (Parent context is root.bob.dev) CSQueue existingQueueCandidate = getQueue(queueCandidateContext.getQueue()); //queueCandidateContext.getQueue() = 'dev' getQueue(queueCandidateContext.getQueue()) == null //because dev is ambiguous {code} Will identify root.bob.dev as a not existing queue. So please use _ApplciationPlacementContext#getFullQueuePath_ *CapacitySchedulerAutoQueueHandler#isDynamicQueue* Is not fool/future proof, while I'm aware currently only the AbstractCSQueue class implements the interface, but still it would be better to check before casting it to AbstractCSQueue. Or move the isDynamic down to the interface, then you don't have to cast was (Author: shuzirra): Thank you [~gandras] for the fix, here are a few comments, most of them can wait for a followup JIRA: *CapacitySchedulerAutoQueueHandler class*: Why is it a separate class, this is the functionality of the actual queue, I think it would have been better to extend the parentQueue and add this functionality there, this is a class which automagicly changes the ParentQueue class and enhances it with new functionality, but I think it would be better to solve this via inheritance. Also that would make differentiating between ParentQueue and the DynamicParentQueue much easier. Probably we should fix it in a followup JIRA. *CapacitySchedulerAutoQueueHandler#getQueue* This method is unnecessary, since CSQueueStore.get method handles null paths, and returns null. See CapacitySchedulerQueueManager#getQueue and CSQueueStore#get *CapacitySchedulerAutoQueueHandler#autoCreateQueue* I might be wrong, but this is concerning: {code:java} CSQueue existingQueueCandidate = getQueue(queueCandidateContext.getQueue());{code} The getQueue will always return the leaf name of the queue, which might be ambiguous so this might return null even if the queue exists on the path we are checking, and might identify an already existing queue invalid. As a rule of thumb, wherever possible use the full path of the queues please! If we have the following structure: {code:java} root +--alice | +--test | +--dev +--bob | +--test | +--dev {code} Where both _dev_ and _test_ queues are dynamic parents, then when I try to submit something to root.bob.dev.something, this part of the code {code:java} ApplicationPlacementContext queueCandidateContext = parentContext; // (Parent context is root.bob.dev) CSQueue existingQueueCandidate = getQueue(queueCandidateContext.getQueue()); //queueCandidateContext.getQueue() = 'dev' getQueue(queueCandidateContext.getQueue()) == null //because dev is ambiguous {code} Will identify root.bob.dev as a not existing queue. So please use _ApplciationPlacementContext#getFullQueuePath_ *CapacitySchedulerAutoQueueHandler#isDynamicQueue* Is not fool/future proof, while I'm aware currently only the AbstractCSQueue class implements the interface, but still it would be bett
[jira] [Comment Edited] (YARN-10506) Update queue creation logic to use weight mode and allow the flexible static/dynamic creation
[ https://issues.apache.org/jira/browse/YARN-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17265651#comment-17265651 ] zhuqi edited comment on YARN-10506 at 1/15/21, 2:56 AM: I update this and fix the check style in 013 patch. Waiting for the CI. Thanks all for patient review. was (Author: zhuqi): I update this and fix the check style in 013 patch. Thanks all for patient review. > Update queue creation logic to use weight mode and allow the flexible > static/dynamic creation > - > > Key: YARN-10506 > URL: https://issues.apache.org/jira/browse/YARN-10506 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Benjamin Teke >Assignee: Andras Gyori >Priority: Major > Attachments: YARN-10506-006-10504-010.patch, > YARN-10506-007-10504-010.patch, YARN-10506-008.patch, YARN-10506-010.patch, > YARN-10506-012.patch, YARN-10506-013.patch, YARN-10506.001.patch, > YARN-10506.002.patch, YARN-10506.003.patch, YARN-10506.004.patch, > YARN-10506.005.patch, YARN-10506.006-combined.patch, YARN-10506.006.patch, > YARN-10506.007.patch, YARN-10506.009.patch, YARN-10506.011.patch > > > The queue creation logic should be updated to use weight mode and support the > flexible creation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-10506) Update queue creation logic to use weight mode and allow the flexible static/dynamic creation
[ https://issues.apache.org/jira/browse/YARN-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17265208#comment-17265208 ] Gergely Pollak edited comment on YARN-10506 at 1/14/21, 8:29 PM: - Thank you [~wangda] [~pbacsko] for your insights! Actually the queue creation checks were present in the legacy engine ([https://github.com/apache/hadoop/blob/9d5144e66d71551ad6e1a8d3f1670e49ba181999/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java#L333-L340]), if I see correctly since 3.0. But the first commit seems much older: [https://github.com/apache/hadoop/commit/0987a7b8c2c1e4c2095821d98a7db19644df] . I've just kept it during the refactor, and as Peter pointed out now we actually build on it, since we need to be able to determine if a rule will be executed successfully (ie. the queue can be created), and if it cannot we can move onto the next rule. This fallback behaviour is part of the FS Placement engine as well, so we need it in order to server FS customers, so it was somewhat lucky that it was already in place. (In the original implementation we just used this check to throw exceptions, which was quite pointless, since the CS would have rejected the application if the queue was uncreatable anyway) So this is why I say we can safely remove those flags from the application placement context, this way we can reduce the code complexity. This way we only need what [~pbacsko] mentioned: {code:java} 1) "create" flag on the rule itself in the JSON - true/false 2) "yarh.scheduler.capacity..auto-queue-creation-v2.enabled" - true/false{code} Also I'm currently adopting this change in the CSMappingPlacementRule, and I don't use these flags. What I'll need in the future a better, centralised way to determine if a path can be created, and both CS and the MappingEngine should use the same methods, to do it. Currently we are implementing the same logic differently, and it's much harder to maintain the code this way, but during the followup code cleanup jiras we'll take care of this. About the race condition, I think (and I might be very wrong, because race conditions can be really sneaky), we cannot do much about, theoretically it can happen that the Mapping engine determines a rule can be executed, because the target queue exists or there is a proper dynamic parent, then a config change occurs, and by the time we get to the actual creation the queue structure changes, at this point the rejection is the proper action to take since the submission IS against the configuration. And I don't see a way around it since at the time the MappingEngine makes the decision that decision is the correct one, but we shouldn't enforce this decision if the configuration changes. But this race condition should only occur when the user changes the queue configuration, and in this case they should expect failing application submissions for the queues which have been changed or removed. was (Author: shuzirra): Actually the queue creation checks were present in the legacy engine (https://github.com/apache/hadoop/blob/9d5144e66d71551ad6e1a8d3f1670e49ba181999/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/placement/UserGroupMappingPlacementRule.java#L333-L340), if I see correctly since 3.0. But the first commit seems much older: [https://github.com/apache/hadoop/commit/0987a7b8c2c1e4c2095821d98a7db19644df] . I've just kept it during the refactor, and as Peter pointed out now we actually build on it, since we need to be able to determine if a rule will be executed successfully (ie. the queue can be created), and if it cannot we can move onto the next rule. This fallback behaviour is part of the FS Placement engine as well, so we need it in order to server FS customers, so it was somewhat lucky that it was already in place. (In the original implementation we just used this check to throw exceptions, which was quite pointless, since the CS would have rejected the application if the queue was uncreatable anyway) So this is why I say we can safely remove those flags from the application placement context, this way we can reduce the code complexity. This way we only need what [~pbacsko] mentioned: {code:java} 1) "create" flag on the rule itself in the JSON - true/false 2) "yarh.scheduler.capacity..auto-queue-creation-v2.enabled" - true/false{code} Also I'm currently adopting this change in the CSMappingPlacementRule, and I don't use these flags. What I'll need in the future a better, centralised way to determine if a path can be created, and both CS and the MappingEngine should use the same methods, to do it. Currently we are implementing
[jira] [Comment Edited] (YARN-10506) Update queue creation logic to use weight mode and allow the flexible static/dynamic creation
[ https://issues.apache.org/jira/browse/YARN-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17264977#comment-17264977 ] Peter Bacsko edited comment on YARN-10506 at 1/14/21, 3:56 PM: --- *Clarification comment* It turned out I misunderstood [~shuzirra]'s comment above. I thought he was speaking about the "create" flag the comes from the rule itself. I should have read it more carefully. Anyway we just had an internal meeting so now I have a better understanding. _"How we deal with "create" flag of ApplicationPlacementContext?"_ I don't think we need _any_ flags in this class. We need two settings basically: 1) "create" flag on the rule itself in the JSON - true/false 2) "yarh.scheduler.capacity..auto-queue-creation-v2.enabled" - true/false The placement engine (even the old one) can access both to make a decision about whether the queue is allowed to be created or not. If it is, it returns the full path inside the "ApplicationPlacementContext". Otherwise it decides what to do by checking the "fallbackResult" setting of the rule: returns "root.default", proceeds to the next rule or rejects the application. In fact, Gergo told me that *the new engine already works the way I proposed*. We don't need any additional input or flag to make an explicit, valid decision about the placement. Also, [~gandras] + [~snemeth] reached the same conclusion. was (Author: pbacsko): *Clarification comment* I turned out I misunderstood [~shuzirra]'s comment above. I thought he was speaking about the "create" flag the comes from the rule itself. I should have read it more carefully. Anyway we just had an internal meeting so now I have a better understanding. _"How we deal with "create" flag of ApplicationPlacementContext?"_ I don't think we need _any_ flags in this class. We need two settings basically: 1) "create" flag on the rule itself in the JSON - true/false 2) "yarh.scheduler.capacity..auto-queue-creation-v2.enabled" - true/false The placement engine (even the old one) can access both to make a decision about whether the queue is allowed to be created or not. If it is, it returns the full path inside the "ApplicationPlacementContext". Otherwise it decides what to do by checking the "fallbackResult" setting of the rule: returns "root.default", proceeds to the next rule or rejects the application. In fact, Gergo told me that *the new engine already works the way I proposed*. We don't need any additional input or flag to make an explicit, valid decision about the placement. Also, [~gandras] + [~snemeth] reached the same conclusion. > Update queue creation logic to use weight mode and allow the flexible > static/dynamic creation > - > > Key: YARN-10506 > URL: https://issues.apache.org/jira/browse/YARN-10506 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Benjamin Teke >Assignee: Andras Gyori >Priority: Major > Attachments: YARN-10506-006-10504-010.patch, > YARN-10506-007-10504-010.patch, YARN-10506-008.patch, YARN-10506-010.patch, > YARN-10506-012.patch, YARN-10506.001.patch, YARN-10506.002.patch, > YARN-10506.003.patch, YARN-10506.004.patch, YARN-10506.005.patch, > YARN-10506.006-combined.patch, YARN-10506.006.patch, YARN-10506.007.patch, > YARN-10506.009.patch, YARN-10506.011.patch > > > The queue creation logic should be updated to use weight mode and support the > flexible creation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-10506) Update queue creation logic to use weight mode and allow the flexible static/dynamic creation
[ https://issues.apache.org/jira/browse/YARN-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17264855#comment-17264855 ] Gergely Pollak edited comment on YARN-10506 at 1/14/21, 3:46 PM: - Hi [~gandras] [~wangda], [~zhuqi] *1) How we deal with "create" flag of ApplicationPlacementContext?* ** I don't think we need this flag at all. The CapacityScheduler has the information where can be dynamic queues created, it is supplied via the configuration option {{queue-path.auto-queue-creation-v2.enabled}} and {\{queue-path.auto-queue-creation.enabled }} So at any point the CS will be aware if a supplied queue path is creatable. Mapping rules will return a path based on their configuration, but I don't think the placement rule should be overriding the CS configuration. This can cause some undesired issues, and we really don't get anything with it. It will be a little more inconvenient for the user since they have to configure the mapping rules AND the CS queues properly, but it doesn't look like a big deal. On the other hand there are multiple places in the code which check if a queue can be created and all these places use the queue hierarchy to determine whether a queue is a managed parent, or not, however if the Mapping Rules are allowed to circumvent this, it might lead to unforeseen inconsistencies. (As a matter of fact the CSMappingPlacementRule currently uses the information from CS to determine if a queue can be created or not, so it wouldn't set any of these flags differently than CS is configured anyway) So I would say let's omit the creation flags from the ApoplicationPlacementContext, let the CSMappingPlacementRule rely on the information provided by the CS about queue creation eligibility, and let it pass back a "regular" ApplicationPlacementContext with the desired queue path, then the CS will be able to create it if the configuration allows. This is a much easier and stable solution, while we don't really put extra administration overhead to the user, and also can be easily extended in the future. As I see circumventing protections are never a good idea even if we are talking about internal interfaces, and these flags would exactly do that, they would force the CS to create queues which are against it's configuration. was (Author: shuzirra): Hi [~gandras] [~wangda], [~zhuqi] *1) How we deal with "create" flag of ApplicationPlacementContext?* ** I don't think we need this flag at all. The CapacityScheduler has the information where can be dynamic queues created, it is supplied via the configuration option {{queue-path.auto-queue-creation-v2.enabled}} and {\{queue-path.auto-queue-creation.enabled }} So at any point the CS will be aware if a supplied queue path is creatable. Mapping rules will return a path based on their configuration, but I don't think the placement rule should be overriding the CS configuration. This can cause some undesired issues, and we really don't get anything with it. It will be a little more inconvenient for the user since they have to configure the mapping rules AND the CS queues properly, but it doesn't look like a big deal. On the other hand there are multiple places in the code which check if a queue can be created and all these places use the queue hierarchy to determine whether a queue is a managed parent, or not, however if the Mapping Rules are allowed to circumvent this, it might lead to unforeseen inconsistencies. (As a matter of fact the CSMappingPlacementRule currently uses the information from CS to determine if a queue can be created or not, so it wouldn't set any of these flags differently then CS is configured anyway) So I would say let's omit the creation flags from the ApoplicationPlacementContext, let the CSMappingPlacementRule rely on the information provided by the CS about queue creation eligibility, and let it pass back a "regular" ApplicationPlacementContext with the desired queue path, then the CS will be able to create it if the configuration allows. This is a much easier and stable solution, while we don't really put extra administration overhead to the user, and also can be easily extended in the future. As I see circumventing protections are never a good idea even if we are talking about internal interfaces, and these flags would exactly do that, they would force the CS to create queues which are against it's configuration. > Update queue creation logic to use weight mode and allow the flexible > static/dynamic creation > - > > Key: YARN-10506 > URL: https://issues.apache.org/jira/browse/YARN-10506 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Benjamin Teke >Assignee: Andras Gyori >
[jira] [Comment Edited] (YARN-10506) Update queue creation logic to use weight mode and allow the flexible static/dynamic creation
[ https://issues.apache.org/jira/browse/YARN-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17264855#comment-17264855 ] Gergely Pollak edited comment on YARN-10506 at 1/14/21, 3:46 PM: - Hi [~gandras] [~wangda], [~zhuqi] *1) How we deal with "create" flag of ApplicationPlacementContext?* ** I don't think we need this flag at all. The CapacityScheduler has the information where can be dynamic queues created, it is supplied via the configuration option {{queue-path.auto-queue-creation-v2.enabled}} and {\{queue-path.auto-queue-creation.enabled }} So at any point the CS will be aware if a supplied queue path is creatable. Mapping rules will return a path based on their configuration, but I don't think the placement rule should be overriding the CS configuration. This can cause some undesired issues, and we really don't get anything with it. It will be a little more inconvenient for the user since they have to configure the mapping rules AND the CS queues properly, but it doesn't look like a big deal. On the other hand there are multiple places in the code which check if a queue can be created and all these places use the queue hierarchy to determine whether a queue is a managed parent, or not, however if the Mapping Rules are allowed to circumvent this, it might lead to unforeseen inconsistencies. (As a matter of fact the CSMappingPlacementRule currently uses the information from CS to determine if a queue can be created or not, so it wouldn't set any of these flags differently than CS is configured anyway) So I would say let's omit the creation flags from the ApplicationPlacementContext, let the CSMappingPlacementRule rely on the information provided by the CS about queue creation eligibility, and let it pass back a "regular" ApplicationPlacementContext with the desired queue path, then the CS will be able to create it if the configuration allows. This is a much easier and stable solution, while we don't really put extra administration overhead to the user, and also can be easily extended in the future. As I see circumventing protections are never a good idea even if we are talking about internal interfaces, and these flags would exactly do that, they would force the CS to create queues which are against it's configuration. was (Author: shuzirra): Hi [~gandras] [~wangda], [~zhuqi] *1) How we deal with "create" flag of ApplicationPlacementContext?* ** I don't think we need this flag at all. The CapacityScheduler has the information where can be dynamic queues created, it is supplied via the configuration option {{queue-path.auto-queue-creation-v2.enabled}} and {\{queue-path.auto-queue-creation.enabled }} So at any point the CS will be aware if a supplied queue path is creatable. Mapping rules will return a path based on their configuration, but I don't think the placement rule should be overriding the CS configuration. This can cause some undesired issues, and we really don't get anything with it. It will be a little more inconvenient for the user since they have to configure the mapping rules AND the CS queues properly, but it doesn't look like a big deal. On the other hand there are multiple places in the code which check if a queue can be created and all these places use the queue hierarchy to determine whether a queue is a managed parent, or not, however if the Mapping Rules are allowed to circumvent this, it might lead to unforeseen inconsistencies. (As a matter of fact the CSMappingPlacementRule currently uses the information from CS to determine if a queue can be created or not, so it wouldn't set any of these flags differently than CS is configured anyway) So I would say let's omit the creation flags from the ApoplicationPlacementContext, let the CSMappingPlacementRule rely on the information provided by the CS about queue creation eligibility, and let it pass back a "regular" ApplicationPlacementContext with the desired queue path, then the CS will be able to create it if the configuration allows. This is a much easier and stable solution, while we don't really put extra administration overhead to the user, and also can be easily extended in the future. As I see circumventing protections are never a good idea even if we are talking about internal interfaces, and these flags would exactly do that, they would force the CS to create queues which are against it's configuration. > Update queue creation logic to use weight mode and allow the flexible > static/dynamic creation > - > > Key: YARN-10506 > URL: https://issues.apache.org/jira/browse/YARN-10506 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Benjamin Teke >Assignee: Andras Gyori >
[jira] [Comment Edited] (YARN-10506) Update queue creation logic to use weight mode and allow the flexible static/dynamic creation
[ https://issues.apache.org/jira/browse/YARN-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17264903#comment-17264903 ] Peter Bacsko edited comment on YARN-10506 at 1/14/21, 1:50 PM: --- Hi guys. I haven't really contributed to this patch and there's some big discussion going on here, but I'm adding my 2 cents. To give you some background: I introduced the "create" flag for the new mapping rules, simply because FS has the same - auto-creation depends on the rule and users who migrate from FS will find this intuitive, even if the rule format has changed drastically. Now, I was under the impression that with weight mode and AQC, we're trying to approximate FS behaviour as much as possible, because the whole "project" came up in the context on FS->CS migration. So I thought that in this mode, users are allowed to create queues anywhere in the hierarchy, eg. with {{}} rule, you can definitely do this in FS, there is no restriction, at least I'm not aware of it. The FS property "allow-undeclared-pools" is translated internally to {{}} rule. Again, the was the idea. On the other hand, it's also true that if we go this way, legacy percentage mode and weight mode will diverge significantly, so the behavior will be totally different. In pct/absolute mode, you can explicitly say where queues can be created, but if a user switches to weight mode, they cannot do this, which might be surprising for them (or maybe not - this is a pure speculation at this point). If we really want to limit users where they can create queues (as I can see this is clearly the goal here) then this leads us to the problem what Gergely just discussed above, have a "create" flag in the rule and have a similar property which restricts whether a queue can be created, this can lead to some unexpected behavior, especially if their priority is reversed. At the very least, legacy FS users will be confused like hell IMO. We also have to think about the following scenario: a user who just migrated from FS, still wants the exact FS auto-queue semantics and uses "allow-undeclared-pools" or a {{}} rule. So if that's the case, we have to be able to set this up easily. That is, if they have 200 queues and they have to set this property for _every single queue_, I can easily imagine that their head will explode. So we should consider having the property "auto-queue-creation-v2.enabled" as a setting which is automatically inherited by all child queues, recursively. Or choose the default value to be "true" which is probably the simplest thing to do. My conclusion is: 1) I think we have to keep the "create" flag on a rule level if we want FS rules to work the same way. Some rules might have create=true, some of them might not, so we can't rely on having the queue properties alone. 2) If we want to introduce "auto-queue-creation-v2.enabled", it's important to think about whether to make it inheritable or not. If it's not, then this can potentially cause a huge pain for some users. Alternative approach: have a separate property which applies to the whole hierarchy globally, which can be overridden on a per-queue basis, this is a very common approach in CS. 3) I very much agree with this statement: _"Mapping rules will return a path based on their configuration, but I don't think the placement rule should be overriding the CS configuration"_. If we truly want both, then the queue property should have its final say about the creation. The rule says "let's create this queue if it doesn't exist", but then we must check "auto-queue-creation-v2.enabled" for the target path to see if the queue can be created at the given point in the hierarchy. It should not be the other way around. was (Author: pbacsko): Hi guys. I haven't really contributed to this patch and there's some big discussion going on here, but I'm adding my 2 cents. To give you some background: I introduced the "create" flag for the new mapping rules, simply because FS has the same - auto-creation depends on the rule and users who migrate from FS will find this intuitive, even if the rule format has changed drastically. Now, I was under the impression that with weight mode and AQC, we're trying to approximate FS behaviour as much as possible, because the whole "project" came up in the context on FS->CS migration. So I thought that in this mode, users are allowed to create queues anywhere in the hierarchy, eg. with {{}} rule, you can definitely do this in FS, there is no restriction, at least I'm not aware of it. The FS property "allow-undeclared-pools" is translated internally to {{}} rule. Again, the was the idea. On the other hand, it's also true that if we go this way, legacy percentage mode and weight mode will diverge significantly, so the behavior will be totally different. In pct/absolute mode, you can explicitly say where qu
[jira] [Comment Edited] (YARN-10506) Update queue creation logic to use weight mode and allow the flexible static/dynamic creation
[ https://issues.apache.org/jira/browse/YARN-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17264855#comment-17264855 ] Gergely Pollak edited comment on YARN-10506 at 1/14/21, 12:15 PM: -- Hi [~gandras] [~wangda], [~zhuqi] *1) How we deal with "create" flag of ApplicationPlacementContext?* ** I don't think we need this flag at all. The CapacityScheduler has the information where can be dynamic queues created, it is supplied via the configuration option {{queue-path.auto-queue-creation-v2.enabled}} and {\{queue-path.auto-queue-creation.enabled }} So at any point the CS will be aware if a supplied queue path is creatable. Mapping rules will return a path based on their configuration, but I don't think the placement rule should be overriding the CS configuration. This can cause some undesired issues, and we really don't get anything with it. It will be a little more inconvenient for the user since they have to configure the mapping rules AND the CS queues properly, but it doesn't look like a big deal. On the other hand there are multiple places in the code which check if a queue can be created and all these places use the queue hierarchy to determine whether a queue is a managed parent, or not, however if the Mapping Rules are allowed to circumvent this, it might lead to unforeseen inconsistencies. (As a matter of fact the CSMappingPlacementRule currently uses the information from CS to determine if a queue can be created or not, so it wouldn't set any of these flags differently then CS is configured anyway) So I would say let's omit the creation flags from the ApoplicationPlacementContext, let the CSMappingPlacementRule rely on the information provided by the CS about queue creation eligibility, and let it pass back a "regular" ApplicationPlacementContext with the desired queue path, then the CS will be able to create it if the configuration allows. This is a much easier and stable solution, while we don't really put extra administration overhead to the user, and also can be easily extended in the future. As I see circumventing protections are never a good idea even if we are talking about internal interfaces, and these flags would exactly do that, they would force the CS to create queues which are against it's configuration. was (Author: shuzirra): Hi [~gandras] [~wangda], *1) How we deal with "create" flag of ApplicationPlacementContext?* ** I don't think we need this flag at all. The CapacityScheduler has the information where can be dynamic queues created, it is supplied via the configuration option {{queue-path.auto-queue-creation-v2.enabled}} and {{queue-path.auto-queue-creation.enabled }} So at any point the CS will be aware if a supplied queue path is creatable. Mapping rules will return a path based on their configuration, but I don't think the placement rule should be overriding the CS configuration. This can cause some undesired issues, and we really don't get anything with it. It will be a little more inconvenient for the user since they have to configure the mapping rules AND the CS queues properly, but it doesn't look like a big deal. On the other hand there are multiple places in the code which check if a queue can be created and all these places use the queue hierarchy to determine whether a queue is a managed parent, or not, however if the Mapping Rules are allowed to circumvent this, it might lead to unforeseen inconsistencies. (As a matter of fact the CSMappingPlacementRule currently uses the information from CS to determine if a queue can be created or not, so it wouldn't set any of these flags differently then CS is configured anyway) So I would say let's omit the creation flags from the ApoplicationPlacementContext, let the CSMappingPlacementRule rely on the information provided by the CS about queue creation eligibility, and let it pass back a "regular" ApplicationPlacementContext with the desired queue path, then the CS will be able to create it if the configuration allows. This is a much easier and stable solution, while we don't really put extra administration overhead to the user, and also can be easily extended in the future. As I see circumventing protections are never a good idea even if we are talking about internal interfaces, and these flags would exactly do that, they would force the CS to create queues which are against it's configuration. > Update queue creation logic to use weight mode and allow the flexible > static/dynamic creation > - > > Key: YARN-10506 > URL: https://issues.apache.org/jira/browse/YARN-10506 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Benjamin Teke >Assignee: Andras Gyori >
[jira] [Comment Edited] (YARN-10506) Update queue creation logic to use weight mode and allow the flexible static/dynamic creation
[ https://issues.apache.org/jira/browse/YARN-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17264683#comment-17264683 ] zhuqi edited comment on YARN-10506 at 1/14/21, 8:17 AM: Thanks for [~gandras] comment. When submit without enabled mapping rules(auto create leaf/parent ) , the logic will back to submit the normal leafqueue if the leafqueue already exists, i think this is reasonable. If there are other cases AQC will not used by mapping rules, we should use other function rather than autoCreateLeafQueue, and the APC flags is mainly used for mapping rules check. If you any other advice [~wangda] ? was (Author: zhuqi): [~gandras] When submit without enabled mapping rules(auto create leaf/parent ) , the logic will back to submit the normal leafqueue if the leafqueue already exists, i think this is reasonable. If there are other cases AQC will not used by mapping rules, we should use other function rather than autoCreateLeafQueue, and the APC flags is mainly used for mapping rules check. If you any other advice [~wangda] ? > Update queue creation logic to use weight mode and allow the flexible > static/dynamic creation > - > > Key: YARN-10506 > URL: https://issues.apache.org/jira/browse/YARN-10506 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Benjamin Teke >Assignee: Andras Gyori >Priority: Major > Attachments: YARN-10506-006-10504-010.patch, > YARN-10506-007-10504-010.patch, YARN-10506-008.patch, YARN-10506-010.patch, > YARN-10506-012.patch, YARN-10506.001.patch, YARN-10506.002.patch, > YARN-10506.003.patch, YARN-10506.004.patch, YARN-10506.005.patch, > YARN-10506.006-combined.patch, YARN-10506.006.patch, YARN-10506.007.patch, > YARN-10506.009.patch, YARN-10506.011.patch > > > The queue creation logic should be updated to use weight mode and support the > flexible creation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-10506) Update queue creation logic to use weight mode and allow the flexible static/dynamic creation
[ https://issues.apache.org/jira/browse/YARN-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17264048#comment-17264048 ] zhuqi edited comment on YARN-10506 at 1/13/21, 10:33 AM: - [~gandras] Regarding 2.2 atomicity: The last instanceof Parent check should have never occurred, because then we have already made every necessary check. I have only included that step because I was reluctant to use a casting without an explicit check. I find it too, so just changed to the one class in update. Copy constructor is a good idea. about : Regarding CSQueueUtils#extractQueuePath. I think we should keep this here, because I have seen the same thing repeated over and over across the code base, so I will instead address this in the cleanup jira. I think this is the minor things, we my should deal with the major things first. was (Author: zhuqi): [~gandras] Regarding 2.2 atomicity: The last instanceof Parent check should have never occurred, because then we have already made every necessary check. I have only included that step because I was reluctant to use a casting without an explicit check. I find it too, so just change to the one class. Copy constructor is a good idea. about : Regarding CSQueueUtils#extractQueuePath. I think we should keep this here, because I have seen the same thing repeated over and over across the code base, so I will instead address this in the cleanup jira. I think this is the minor things, we my should deal with the major things first. > Update queue creation logic to use weight mode and allow the flexible > static/dynamic creation > - > > Key: YARN-10506 > URL: https://issues.apache.org/jira/browse/YARN-10506 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Benjamin Teke >Assignee: Andras Gyori >Priority: Major > Attachments: YARN-10506-006-10504-010.patch, > YARN-10506-007-10504-010.patch, YARN-10506-008.patch, YARN-10506-010.patch, > YARN-10506.001.patch, YARN-10506.002.patch, YARN-10506.003.patch, > YARN-10506.004.patch, YARN-10506.005.patch, YARN-10506.006-combined.patch, > YARN-10506.006.patch, YARN-10506.007.patch, YARN-10506.009.patch > > > The queue creation logic should be updated to use weight mode and support the > flexible creation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-10506) Update queue creation logic to use weight mode and allow the flexible static/dynamic creation
[ https://issues.apache.org/jira/browse/YARN-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17264036#comment-17264036 ] Andras Gyori edited comment on YARN-10506 at 1/13/21, 10:24 AM: Regarding CSQueueUtils#extractQueuePath. I think we should keep this here, because I have seen the same thing repeated over and over across the code base, so I will instead address this in the cleanup jira. Also, using clone method is not a good idea in my opinion. I think the findbug is complaining about not implementing the Cloneable interface, also the signature of the method is to throw a CloneNotSupportedException as well. If we do not use it anywhere else, I would suggest to use a copy constructor instead. was (Author: gandras): Regarding CSQueueUtils#extractQueuePath. I think we should keep this here, because I have seen the same thing repeated over and over across the code base, so I will instead address this in the cleanup jira. > Update queue creation logic to use weight mode and allow the flexible > static/dynamic creation > - > > Key: YARN-10506 > URL: https://issues.apache.org/jira/browse/YARN-10506 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Benjamin Teke >Assignee: Andras Gyori >Priority: Major > Attachments: YARN-10506-006-10504-010.patch, > YARN-10506-007-10504-010.patch, YARN-10506-008.patch, YARN-10506-010.patch, > YARN-10506.001.patch, YARN-10506.002.patch, YARN-10506.003.patch, > YARN-10506.004.patch, YARN-10506.005.patch, YARN-10506.006-combined.patch, > YARN-10506.006.patch, YARN-10506.007.patch, YARN-10506.009.patch > > > The queue creation logic should be updated to use weight mode and support the > flexible creation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-10506) Update queue creation logic to use weight mode and allow the flexible static/dynamic creation
[ https://issues.apache.org/jira/browse/YARN-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17263999#comment-17263999 ] Andras Gyori edited comment on YARN-10506 at 1/13/21, 9:08 AM: --- Thank you [~wangda] for the review. My comments: # My initial approach was exactly this, however, did not want to extend the scope of this patch. Will refactor the mentioned parts in a followup jira. # This was the approach at the beginning, however, we talked about this with [~shuzirra], and came to the conclusion, that: ## The auto queue creation should be used when mapping rules are turned off, therefore placement context driven AQC would be a limitation We also concluded, that its not the responsibility of the mapping rule/PlacementContext to decide whether AQC is enabled or not, the create flag will only be used to do a preliminary check of the possibility The ParentQueue isEligibleForAutoQueueCreation method was made to support this mapping rule check ## I will check it out, thank you for noticing! ## Edit: [~zhuqi]'s solution is good, I agree. # We have argued about this problem, and I agree, that the auto queue/dynamic queue terminology is overused, and wanted to make a proposal, to distinguish this new AQC from the ManagedParent old logic. However, I think it is still a good idea to use a property to drive this behaviour due to security reasons: if a customer wants to use weight mode, the AQC will implicitly be turned on, and it might be not desirable (rogue users could create queues anywhere). I prefer explicit allowance in this case. That said, we should come up with a distinguishable terminology for the new auto queue creation (perhaps saying auto-queue-creation-v2 or extended). # I too found this problematic, and will address this issue. # I will add the testcase. was (Author: gandras): Thank you [~wangda] for the review. My comments: # My initial approach was exactly this, however, did not want to extend the scope of this patch. Will refactor the mentioned parts in a followup jira. # This was the approach at the beginning, however, we talked about this with [~shuzirra], and came to the conclusion, that: ## The auto queue creation should be used when mapping rules are turned off, therefore placement context driven AQC would be a limitation We also concluded, that its not the responsibility of the mapping rule/PlacementContext to decide whether AQC is enabled or not, the create flag will only be used to do a preliminary check of the possibility The ParentQueue isEligibleForAutoQueueCreation method was made to support this mapping rule check ## I will check it out, thank you for noticing! ## Edit: [~zhuqi]'s solution is good, I approve. # We have argued about this problem, and I agree, that the auto queue/dynamic queue terminology is overused, and wanted to make a proposal, to distinguish this new AQC from the ManagedParent old logic. However, I think it is still a good idea to use a property to drive this behaviour due to security reasons: if a customer wants to use weight mode, the AQC will implicitly be turned on, and it might be not desirable (rogue users could create queues anywhere). I prefer explicit allowance in this case. That said, we should come up with a distinguishable terminology for the new auto queue creation (perhaps saying auto-queue-creation-v2 or extended). # I too found this problematic, and will address this issue. # I will add the testcase. > Update queue creation logic to use weight mode and allow the flexible > static/dynamic creation > - > > Key: YARN-10506 > URL: https://issues.apache.org/jira/browse/YARN-10506 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Benjamin Teke >Assignee: Andras Gyori >Priority: Major > Attachments: YARN-10506-006-10504-010.patch, > YARN-10506-007-10504-010.patch, YARN-10506-008.patch, YARN-10506-010.patch, > YARN-10506.001.patch, YARN-10506.002.patch, YARN-10506.003.patch, > YARN-10506.004.patch, YARN-10506.005.patch, YARN-10506.006-combined.patch, > YARN-10506.006.patch, YARN-10506.007.patch, YARN-10506.009.patch > > > The queue creation logic should be updated to use weight mode and support the > flexible creation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-10506) Update queue creation logic to use weight mode and allow the flexible static/dynamic creation
[ https://issues.apache.org/jira/browse/YARN-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17263999#comment-17263999 ] Andras Gyori edited comment on YARN-10506 at 1/13/21, 9:07 AM: --- Thank you [~wangda] for the review. My comments: # My initial approach was exactly this, however, did not want to extend the scope of this patch. Will refactor the mentioned parts in a followup jira. # This was the approach at the beginning, however, we talked about this with [~shuzirra], and came to the conclusion, that: ## The auto queue creation should be used when mapping rules are turned off, therefore placement context driven AQC would be a limitation We also concluded, that its not the responsibility of the mapping rule/PlacementContext to decide whether AQC is enabled or not, the create flag will only be used to do a preliminary check of the possibility The ParentQueue isEligibleForAutoQueueCreation method was made to support this mapping rule check ## I will check it out, thank you for noticing! ## Edit: [~zhuqi]'s solution is good, I approve. # We have argued about this problem, and I agree, that the auto queue/dynamic queue terminology is overused, and wanted to make a proposal, to distinguish this new AQC from the ManagedParent old logic. However, I think it is still a good idea to use a property to drive this behaviour due to security reasons: if a customer wants to use weight mode, the AQC will implicitly be turned on, and it might be not desirable (rogue users could create queues anywhere). I prefer explicit allowance in this case. That said, we should come up with a distinguishable terminology for the new auto queue creation (perhaps saying auto-queue-creation-v2 or extended). # I too found this problematic, and will address this issue. # I will add the testcase. was (Author: gandras): Thank you [~wangda] for the review. My comments: # My initial approach was exactly this, however, did not want to extend the scope of this patch. Will refactor the mentioned parts in a followup jira. # This was the approach at the beginning, however, we talked about this with [~shuzirra], and came to the conclusion, that: ## The auto queue creation should be used when mapping rules are turned off, therefore placement context driven AQC would be a limitation We also concluded, that its not the responsibility of the mapping rule/PlacementContext to decide whether AQC is enabled or not, the create flag will only be used to do a preliminary check of the possibility The ParentQueue isEligibleForAutoQueueCreation method was made to support this mapping rule check ## I will check it out, thank you for noticing! ## I am not sure what you mean by that, but I will check it out. However, I have distinguished the parent hierarchy in its own method to be more flexible (we might need to support deeper hierarchy, more rigorous checks etc.) # We have argued about this problem, and I agree, that the auto queue/dynamic queue terminology is overused, and wanted to make a proposal, to distinguish this new AQC from the ManagedParent old logic. However, I think it is still a good idea to use a property to drive this behaviour due to security reasons: if a customer wants to use weight mode, the AQC will implicitly be turned on, and it might be not desirable (rogue users could create queues anywhere). I prefer explicit allowance in this case. That said, we should come up with a distinguishable terminology for the new auto queue creation (perhaps saying auto-queue-creation-v2 or extended). # I too found this problematic, and will address this issue. # I will add the testcase. > Update queue creation logic to use weight mode and allow the flexible > static/dynamic creation > - > > Key: YARN-10506 > URL: https://issues.apache.org/jira/browse/YARN-10506 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Benjamin Teke >Assignee: Andras Gyori >Priority: Major > Attachments: YARN-10506-006-10504-010.patch, > YARN-10506-007-10504-010.patch, YARN-10506-008.patch, YARN-10506-010.patch, > YARN-10506.001.patch, YARN-10506.002.patch, YARN-10506.003.patch, > YARN-10506.004.patch, YARN-10506.005.patch, YARN-10506.006-combined.patch, > YARN-10506.006.patch, YARN-10506.007.patch, YARN-10506.009.patch > > > The queue creation logic should be updated to use weight mode and support the > flexible creation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-10506) Update queue creation logic to use weight mode and allow the flexible static/dynamic creation
[ https://issues.apache.org/jira/browse/YARN-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17263961#comment-17263961 ] zhuqi edited comment on YARN-10506 at 1/13/21, 7:50 AM: [~wangda] [~gandras] Thanks [~wangda] for review. I update a patch for: According the comment by [~wangda] above: 1. CapacityScheduler 1.1 Fix the not return lq in Capacity. 1.2 Add 008 patch remaining include to fix testautocreateleaf: As above in 008 patch, Fix autoCreateLeafQueue to use path , the logic change , the 009 not include this: {code:java} @VisibleForTesting protected LeafQueue autoCreateLeafQueue( ApplicationPlacementContext placementContext) throws IOException, YarnException { String leafQueueName = placementContext.getQueue(); String parentQueueName = placementContext.getParentQueue(); if (!StringUtils.isEmpty(parentQueueName)) { CSQueue parentQueue = getQueue(parentQueueName); if (parentQueue == null) { throw new SchedulerDynamicEditException( "Could not auto-create leaf queue for " + leafQueueName + ". Queue mapping specifies an invalid parent queue " + "which does not exist " + parentQueueName); } if (conf.isAutoCreateChildQueueEnabled(parentQueue.getQueuePath())) { // Case 1: Handle ManagedParentQueue AutoCreatedLeafQueue autoCreatedLeafQueue = null; ManagedParentQueue autoCreateEnabledParentQueue = (ManagedParentQueue) parentQueue; autoCreatedLeafQueue = new AutoCreatedLeafQueue(this, leafQueueName, autoCreateEnabledParentQueue); addQueue(autoCreatedLeafQueue); return autoCreatedLeafQueue; } ... } {code} 2. CapacitySchedulerAutoQueueHandler 2.1 Back create flag in ApplicationPlacementContext, and use it in autoCreateLeafQueue placement check : {code:java} private LeafQueue autoCreateLeafQueue( ApplicationPlacementContext placementContext) throws IOException, YarnException { ... if (!StringUtils.isEmpty(parentQueueName)) { ... if (conf.isAutoCreateChildQueueEnabled(parentQueue.getQueuePath())) { ... } else { ApplicationPlacementContext apc = placementContext.clone(); // Set default true now // TODO get by queue placement policy apc.setCreateParentQueue(true); apc.setCreateLeafQueue(true); if (placementContext.isCreateLeafQueue() || placementContext.isCreateParentQueue()) { } else { ... } } } {code} 2.2 -2.3 Remove autoCreateParentHierarchy, the logic change to autoCreateQueuePath. And the autoCreateQueuePath to autoCreateQueue, and make it atomic, and fix related test. {code:java} public LeafQueue autoCreateQueue(ApplicationPlacementContext queue) throws YarnException { ... // The existingParentQueue is already ParentQueue LeafQueue leafQueue = existingParentQueue.addDynamicLeafQueue( queue.getFullQueuePath()); queueManager.addQueue(leafQueue.getQueuePath(), leafQueue); return leafQueue; } {code} 3. CapacitySchedulerConfiguration change: Remaining todo. 4. ParentQueue#addDynamicChildQueue, change the logic handle static queue with empty childs: {code:java} private QueueCapacityType getCapacityConfigurationTypeForQueues( Collection queues) throws IOException { ... else { if (isDynamicQueue()) { // If it is an empty dynamic queue, consider weight mode instead return QueueCapacityType.WEIGHT; } else { // Double check queues should be empty // When static queue with queues empty, return parent ConfigurationType if (queues.isEmpty()) { return getCapacityConfigurationTypeForQueues(ImmutableList.of(this)); } else { throw new IOException("Unknown case when child queues not empty! " + "But no child config type!"); } } } {code} 5. Change the submit test to trigger the placement: {code:java} @Test public void testAutoQueueCreationOnAppSubmission() throws Exception { startScheduler(); createBasicQueueStructureAndValidate(); submitApp(cs, USER0, USER0, "root.e-auto"); AbstractCSQueue e = (AbstractCSQueue) cs.getQueue("root.e-auto"); Assert.assertNotNull(e); Assert.assertTrue(e.isDynamicQueue()); AbstractCSQueue user0 = (AbstractCSQueue) cs.getQueue("root.e-auto." + USER0); Assert.assertNotNull(user0); Assert.assertTrue(user0.isDynamicQueue()); } {code} 6. Fix clone finding bugs. Remaining to do minor things: # AutoCreateLeafQueue should be moved to autoQueueHandler. # CSQueueUtils#extractQueuePath should be moved to CSAutoQueueHandler. (It won't be used by other classes). And rename extractQueuePath to extractApplicationPlacementContext (we don't just extract path). # Add test case that static queues with empty childs. Major t
[jira] [Comment Edited] (YARN-10506) Update queue creation logic to use weight mode and allow the flexible static/dynamic creation
[ https://issues.apache.org/jira/browse/YARN-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17263118#comment-17263118 ] zhuqi edited comment on YARN-10506 at 1/12/21, 7:08 AM: [~wangda] [~gandras] Update a patch to fix two problem: # Fix autoCreateLeafQueue to use path , the logic change to {code:java} @VisibleForTesting protected LeafQueue autoCreateLeafQueue( ApplicationPlacementContext placementContext) throws IOException, YarnException { String leafQueueName = placementContext.getQueue(); String parentQueueName = placementContext.getParentQueue(); if (!StringUtils.isEmpty(parentQueueName)) { CSQueue parentQueue = getQueue(parentQueueName); if (parentQueue == null) { throw new SchedulerDynamicEditException( "Could not auto-create leaf queue for " + leafQueueName + ". Queue mapping specifies an invalid parent queue " + "which does not exist " + parentQueueName); } if (conf.isAutoCreateChildQueueEnabled(parentQueue.getQueuePath())) { // Case 1: Handle ManagedParentQueue AutoCreatedLeafQueue autoCreatedLeafQueue = null; ManagedParentQueue autoCreateEnabledParentQueue = (ManagedParentQueue) parentQueue; autoCreatedLeafQueue = new AutoCreatedLeafQueue(this, leafQueueName, autoCreateEnabledParentQueue); addQueue(autoCreatedLeafQueue); return autoCreatedLeafQueue; } else { try { writeLock.lock(); LeafQueue lq = autoQueueHandler.autoCreateQueuePath(placementContext); } finally { writeLock.unlock(); } } } throw new SchedulerDynamicEditException( "Could not auto-create leaf queue for " + leafQueueName + ". Queue mapping does not specify" + " which parent queue it needs to be created under."); } {code} # Fix testActivateApplicationAfterQueueRefresh in testLeafQueue to update active apps use update, because of reinitialize remove active application logic: {code:java} // This will not update active apps root.reinitialize(newRoot, csContext.getClusterResource()); // Cause this to update active apps root.updateClusterResource(csContext.getClusterResource(), new ResourceLimits(csContext.getClusterResource())); {code} Thanks. was (Author: zhuqi): [~wangda] [~gandras] Update a patch to fix two problem: # Fix autoCreateLeafQueue to use path , the logic change to {code:java} @VisibleForTesting protected LeafQueue autoCreateLeafQueue( ApplicationPlacementContext placementContext) throws IOException, YarnException { String leafQueueName = placementContext.getQueue(); String parentQueueName = placementContext.getParentQueue(); if (!StringUtils.isEmpty(parentQueueName)) { CSQueue parentQueue = getQueue(parentQueueName); if (parentQueue == null) { throw new SchedulerDynamicEditException( "Could not auto-create leaf queue for " + leafQueueName + ". Queue mapping specifies an invalid parent queue " + "which does not exist " + parentQueueName); } if (conf.isAutoCreateChildQueueEnabled(parentQueue.getQueuePath())) { // Case 1: Handle ManagedParentQueue AutoCreatedLeafQueue autoCreatedLeafQueue = null; ManagedParentQueue autoCreateEnabledParentQueue = (ManagedParentQueue) parentQueue; autoCreatedLeafQueue = new AutoCreatedLeafQueue(this, leafQueueName, autoCreateEnabledParentQueue); addQueue(autoCreatedLeafQueue); return autoCreatedLeafQueue; } else { try { writeLock.lock(); LeafQueue lq = autoQueueHandler.autoCreateQueuePath(placementContext); } finally { writeLock.unlock(); } } } throw new SchedulerDynamicEditException( "Could not auto-create leaf queue for " + leafQueueName + ". Queue mapping does not specify" + " which parent queue it needs to be created under."); } {code} # Fix testActivateApplicationAfterQueueRefresh in testLeafQueue to update active apps use update: {code:java} // This will not update active apps root.reinitialize(newRoot, csContext.getClusterResource()); // Cause this to update active apps root.updateClusterResource(csContext.getClusterResource(), new ResourceLimits(csContext.getClusterResource())); {code} Thanks. > Update queue creation logic to use weight mode and allow the flexible > static/dynamic creation > - > > Key: YARN-10506 > URL: https://issues.apache.org/jira/browse/YARN-10506 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Benjamin Teke >Assignee: Andras Gyori >Priority: Major > Attachments: YARN-10506-
[jira] [Comment Edited] (YARN-10506) Update queue creation logic to use weight mode and allow the flexible static/dynamic creation
[ https://issues.apache.org/jira/browse/YARN-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262660#comment-17262660 ] zhuqi edited comment on YARN-10506 at 1/11/21, 2:11 PM: [~wangda] [~gandras] Add an updated YARN-10506-007-10504-010.patch for fix all test in TestCapacitySchedulerNewQueueAutoCreation. The logic is : In ParentQueue's addDynamicChildQueue {code:java} // New method to add child queue public CSQueue addDynamicChildQueue(String childQueuePath, boolean isLeaf) throws SchedulerDynamicEditException { writeLock.lock(); try { // Changed here boolean weightsAreUsed = false; try { if (isLeaf && childQueues.isEmpty()) { weightsAreUsed = true; } else { weightsAreUsed = getCapacityConfigurationTypeForQueues(childQueues) == QueueCapacityType.WEIGHT; } } catch (IOException e) { LOG.warn("Caught Exception during auto queue creation", e); } ... }{code} Reason: When add a dynamic leaf queue, but we still not update the weight it, when the childQueues is empty, the original getCapacityConfigurationTypeForQueues will return percentage, we should handle this case to weightsAreUsed = true. When the childQueues is not empty, we just can use getCapacityConfigurationTypeForQueues for check. And then, the new created leaf will update weight. Any other test fail, may be related queueplacement. Thanks. was (Author: zhuqi): [~wangda] [~gandras] Add an updated YARN-10506-007-10504-010.patch for fix all test in TestCapacitySchedulerNewQueueAutoCreation. The logic is : In ParentQueue's addDynamicChildQueue {code:java} // New method to add child queue public CSQueue addDynamicChildQueue(String childQueuePath, boolean isLeaf) throws SchedulerDynamicEditException { writeLock.lock(); try { // Changed here boolean weightsAreUsed = false; try { if (isLeaf && childQueues.isEmpty()) { weightsAreUsed = true; } else { weightsAreUsed = getCapacityConfigurationTypeForQueues(childQueues) == QueueCapacityType.WEIGHT; } } catch (IOException e) { LOG.warn("Caught Exception during auto queue creation", e); } ... }{code} Reason: When add a dynamic leaf queue, but we still not update the weight it, when the childQueues is empty, the original getCapacityConfigurationTypeForQueues will return percentage, we should handle this case to weightsAreUsed = true. When the childQueues is not empty, we just can use getCapacityConfigurationTypeForQueues for check. Any other test fail, may be related queueplacement. Thanks. > Update queue creation logic to use weight mode and allow the flexible > static/dynamic creation > - > > Key: YARN-10506 > URL: https://issues.apache.org/jira/browse/YARN-10506 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Benjamin Teke >Assignee: Andras Gyori >Priority: Major > Attachments: YARN-10506-006-10504-010.patch, > YARN-10506-007-10504-010.patch, YARN-10506.001.patch, YARN-10506.002.patch, > YARN-10506.003.patch, YARN-10506.004.patch, YARN-10506.005.patch, > YARN-10506.006-combined.patch, YARN-10506.006.patch > > > The queue creation logic should be updated to use weight mode and support the > flexible creation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-10506) Update queue creation logic to use weight mode and allow the flexible static/dynamic creation
[ https://issues.apache.org/jira/browse/YARN-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262660#comment-17262660 ] zhuqi edited comment on YARN-10506 at 1/11/21, 2:10 PM: [~wangda] [~gandras] Add an updated YARN-10506-007-10504-010.patch for fix all test in TestCapacitySchedulerNewQueueAutoCreation. The logic is : In ParentQueue's addDynamicChildQueue {code:java} // New method to add child queue public CSQueue addDynamicChildQueue(String childQueuePath, boolean isLeaf) throws SchedulerDynamicEditException { writeLock.lock(); try { // Changed here boolean weightsAreUsed = false; try { if (isLeaf && childQueues.isEmpty()) { weightsAreUsed = true; } else { weightsAreUsed = getCapacityConfigurationTypeForQueues(childQueues) == QueueCapacityType.WEIGHT; } } catch (IOException e) { LOG.warn("Caught Exception during auto queue creation", e); } ... }{code} Reason: When add a dynamic leaf queue, but we still not update the weight it, when the childQueues is empty, the original getCapacityConfigurationTypeForQueues will return percentage, we should handle this case to weightsAreUsed = true. When the childQueues is not empty, we just can use getCapacityConfigurationTypeForQueues for check. Any other test fail, may be related queueplacement. Thanks. was (Author: zhuqi): [~wangda] [~gandras] Add an updated patch for fix all test in TestCapacitySchedulerNewQueueAutoCreation. In ParentQueue's addDynamicChildQueue {code:java} // New method to add child queue public CSQueue addDynamicChildQueue(String childQueuePath, boolean isLeaf) throws SchedulerDynamicEditException { writeLock.lock(); try { // Changed here boolean weightsAreUsed = false; try { if (isLeaf && childQueues.isEmpty()) { weightsAreUsed = true; } else { weightsAreUsed = getCapacityConfigurationTypeForQueues(childQueues) == QueueCapacityType.WEIGHT; } } catch (IOException e) { LOG.warn("Caught Exception during auto queue creation", e); } ... }{code} Reason: When add a dynamic leaf queue, but we still not update the weight it, also the childQueues is empty, when original getCapacityConfigurationTypeForQueues will return percentage, we should handle this case to weightsAreUsed = true, avoid the error. Any other test fail, may be related queueplacement. Thanks. > Update queue creation logic to use weight mode and allow the flexible > static/dynamic creation > - > > Key: YARN-10506 > URL: https://issues.apache.org/jira/browse/YARN-10506 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Benjamin Teke >Assignee: Andras Gyori >Priority: Major > Attachments: YARN-10506-006-10504-010.patch, > YARN-10506-007-10504-010.patch, YARN-10506.001.patch, YARN-10506.002.patch, > YARN-10506.003.patch, YARN-10506.004.patch, YARN-10506.005.patch, > YARN-10506.006-combined.patch, YARN-10506.006.patch > > > The queue creation logic should be updated to use weight mode and support the > flexible creation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-10506) Update queue creation logic to use weight mode and allow the flexible static/dynamic creation
[ https://issues.apache.org/jira/browse/YARN-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253943#comment-17253943 ] zhuqi edited comment on YARN-10506 at 12/23/20, 7:53 AM: - [~gandras] Thanks for your reply. [~wangda] [~gandras] 1. What i said is related policy management, we should fixed it later, and may be we need a new sub-task. 2. I think we should disallow it by throwing an exception, but we can add a admin api to realize destroying all its underlying children for optional, it will be more reasonable. 3. Also the 2. question is related to auto deleting, we should also discuss it. Thanks. was (Author: zhuqi): [~gandras] Thanks for your reply. 1. What i said is related policy management, we should fixed it later, and may be we need a new sub-task. 2. I think we should disallow it by throwing an exception, but we can add a admin api to realize destroying all its underlying children for optional, it will be more reasonable. 3. Also the 2. question is related to auto deleting, we should also discuss it. Thanks. > Update queue creation logic to use weight mode and allow the flexible > static/dynamic creation > - > > Key: YARN-10506 > URL: https://issues.apache.org/jira/browse/YARN-10506 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Benjamin Teke >Assignee: Andras Gyori >Priority: Major > Attachments: YARN-10506.001.patch, YARN-10506.002.patch > > > The queue creation logic should be updated to use weight mode and support the > flexible creation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-10506) Update queue creation logic to use weight mode and allow the flexible static/dynamic creation
[ https://issues.apache.org/jira/browse/YARN-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253943#comment-17253943 ] zhuqi edited comment on YARN-10506 at 12/23/20, 7:52 AM: - [~gandras] Thanks for your reply. 1. What i said is related policy management, we should fixed it later, and may be we need a new sub-task. 2. I think we should disallow it by throwing an exception, but we can add a admin api to realize destroying all its underlying children for optional, it will be more reasonable. 3. Also the 2. question is related to auto deleting, we should also discuss it. Thanks. was (Author: zhuqi): [~gandras] Thanks for your reply. 1. What i said is related policy management, we should fixed it later, and may be we need a new sub-task. 2. I think we should disallow it by throwing an exception, but we can add a admin api to realize destroying all its underlying children for optional, it will be more reasonable. Thanks. > Update queue creation logic to use weight mode and allow the flexible > static/dynamic creation > - > > Key: YARN-10506 > URL: https://issues.apache.org/jira/browse/YARN-10506 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Benjamin Teke >Assignee: Andras Gyori >Priority: Major > Attachments: YARN-10506.001.patch, YARN-10506.002.patch > > > The queue creation logic should be updated to use weight mode and support the > flexible creation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-10506) Update queue creation logic to use weight mode and allow the flexible static/dynamic creation
[ https://issues.apache.org/jira/browse/YARN-10506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17252919#comment-17252919 ] zhuqi edited comment on YARN-10506 at 12/21/20, 3:30 PM: - [~wangda] [~gandras] When i deep into some test in TestCapacitySchedulerAutoQueueCreation, because of the update resources now independent of queue init according to YARN-10504 and it will not update policy template related absolute resources and etc something related, we should finish the policy related absolute update in ManagedParentQueue's addChildQueue such as : {code:java} // In ManagedParentQueue's addChildQueue for (String label : policy.leafQueueTemplateNodeLabels) { // Weight will be normalized to queue.weight = // queue.weight(sum({sibling-queues.weight})) // When weight is set, capacity will be set to 0; // When capacity is set, weight will be normalized to 0, // So get larger from normalized_weight and capacity will make sure we do // calculation correct float capacity = Math.max(policy.leafQueueTemplate.getQueueCapacities().getCapacity(label), policy.leafQueueTemplate.getQueueCapacities().getNormalizedWeight(label)); if (capacity > 0f) { policy.leafQueueTemplate.getQueueCapacities() .setAbsoluteCapacity(label, capacity * ( parentQueueCapacities == null ? 1 : parentQueueCapacities.getAbsoluteCapacity(label))); policy.leafQueueTemplateCapacities = policy.leafQueueTemplate.getQueueCapacities(); } float maxCapacity = policy.leafQueueTemplate.getQueueCapacities().getMaximumCapacity(label); if (maxCapacity > 0f) { policy.leafQueueTemplate.getQueueCapacities() .setAbsoluteMaximumCapacity(label, maxCapacity * ( parentQueueCapacities == null ? 1 : parentQueueCapacities.getAbsoluteMaximumCapacity(label))); policy.leafQueueTemplateCapacities = policy.leafQueueTemplate.getQueueCapacities(); } } {code} and policy should update template : {code:java} private void updateCapacityFromTemplate(QueueCapacities capacities, String nodeLabel) { capacities.setCapacity(nodeLabel, leafQueueTemplateCapacities.getCapacity(nodeLabel)); capacities.setMaximumCapacity(nodeLabel, leafQueueTemplateCapacities.getMaximumCapacity(nodeLabel)); capacities.setAbsoluteCapacity(nodeLabel, leafQueueTemplateCapacities.getAbsoluteCapacity(nodeLabel)); } {code} Then the test will pass, but the reinitialize related will still block. Also when we reinitialize the queue, we should think how to bring up this to reinitialize and update resource progress. So i am thinking , if it is worth to make update resources independent of queue init. was (Author: zhuqi): [~wangda] [~gandras] When i deep into some test in TestCapacitySchedulerAutoQueueCreation, because of the update resources now independent of queue init according to [YARN-10504|https://issues.apache.org/jira/browse/YARN-10504], we should finish the policy related absolute update in ManagedParentQueue's addChildQueue such as : {code:java} // In ManagedParentQueue's addChildQueue for (String label : policy.leafQueueTemplateNodeLabels) { // Weight will be normalized to queue.weight = // queue.weight(sum({sibling-queues.weight})) // When weight is set, capacity will be set to 0; // When capacity is set, weight will be normalized to 0, // So get larger from normalized_weight and capacity will make sure we do // calculation correct float capacity = Math.max(policy.leafQueueTemplate.getQueueCapacities().getCapacity(label), policy.leafQueueTemplate.getQueueCapacities().getNormalizedWeight(label)); if (capacity > 0f) { policy.leafQueueTemplate.getQueueCapacities() .setAbsoluteCapacity(label, capacity * ( parentQueueCapacities == null ? 1 : parentQueueCapacities.getAbsoluteCapacity(label))); policy.leafQueueTemplateCapacities = policy.leafQueueTemplate.getQueueCapacities(); } float maxCapacity = policy.leafQueueTemplate.getQueueCapacities().getMaximumCapacity(label); if (maxCapacity > 0f) { policy.leafQueueTemplate.getQueueCapacities() .setAbsoluteMaximumCapacity(label, maxCapacity * ( parentQueueCapacities == null ? 1 : parentQueueCapacities.getAbsoluteMaximumCapacity(label))); policy.leafQueueTemplateCapacities = policy.leafQueueTemplate.getQueueCapacities(); } } {code} and policy should update template : {code:java} private void updateCapacityFromTemplate(QueueCapacities capacities, String nodeLabel) { capacities.setCapacity(nodeLabel, leafQueueTemplateCapacities.getCapacity(nodeLabel)); capacities.setMaximumCapacity(nodeLabel, leafQueueTemplateCapacities.getMaximumCapacity(nodeLabel)); capacities.setAbsoluteCapacity(nodeLabel, leafQueueTemplateCapacities.getAbso