[ https://issues.apache.org/jira/browse/YARN-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17261872#comment-17261872 ]
zhuqi commented on YARN-10504:
------------------------------

[~wangda] [~bteke]

1. {{updateAbsoluteCapacitiesAndRelatedFields}} should update maxApplications, but in some cases it cannot produce a usable value. For example, in {{TestCapacitySchedulerAutoQueueCreation}} -> {{testAutoCreatedQueueActivationDeactivation}}:
{code:java}
//submit user_3 app. This cant be allocated since there is no capacity
// in NO_LABEL, SSD but can be in GPU label
submitApp(mockRM, parentQueue, USER3, USER3, 4, 1);
final CSQueue user3LeafQueue = cs.getQueue(USER3);
validateCapacities((AutoCreatedLeafQueue) user3LeafQueue, 0.0f, 0.0f,
    1.0f, 1.0f);
validateCapacitiesByLabel((ManagedParentQueue) parentQueue,
    (AutoCreatedLeafQueue) user3LeafQueue, NODEL_LABEL_GPU);
{code}
In this case the user_3 auto-created leaf queue has no capacity, so in {{updateAbsoluteCapacitiesAndRelatedFields}}:
{code:java}
private void updateAbsoluteCapacitiesAndRelatedFields() {
  updateAbsoluteCapacities();
  CapacitySchedulerConfiguration schedulerConf = csContext.getConfiguration();

  // If maxApplications not set, use the system total max app, apply newly
  // calculated abs capacity of the queue.
  if (maxApplications <= 0) {
    int maxSystemApps = schedulerConf.getMaximumSystemApplications();
    maxApplications =
        (int) (maxSystemApps * queueCapacities.getAbsoluteCapacity());
  }
  maxApplicationsPerUser = Math.min(maxApplications,
      (int) (maxApplications * (usersManager.getUserLimit() / 100.0f)
          * usersManager.getUserLimitFactor()));
}

// because capacities will update to 0
if (availableCapacity >= leafQueueTemplateCapacities
    .getAbsoluteCapacity(nodeLabel)) {
  updateCapacityFromTemplate(capacities, nodeLabel);
  activate(leafQueue, nodeLabel);
} else {
  updateToZeroCapacity(capacities, nodeLabel);
}

// And because, the update will be after reinitializeFromTemplate
final AutoCreatedLeafQueueConfig initialLeafQueueTemplate =
    queueManagementPolicy.getInitialLeafQueueConfiguration(leafQueue);
leafQueue.reinitializeFromTemplate(initialLeafQueueTemplate);

// Do one update cluster resource call to make sure all absolute resources
// effective resources are updated.
updateClusterResource(this.csContext.getClusterResource(),
    new ResourceLimits(this.csContext.getClusterResource()));
{code}
Because the absolute capacity is 0, both maxApplications and maxApplicationsPerUser will be 0. So we should handle this in the new logic at {{//TODO recalculate max applications because they can depend on capacity}}. The TODO should be removed: either just let the AutoCreatedLeafQueue case pass for now, or add logic that sets maxApplications to a fixed default number in this case.

2. As mentioned by [~bteke]:

"Sharing my latest findings on TestAbsoluteResourceWithAutoQueue failure: {{AutoCreatedLeafQueue#reinitializeFromTemplate}} was refactored, now the getting and merging the QueueCapacities happens *before* calling the {{ParentQueue#updateClusterResource}} (and {{LeafQueue#updateClusterResource}}). In {{LeafQueue#updateClusterResource}} the {{AbstractCSQueue#updateEffectiveResources}} is called where the effectiveMinResource of the created queue is overridden with the template's effectiveMinResources which is exactly the same the test is getting in the asserts."
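To make the arithmetic behind point 1 concrete, here is a minimal stand-alone sketch. It is not the Hadoop code: the class {{MaxAppsSketch}}, the constant {{DEFAULT_MIN_APPS}}, the method {{maxApplicationsWithFloor}} and the input value 10000 are all hypothetical and chosen only for illustration. It mirrors the two expressions quoted in point 1 and shows one possible shape for the "fixed default number" idea.

```java
// Stand-alone illustration; none of these names exist in Hadoop.
final class MaxAppsSketch {
  // Hypothetical floor for zero-capacity queues (assumption, not a Hadoop setting).
  static final int DEFAULT_MIN_APPS = 1;

  // Same expression as in updateAbsoluteCapacitiesAndRelatedFields.
  static int maxApplications(int maxSystemApps, float absoluteCapacity) {
    return (int) (maxSystemApps * absoluteCapacity);
  }

  // Same expression as the maxApplicationsPerUser assignment.
  static int maxApplicationsPerUser(int maxApplications,
      float userLimitPercent, float userLimitFactor) {
    return Math.min(maxApplications,
        (int) (maxApplications * (userLimitPercent / 100.0f) * userLimitFactor));
  }

  // One possible guard: never let the computed value drop below the floor.
  static int maxApplicationsWithFloor(int maxSystemApps, float absoluteCapacity) {
    return Math.max(DEFAULT_MIN_APPS,
        maxApplications(maxSystemApps, absoluteCapacity));
  }

  public static void main(String[] args) {
    // A queue whose absolute capacity was reset to zero.
    int maxApps = maxApplications(10000, 0.0f);
    int perUser = maxApplicationsPerUser(maxApps, 100.0f, 1.0f);
    System.out.println(maxApps + " " + perUser);               // prints "0 0"
    System.out.println(maxApplicationsWithFloor(10000, 0.0f)); // prints "1"
  }
}
```

With an absolute capacity of 0 the product is 0 whatever the system-wide limit is, and the per-user value is capped by that 0. Whether a floor like this is appropriate, and what its value should be, is still an open question for the patch.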
We should change {{LeafQueue#updateClusterResource}} to:
{code:java}
public void updateClusterResource(Resource clusterResource,
    ResourceLimits currentResourceLimits) {
  writeLock.lock();
  try {
    ...
    if (!(this instanceof AutoCreatedLeafQueue)) {
      super.updateEffectiveResources(clusterResource);
    }
  }
{code}
This will fix the absolute-resource case in TestAbsoluteResourceWithAutoQueue.

Do you have any other advice?

Thanks.

> Implement weight mode in Capacity Scheduler
> -------------------------------------------
>
>                 Key: YARN-10504
>                 URL: https://issues.apache.org/jira/browse/YARN-10504
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Benjamin Teke
>            Assignee: Benjamin Teke
>            Priority: Major
>         Attachments: YARN-10504.001.patch, YARN-10504.002.patch,
> YARN-10504.003.patch, YARN-10504.004.patch, YARN-10504.005.patch,
> YARN-10504.ver-1.patch, YARN-10504.ver-2.patch, YARN-10504.ver-3.patch
>
>
> To allow the possibility to flexibly create queues in Capacity Scheduler a
> weight mode should be introduced. The existing {{capacity}} property should
> be used with a different syntax, i.e:
> root.users.capacity = (1.0) or ~1.0 or ^1.0 or @1.0
> root.users.capacity = 1.0w
> root.users.capacity = w:1.0
> Weight support should not impact the existing functionality.
>
> The new functionality should:
> * accept and validate the new weight values
> * enforce a singular mode on the whole queue tree
> * (re)calculate the relative (percentage-based) capacities based on the
> weights during launch and every time the queue structure changes