[ https://issues.apache.org/jira/browse/YARN-11641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805189#comment-17805189 ]
ASF GitHub Bot commented on YARN-11641:
---------------------------------------

tomicooler opened a new pull request, #6435:
URL: https://github.com/apache/hadoop/pull/6435

### Description of PR

WIP until the other 2 tickets are merged; I'll rebase this PR then.

Details in the Jira: [YARN-11641](https://issues.apache.org/jira/browse/YARN-11641)

Note: it is not possible to rely on the capacityVectors (at least not for the root queue, which is always in percentage mode with 100%). So I decided to go with the `checkConfigTypeIsAbsoluteResource` approach.

### How was this patch tested?

Tested manually and added a unit test.

### For code changes:

- [x] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'YARN-11641 Your PR title ...')?
- [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
- [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?

> Can't update a queue hierarchy in absolute mode when the configured capacities are zero
> ---------------------------------------------------------------------------------------
>
>                 Key: YARN-11641
>                 URL: https://issues.apache.org/jira/browse/YARN-11641
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 3.4.0
>            Reporter: Tamas Domok
>            Assignee: Tamas Domok
>            Priority: Major
>         Attachments: hierarchy.png
>
>
> h2. Error symptoms
> It is not possible to modify a queue hierarchy in absolute mode when the
> parent, or every child queue of the parent, has a min resource of 0 configured.
> {noformat}
> 2024-01-05 15:38:59,016 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager: Initialized queue: root.a.c
> 2024-01-05 15:38:59,016 ERROR org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices: Exception thrown when modifying configuration.
> java.io.IOException: Failed to re-init queues : Parent=root.a: When absolute minResource is used, we must make sure both parent and child all use absolute minResource
> {noformat}
> h2. Reproduction
> capacity-scheduler.xml
> {code:xml}
> <?xml version="1.0"?>
> <configuration>
>   <property>
>     <name>yarn.scheduler.capacity.root.queues</name>
>     <value>default,a</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.capacity</name>
>     <value>[memory=40960, vcores=16]</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.default.capacity</name>
>     <value>[memory=1024, vcores=1]</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
>     <value>[memory=1024, vcores=1]</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.a.capacity</name>
>     <value>[memory=0, vcores=0]</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.a.maximum-capacity</name>
>     <value>[memory=39936, vcores=15]</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.a.queues</name>
>     <value>b,c</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.a.b.capacity</name>
>     <value>[memory=0, vcores=0]</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.a.b.maximum-capacity</name>
>     <value>[memory=39936, vcores=15]</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.a.c.capacity</name>
>     <value>[memory=0, vcores=0]</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.a.c.maximum-capacity</name>
>     <value>[memory=39936, vcores=15]</value>
>   </property>
> </configuration>
> {code}
> !hierarchy.png!
> updatequeue.xml
> {code:xml}
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> <sched-conf>
>   <update-queue>
>     <queue-name>root.a</queue-name>
>     <params>
>       <entry>
>         <key>capacity</key>
>         <value>[memory=1024,vcores=1]</value>
>       </entry>
>       <entry>
>         <key>maximum-capacity</key>
>         <value>[memory=39936,vcores=15]</value>
>       </entry>
>     </params>
>   </update-queue>
> </sched-conf>
> {code}
> {code}
> $ curl -X PUT -H 'Content-Type: application/xml' -d @updatequeue.xml http://localhost:8088/ws/v1/cluster/scheduler-conf\?user.name\=yarn
> Failed to re-init queues : Parent=root.a: When absolute minResource is used, we must make sure both parent and child all use absolute minResource
> {code}
> h2. Root cause
> setChildQueues is called during reinit, where:
> {code:java}
>   void setChildQueues(Collection<CSQueue> childQueues) throws IOException {
>     writeLock.lock();
>     try {
>       boolean isLegacyQueueMode = queueContext.getConfiguration().isLegacyQueueMode();
>       if (isLegacyQueueMode) {
>         QueueCapacityType childrenCapacityType =
>             getCapacityConfigurationTypeForQueues(childQueues);
>         QueueCapacityType parentCapacityType =
>             getCapacityConfigurationTypeForQueues(ImmutableList.of(this));
>         if (childrenCapacityType == QueueCapacityType.ABSOLUTE_RESOURCE
>             || parentCapacityType == QueueCapacityType.ABSOLUTE_RESOURCE) {
>           // We don't allow any mixed absolute + {weight, percentage} between
>           // children and parent
>           if (childrenCapacityType != parentCapacityType && !this.getQueuePath()
>               .equals(CapacitySchedulerConfiguration.ROOT)) {
>             throw new IOException("Parent=" + this.getQueuePath()
>                 + ": When absolute minResource is used, we must make sure both "
>                 + "parent and child all use absolute minResource");
>           }
> {code}
> The parentCapacityType or the childrenCapacityType will be considered as PERCENTAGE, because
> getCapacityConfigurationTypeForQueues fails to detect the absolute mode here:
> {code:java}
>         if (!queue.getQueueResourceQuotas().getConfiguredMinResource(nodeLabel)
>             .equals(Resources.none())) {
>           absoluteMinResSet = true;
> {code}
> A configured minimum of [memory=0, vcores=0] is equal to Resources.none(), so
> absoluteMinResSet is never set for such a queue.
> (It only happens in legacy queue mode.)
> h2. Possible fixes
> Possible fix in AbstractParentQueue.getCapacityConfigurationTypeForQueues
> using the capacityVector:
> {code:java}
>     for (CSQueue queue : queues) {
>       for (String nodeLabel : queueCapacities.getExistingNodeLabels()) {
>         Set<QueueCapacityVector.ResourceUnitCapacityType> definedCapacityTypes =
>             queue.getConfiguredCapacityVector(nodeLabel).getDefinedCapacityTypes();
>         if (definedCapacityTypes.size() == 1) {
>           QueueCapacityVector.ResourceUnitCapacityType next =
>               definedCapacityTypes.iterator().next();
>           if (Objects.requireNonNull(next) == PERCENTAGE) {
>             percentageIsSet = true;
>             diagMsg.append("{Queue=").append(queue.getQueuePath()).append(", label=").append(nodeLabel)
>                 .append(" uses percentage mode}. ");
>           } else if (next == QueueCapacityVector.ResourceUnitCapacityType.ABSOLUTE) {
>             absoluteMinResSet = true;
>             diagMsg.append("{Queue=").append(queue.getQueuePath()).append(", label=").append(nodeLabel)
>                 .append(" uses absolute mode}. ");
>           } else if (next == QueueCapacityVector.ResourceUnitCapacityType.WEIGHT) {
>             weightIsSet = true;
>             diagMsg.append("{Queue=").append(queue.getQueuePath()).append(", label=").append(nodeLabel)
>                 .append(" uses weight mode}. ");
>           }
>         } else if (definedCapacityTypes.size() > 1) {
>           mixedIsSet = true;
>           diagMsg.append("{Queue=").append(queue.getQueuePath()).append(", label=").append(nodeLabel)
>               .append(" uses mixed mode}. ");
>         }
>       }
>     }
> {code}
> Pre capacityVector, we could utilise checkConfigTypeIsAbsoluteResource, e.g.:
> {code:java}
> -        if (!queue.getQueueResourceQuotas().getConfiguredMinResource(nodeLabel)
> -            .equals(Resources.none())) {
> +        if (checkConfigTypeIsAbsoluteResource(queue.getQueuePath(), nodeLabel)) {
> {code}
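
The root cause described in the issue can be illustrated without any Hadoop classes. The sketch below is only a simplified model: `AbsoluteModeDetection`, `parseAbsolute`, `detectByValue` and `detectBySyntax` are hypothetical names invented for this example, and the parsing is an assumption, not the CapacityScheduler's actual implementation. It shows why a check that compares the configured value against a zero resource cannot tell `[memory=0, vcores=0]` apart from "nothing configured", whereas a check based on how the value was written can.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AbsoluteModeDetection {

    // Hypothetical parser for values such as "[memory=0, vcores=0]".
    // Returns an empty map when the value is not in bracket syntax.
    static Map<String, Long> parseAbsolute(String value) {
        Map<String, Long> resources = new LinkedHashMap<>();
        String trimmed = value.trim();
        if (!trimmed.startsWith("[") || !trimmed.endsWith("]")) {
            return resources;
        }
        String inner = trimmed.substring(1, trimmed.length() - 1);
        for (String pair : inner.split(",")) {
            String[] kv = pair.trim().split("=");
            if (kv.length == 2) {
                resources.put(kv[0].trim(), Long.parseLong(kv[1].trim()));
            }
        }
        return resources;
    }

    // Value-based detection: reports absolute mode only if some resource is
    // non-zero. This models comparing the min resource to a zero resource.
    static boolean detectByValue(String value) {
        return parseAbsolute(value).values().stream().anyMatch(v -> v != 0L);
    }

    // Syntax-based detection: reports absolute mode whenever the bracket
    // syntax is used, regardless of the numbers inside.
    static boolean detectBySyntax(String value) {
        String trimmed = value.trim();
        return trimmed.startsWith("[") && trimmed.endsWith("]");
    }

    public static void main(String[] args) {
        String[] samples = {"[memory=0, vcores=0]", "[memory=1024, vcores=1]", "50"};
        for (String v : samples) {
            System.out.println(v + " byValue=" + detectByValue(v)
                + " bySyntax=" + detectBySyntax(v));
        }
    }
}
```

In this model the two checks agree for `[memory=1024, vcores=1]` and for the percentage value `50`, and disagree only for the all-zero vector, which is the case the reproduction above hits.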