[ https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053309#comment-17053309 ]
Prabhu Joseph commented on YARN-9879:
-------------------------------------

Thanks [~shuzirra] for the patch. I have tested the scenarios below with the patch, and it works fine except for two issues.

1. Job Submission with leaf queuename and full queue path.
2. Queue Placement
3. Auto Creation of Leaf Queue.
4. RM UI
5. RMWebService Scheduler response.
6. RMAdminService RefreshQueues
7. Scheduler Configuration Mutation API - add / remove / update queue.
8. Recovery
9. RM JMX Metrics - YARN-9772

*Issue 1: RM fails to start when a dynamic parent queue "batch" (auto-create-child-queue.enabled=true) and another leaf queue "batch" both exist.*

CS Config:

root.batch -> (auto-create-child-queue.enabled=true)
root.default
root.A.batch
yarn.scheduler.capacity.queue-mappings = u:%user:batch.%user

{code:java}
2020-03-06 00:54:59,239 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.yarn.exceptions.YarnException: Failed to initialize queues
    at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:109)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:876)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1288)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:339)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1576)
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to initialize queues
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:757)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:342)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:418)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
    ... 7 more
Caused by: java.io.IOException: mapping contains invalid or non-leaf queue [%user] and invalid parent queue [batch]
    at org.apache.hadoop.yarn.server.resourcemanager.placement.QueuePlacementRuleUtils.validateQueueMappingUnderParentQueue(QueuePlacementRuleUtils.java:50)
    at org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.validateAndGetAutoCreatedQueueMapping(UserGroupMappingPlacementRule.java:363)
    at org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.initialize(UserGroupMappingPlacementRule.java:298)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getUserGroupMappingPlacementRule(CapacityScheduler.java:674)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updatePlacementRules(CapacityScheduler.java:709)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:750)
{code}

*Complete CS config to reproduce the above issue:*

{code:xml}
<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
  <property>
    <name>yarn.scheduler.capacity.root.batch.leaf-queue-template.capacity</name>
    <value>40</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.queue-mappings</name>
    <value>u:%user:batch.%user</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.batch.auto-create-child-queue.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,batch,A</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.capacity</name>
    <value>100</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>40</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.batch.capacity</name>
    <value>40</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.A.capacity</name>
    <value>20</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.A.queues</name>
    <value>batch</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.A.batch.capacity</name>
    <value>100</value>
  </property>
</configuration>
{code}

*Issue 2: RM starts fine with the queue config below, but submitting a job with queue name "A" fails. Job submission works when the full queue name root.B.A is specified. There is only one leaf queue named "A", so the placement should be able to find it, right?*

root.A.B
root.B.A

{code:java}
yarn jar /HADOOP/hadoop-3.3.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.3.0-SNAPSHOT-tests.jar sleep -Dmapreduce.job.queuename=A -m 1 -mt 1

Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1583486216805_0002 to YARN : Application application_1583486216805_0002 submitted by user hive to unknown queue: A
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:336)
    at org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:304)
    at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:331)
    ... 25 more
{code}

> Allow multiple leaf queues with the same name in CS
> ---------------------------------------------------
>
> Key: YARN-9879
> URL: https://issues.apache.org/jira/browse/YARN-9879
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Gergely Pollak
> Assignee: Gergely Pollak
> Priority: Major
> Labels: fs2cs
> Attachments: CSQueue.getQueueUsage.txt, DesignDoc_v1.pdf,
> YARN-9879.POC001.patch, YARN-9879.POC002.patch, YARN-9879.POC003.patch,
> YARN-9879.POC004.patch, YARN-9879.POC005.patch, YARN-9879.POC006.patch,
> YARN-9879.POC007.patch, YARN-9879.POC008.patch, YARN-9879.POC009.patch,
> YARN-9879.POC010.patch, YARN-9879.POC011.patch
>
>
> Currently the leaf queue's name must be unique regardless of its position in
> the queue hierarchy.
> Design doc and first proposal is being made, I'll attach it as soon as it's
> done.
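
The lookup that Issue 2 expects can be sketched roughly as follows. This is only an illustration and not code from the YARN-9879 patch: `LeafQueueResolver` and `resolve` are hypothetical names, and the sketch assumes the scheduler can supply the full paths of all leaf queues. A full path resolves to itself, and a short name resolves only when exactly one leaf queue carries it, so a parent queue such as root.A is never a candidate.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper for illustration; not a class from YARN or the patch.
final class LeafQueueResolver {
  // Full paths of all leaf queues, e.g. "root.B.A".
  private final Set<String> leafPaths = new HashSet<>();
  // Short name -> full paths of every leaf queue that has that short name.
  private final Map<String, List<String>> leavesByShortName = new HashMap<>();

  public LeafQueueResolver(List<String> leafQueuePaths) {
    for (String path : leafQueuePaths) {
      leafPaths.add(path);
      // The short name is the last dot-separated segment of the path.
      String shortName = path.substring(path.lastIndexOf('.') + 1);
      leavesByShortName.computeIfAbsent(shortName, k -> new ArrayList<>()).add(path);
    }
  }

  // Returns the full leaf queue path, or empty if the name is unknown or ambiguous.
  public Optional<String> resolve(String queueName) {
    if (leafPaths.contains(queueName)) {
      return Optional.of(queueName); // already a full leaf path
    }
    List<String> candidates = leavesByShortName.getOrDefault(queueName, List.of());
    return candidates.size() == 1 ? Optional.of(candidates.get(0)) : Optional.empty();
  }

  public static void main(String[] args) {
    // Leaf queues from the Issue 2 hierarchy: root.A.B and root.B.A.
    LeafQueueResolver resolver = new LeafQueueResolver(List.of("root.A.B", "root.B.A"));
    System.out.println(resolver.resolve("A"));        // Optional[root.B.A]
    System.out.println(resolver.resolve("root.B.A")); // Optional[root.B.A]
    System.out.println(resolver.resolve("C"));        // Optional.empty
  }
}
```

With the leaf queues root.A.B and root.B.A, `resolve("A")` yields root.B.A, which is the result the submission in Issue 2 expected. A short name shared by two leaf queues yields no result in this sketch, so the caller would have to pass the full path.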