[ 
https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053309#comment-17053309
 ] 

Prabhu Joseph commented on YARN-9879:
-------------------------------------

Thanks [~shuzirra] for the patch. I have tested the scenarios below with the 
patch, and it works fine except for two issues.
 
 1. Job Submission with leaf queuename and full queue path.
 2. Queue Placement
 3. Auto Creation of Leaf Queue.
 4. RM UI
 5. RMWebService Scheduler response.
 6. RMAdminService RefreshQueues
 7. Scheduler Configuration Mutation API - add / remove / update queue.
 8. Recovery
 9. RM JMX Metrics - YARN-9772

*Issue 1: RM fails to start when a dynamic parent queue "batch" 
(auto-create-child-queue.enabled=true) and a leaf queue "batch" both exist.* 

CS Config:

root.batch -> (auto-create-child-queue.enabled=true)
 root.default
 root.A.batch

yarn.scheduler.capacity.queue-mappings = u:%user:batch.%user

 
{code:java}
2020-03-06 00:54:59,239 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.yarn.exceptions.YarnException: Failed to initialize queues
 at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
 at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
 at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:109)
 at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:876)
 at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
 at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1288)
 at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:339)
 at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
 at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1576)
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to initialize queues
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:757)
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:342)
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:418)
 at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
 ... 7 more
Caused by: java.io.IOException: mapping contains invalid or non-leaf queue [%user] and invalid parent queue [batch]
 at org.apache.hadoop.yarn.server.resourcemanager.placement.QueuePlacementRuleUtils.validateQueueMappingUnderParentQueue(QueuePlacementRuleUtils.java:50)
 at org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.validateAndGetAutoCreatedQueueMapping(UserGroupMappingPlacementRule.java:363)
 at org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.initialize(UserGroupMappingPlacementRule.java:298)
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getUserGroupMappingPlacementRule(CapacityScheduler.java:674)
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updatePlacementRules(CapacityScheduler.java:709)
 at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:750)
{code}
 

*Complete CS Config to repro above issue:*
{code:java}
<configuration xmlns:xi="http://www.w3.org/2001/XInclude">

<property><name>yarn.scheduler.capacity.root.batch.leaf-queue-template.capacity</name>
 <value>40</value></property>

<property><name>yarn.scheduler.capacity.queue-mappings</name>
 <value>u:%user:batch.%user</value></property>

<property><name>yarn.scheduler.capacity.root.batch.auto-create-child-queue.enabled</name>
 <value>true</value></property>

<property>
 <name>yarn.scheduler.capacity.root.queues</name>
 <value>default,batch,A</value>
 </property>

<property>
 <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
 <value>false</value>
 </property>

<property>
 <name>yarn.scheduler.capacity.root.capacity</name>
 <value>100</value>
 </property>

<property>
 <name>yarn.scheduler.capacity.root.default.capacity</name>
 <value>40</value>
 </property>

<property>
 <name>yarn.scheduler.capacity.root.batch.capacity</name>
 <value>40</value>
 </property>

<property>
 <name>yarn.scheduler.capacity.root.A.capacity</name>
 <value>20</value>
 </property>

<property>
 <name>yarn.scheduler.capacity.root.A.queues</name>
 <value>batch</value>
 </property>

<property>
 <name>yarn.scheduler.capacity.root.A.batch.capacity</name>
 <value>100</value>
 </property>

</configuration>
{code}
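
To make the name clash in Issue 1 concrete, here is a minimal Java sketch. It is not the YARN implementation: QueueMappingSketch, Mapping, parse and pathsEndingWith are hypothetical names made up for illustration. It splits the mapping u:%user:batch.%user into its parts and lists every queue path from the config above whose last segment equals the parent name. Whether this ambiguity is what actually trips the validation in the patch is an assumption; the stack trace only shows that the parent queue [batch] is rejected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, NOT part of YARN: it only illustrates the name clash.
final class QueueMappingSketch {

  /** Parts of a mapping such as "u:%user:batch.%user". */
  static final class Mapping {
    final String type;    // "u" (user) or "g" (group)
    final String source;  // e.g. "%user"
    final String parent;  // e.g. "batch", or null when no parent is given
    final String leaf;    // e.g. "%user"

    Mapping(String type, String source, String parent, String leaf) {
      this.type = type;
      this.source = source;
      this.parent = parent;
      this.leaf = leaf;
    }
  }

  /** Splits "type:source:queue", where queue may be "parent.leaf". */
  static Mapping parse(String mapping) {
    String[] parts = mapping.trim().split(":");
    if (parts.length != 3) {
      throw new IllegalArgumentException(
          "Expected type:source:queue but got: " + mapping);
    }
    String queue = parts[2];
    int dot = queue.lastIndexOf('.');
    String parent = dot < 0 ? null : queue.substring(0, dot);
    String leaf = dot < 0 ? queue : queue.substring(dot + 1);
    return new Mapping(parts[0], parts[1], parent, leaf);
  }

  /** Returns every queue path whose last segment equals shortName. */
  static List<String> pathsEndingWith(List<String> queuePaths, String shortName) {
    List<String> matches = new ArrayList<>();
    for (String path : queuePaths) {
      String last = path.substring(path.lastIndexOf('.') + 1);
      if (last.equals(shortName)) {
        matches.add(path);
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    // Queue paths taken from the repro config in this comment.
    List<String> queues = Arrays.asList(
        "root", "root.default", "root.batch", "root.A", "root.A.batch");
    Mapping m = parse("u:%user:batch.%user");
    // Prints: batch -> [root.batch, root.A.batch]
    System.out.println(m.parent + " -> " + pathsEndingWith(queues, m.parent));
  }
}
```

With the repro config, the short parent name batch matches both root.batch (the dynamic parent) and root.A.batch (the leaf), so a lookup by short name alone cannot tell them apart.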
 

*Issue 2:*

*RM starts fine with the queue config below, but submitting a job with queue 
name "A" fails. Job submission works fine when specifying the full queue path 
root.B.A. There is only one leaf queue named "A", so the placement should be 
able to find it, right?*

root.A.B 
 root.B.A

 
{code:java}
yarn jar /HADOOP/hadoop-3.3.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.3.0-SNAPSHOT-tests.jar sleep -Dmapreduce.job.queuename=A -m 1 -mt 1

Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1583486216805_0002 to YARN : Application application_1583486216805_0002 submitted by user hive to unknown queue: A
 at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:336)
 at org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:304)
 at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:331)
 ... 25 more
{code}
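
The behaviour expected in Issue 2 can be sketched as follows. This is a hypothetical illustration in plain Java, not code from the patch: LeafQueueResolverSketch and resolve are invented names, and the rule "a short name resolves only if exactly one leaf queue carries it" is my assumption about the intended semantics.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical resolver, NOT the YARN-9879 patch: it indexes leaf queues only.
final class LeafQueueResolverSketch {
  private final Set<String> leafPaths = new HashSet<>();
  private final Map<String, List<String>> leafPathsByShortName = new HashMap<>();

  LeafQueueResolverSketch(Collection<String> leafQueuePaths) {
    for (String path : leafQueuePaths) {
      leafPaths.add(path);
      String shortName = path.substring(path.lastIndexOf('.') + 1);
      leafPathsByShortName
          .computeIfAbsent(shortName, k -> new ArrayList<>())
          .add(path);
    }
  }

  /**
   * Accepts a full leaf path, or a short name that exactly one leaf queue
   * carries. Unknown or ambiguous names yield an empty Optional.
   */
  Optional<String> resolve(String queueName) {
    if (leafPaths.contains(queueName)) {
      return Optional.of(queueName);
    }
    List<String> candidates = leafPathsByShortName.get(queueName);
    if (candidates == null || candidates.size() != 1) {
      return Optional.empty();
    }
    return Optional.of(candidates.get(0));
  }

  public static void main(String[] args) {
    // Leaf queues from the Issue 2 config; root.A and root.B are parents.
    LeafQueueResolverSketch resolver =
        new LeafQueueResolverSketch(Arrays.asList("root.A.B", "root.B.A"));
    System.out.println(resolver.resolve("A"));        // Optional[root.B.A]
    System.out.println(resolver.resolve("root.B.A")); // Optional[root.B.A]
  }
}
```

Because parent queues are left out of the index, the parent root.A does not compete with the leaf root.B.A for the short name "A". If two leaf queues shared a short name, resolve would return empty and the caller would have to pass the full path.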

> Allow multiple leaf queues with the same name in CS
> ---------------------------------------------------
>
>                 Key: YARN-9879
>                 URL: https://issues.apache.org/jira/browse/YARN-9879
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Gergely Pollak
>            Assignee: Gergely Pollak
>            Priority: Major
>              Labels: fs2cs
>         Attachments: CSQueue.getQueueUsage.txt, DesignDoc_v1.pdf, 
> YARN-9879.POC001.patch, YARN-9879.POC002.patch, YARN-9879.POC003.patch, 
> YARN-9879.POC004.patch, YARN-9879.POC005.patch, YARN-9879.POC006.patch, 
> YARN-9879.POC007.patch, YARN-9879.POC008.patch, YARN-9879.POC009.patch, 
> YARN-9879.POC010.patch, YARN-9879.POC011.patch
>
>
> Currently the leaf queue's name must be unique regardless of its position in 
> the queue hierarchy. 
> Design doc and first proposal is being made, I'll attach it as soon as it's 
> done.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
