[ https://issues.apache.org/jira/browse/YARN-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049581#comment-15049581 ]
Wangda Tan commented on YARN-4415:
----------------------------------

[~Naganarasimha]/[~xinxianyin], let me try to summarize what we were discussing. There are 2 different configurations:
1) Accessible-node-labels for a queue
2) Maximum-capacity for partitions
(Example settings for both are sketched at the end of this message.)

There are 4 different combinations of default values:

a. 1)=*, 2)=100
Pros:
- The user doesn't need to update configurations much when new labels are added (assuming the partition will be shared with all queues)
Cons:
- The user has to change configurations a lot when new labels are added (assuming the partition will be shared with only a few queues)

b. 1)=*, 2)=0
Pros:
- The user doesn't need to update configurations much when new labels are added (assuming the partition will be shared with only a few queues)
Cons:
- The user has to change configurations a lot when new labels are added (assuming the partition will be shared with all queues)

c. 1)=<empty>, 2)=100
Same as b.

d. 1)=<empty>, 2)=0
Same as b.

You can see that each choice of default values for the two options has different pros and cons. Frankly, I don't have a strong preference among these choices, but since the defaults have been in place since 2.6, I would suggest not changing them.

However, I think there is one thing we need to fix: when queue.accessible-node-labels == *, {{QueueCapacitiesInfo#QueueCapacitiesInfo(QueueCapacities)}} should call {{RMNodeLabelsManager#getClusterNodeLabelNames}} to get all labels instead of calling {{getExistingNodeLabels}}, so that after we add/remove labels the queue's capacities in the web UI/REST response are updated as well. (A sketch of this change is included at the end of this message.)

> Scheduler Web UI shows max capacity for the queue as 100%, but a submitted application doesn't get assigned
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4415
>                 URL: https://issues.apache.org/jira/browse/YARN-4415
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler, resourcemanager
>    Affects Versions: 2.7.2
>            Reporter: Naganarasimha G R
>            Assignee: Naganarasimha G R
>         Attachments: App info with diagnostics info.png, capacity-scheduler.xml, screenshot-1.png
>
> Steps to reproduce the issue:
> Scenario 1:
> # Configure a queue (default) with accessible node labels set to *
> # Create an exclusive partition *xxx* and map an NM to it
> # Ensure no capacities are configured for default for label xxx
> # Start an RM app with queue default and label xxx
> # The application is stuck, but the scheduler UI shows 100% as max capacity for that queue
> Scenario 2:
> # Create a non-exclusive partition *sharedPartition* and map an NM to it
> # Ensure no capacities are configured for the default queue
> # Start an RM app with queue *default* and label *sharedPartition*
> # The application is stuck, but the scheduler UI shows 100% as max capacity for that queue for *sharedPartition*
> For both scenarios the cause is the same: the default max capacity and absolute max capacity are set to 0%.
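For readers who want to see the two options above concretely, this is roughly how they are expressed in capacity-scheduler.xml. It is an illustration only, not taken from the attached file; the queue name {{default}} and label name {{xxx}} are borrowed from Scenario 1 above, and the values correspond to combination a. (1)=*, 2)=100).

{code:xml}
<!-- Illustration only: queue "default", partition "xxx". -->

<!-- 1) Accessible-node-labels for the queue: "*" means the queue may access
     every partition, including labels added later. -->
<property>
  <name>yarn.scheduler.capacity.root.default.accessible-node-labels</name>
  <value>*</value>
</property>

<!-- 2) Maximum-capacity for the queue on partition "xxx" -->
<property>
  <name>yarn.scheduler.capacity.root.default.accessible-node-labels.xxx.maximum-capacity</name>
  <value>100</value>
</property>
{code}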
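And here is a minimal sketch of the fix suggested in the comment above, not the actual patch. It assumes the DAO is given (or can look up) the queue's accessible-node-labels and the {{RMNodeLabelsManager}}; today the constructor only receives a {{QueueCapacities}} object, so the real change would need to wire these into {{QueueCapacitiesInfo}}. The helper name {{partitionsToReport}} is made up for illustration, {{ANY}} is the "*" constant inherited from {{CommonNodeLabelsManager}}, and both {{getClusterNodeLabelNames}} and {{getExistingNodeLabels}} are assumed to return a Set of label names.

{code:java}
import java.util.Set;

import org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager;
import org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.QueueCapacities;

public class QueueCapacitiesInfoSketch {
  /**
   * Partitions whose capacities QueueCapacitiesInfo should report for a queue.
   * When the queue's accessible-node-labels is "*", report every label the
   * cluster currently knows about, so adding/removing labels is reflected in
   * the web UI / REST response; otherwise keep today's behaviour and report
   * only the partitions the queue already has capacities configured for.
   */
  static Set<String> partitionsToReport(Set<String> queueAccessibleNodeLabels,
      RMNodeLabelsManager labelsManager, QueueCapacities capacities) {
    if (queueAccessibleNodeLabels != null
        && queueAccessibleNodeLabels.contains(RMNodeLabelsManager.ANY)) {
      // accessible-node-labels == "*": use the full, current cluster label set
      return labelsManager.getClusterNodeLabelNames();
    }
    // restricted label access: fall back to the existing behaviour
    return capacities.getExistingNodeLabels();
  }
}
{code}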