[ https://issues.apache.org/jira/browse/YARN-11608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated YARN-11608:
------------------------------
    Target Version/s: 3.4.0

> QueueCapacityVectorInfo NPE when accesible labels config is used
> ----------------------------------------------------------------
>
>                 Key: YARN-11608
>                 URL: https://issues.apache.org/jira/browse/YARN-11608
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>    Affects Versions: 3.4.0
>            Reporter: Benjamin Teke
>            Assignee: Benjamin Teke
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>
> YARN-11514 extended the REST API to contain CapacityVectors for each
> configured node label. There is an edge case, however: during
> initialization, each queue's capacities map is filled with 0
> capacities for the unconfigured but accessible labels (labels that have no
> configured capacity, but that the queue has access to based on the
> accessible-node-labels property). A very basic example configuration
> for this is the following:
> {code:java}
> "yarn.scheduler.capacity.root.queues": "a, b"
> "yarn.scheduler.capacity.root.a.capacity": "50"
> "yarn.scheduler.capacity.root.a.accessible-node-labels": "root-a-default-label"
> "yarn.scheduler.capacity.root.a.maximum-capacity": "50"
> "yarn.scheduler.capacity.root.b.capacity": "50"
> {code}
> root.a has access to root-a-default-label, but there is no configured
> capacity for it. The capacityVectors are parsed based on the
> configuredCapacity map (created from the
> "accessible-node-labels.<label>.capacity" configs). When the scheduler info
> is requested, the capacityVectors are collected per label, and the labels used
> for this are the keySet of the capacity map:
> {code:java}
> for (String partitionName : capacities.getExistingNodeLabels()) {
>   QueueCapacityVector queueCapacityVector =
>       queue.getConfiguredCapacityVector(partitionName);
>   queueCapacityVectorInfo = queueCapacityVector == null ?
>       new QueueCapacityVectorInfo(new QueueCapacityVector()) :
>       new QueueCapacityVectorInfo(queue.getConfiguredCapacityVector(partitionName));
> {code}
> {code:java}
> public Set<String> getExistingNodeLabels() {
>   readLock.lock();
>   try {
>     return new HashSet<String>(capacitiesMap.keySet());
>   } finally {
>     readLock.unlock();
>   }
> }
> {code}
> If the capacitiesMap contains entries that are not "configured", this
> results in an NPE, breaking the UI and the REST API:
> {code:java}
> INTERNAL_SERVER_ERROR
> java.lang.NullPointerException
>   at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.QueueCapacityVectorInfo.<init>(QueueCapacityVectorInfo.java:39)
>   at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.QueueCapacitiesInfo.<init>(QueueCapacitiesInfo.java:61)
>   at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.CapacitySchedulerLeafQueueInfo.populateQueueCapacities(CapacitySchedulerLeafQueueInfo.java:108)
>   at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.CapacitySchedulerQueueInfo.<init>(CapacitySchedulerQueueInfo.java:137)
>   at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.CapacitySchedulerLeafQueueInfo.<init>(CapacitySchedulerLeafQueueInfo.java:66)
>   at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.CapacitySchedulerInfo.getQueues(CapacitySchedulerInfo.java:197)
>   at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.CapacitySchedulerInfo.<init>(CapacitySchedulerInfo.java:94)
>   at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.getSchedulerInfo(RMWebServices.java:399)
> {code}
> There is no need to create capacityVectors for the unconfigured labels, so a
> null check should solve this issue on the API side.
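
The mismatch described above, and the null check proposed as a fix, can be sketched in isolation. This is only an illustration under stated assumptions, not the Hadoop code or the actual patch: the class NullSafeVectorCollection and the method collectVectors are hypothetical, plain strings stand in for QueueCapacityVector, and two ordinary maps stand in for the capacities map and the configured capacity vectors.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified stand-in for the scheduler DAO logic (not Hadoop code).
public class NullSafeVectorCollection {

  /**
   * Collects a vector per label, skipping labels that exist in the
   * capacities map but have no configured capacity vector.
   */
  public static Map<String, String> collectVectors(
      Set<String> existingLabels, Map<String, String> configuredVectors) {
    Map<String, String> result = new LinkedHashMap<>();
    for (String label : existingLabels) {
      String vector = configuredVectors.get(label);
      if (vector == null) {
        // Accessible but unconfigured label: there is nothing to report,
        // so skip it instead of dereferencing null.
        continue;
      }
      result.put(label, vector);
    }
    return result;
  }

  public static void main(String[] args) {
    // Stand-in for the capacities map: initialization added a 0 entry
    // for the accessible but unconfigured label.
    Map<String, Float> capacitiesMap = new HashMap<>();
    capacitiesMap.put("", 50f); // default partition
    capacitiesMap.put("root-a-default-label", 0f);

    // Stand-in for the configured vectors: only the default partition
    // has one. "50%" is a placeholder value, not a real vector format.
    Map<String, String> configuredVectors = new HashMap<>();
    configuredVectors.put("", "50%");

    Map<String, String> vectors = collectVectors(
        new TreeSet<>(capacitiesMap.keySet()), configuredVectors);
    for (Map.Entry<String, String> e : vectors.entrySet()) {
      System.out.println("label='" + e.getKey() + "' vector=" + e.getValue());
    }
  }
}
```

In this sketch the label set comes from the keySet of the capacities map, as in getExistingNodeLabels(), so it includes root-a-default-label even though no vector was configured for it; the null check is what keeps that label out of the result.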