[ https://issues.apache.org/jira/browse/YARN-10517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17308158#comment-17308158 ]
Eric Payne commented on YARN-10517:
-----------------------------------

I'll try to look this afternoon.

> QueueMetrics has incorrect Allocated Resource when labelled partitions updated
> ------------------------------------------------------------------------------
>
>                 Key: YARN-10517
>                 URL: https://issues.apache.org/jira/browse/YARN-10517
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.8.0, 3.3.0
>            Reporter: sibyl.lv
>            Assignee: Qi Zhu
>            Priority: Major
>         Attachments: YARN-10517-branch-3.2.001.patch, YARN-10517.001.patch, wrong metrics.png
>
>
> After https://issues.apache.org/jira/browse/YARN-9596, QueueMetrics still reports
> incorrect allocated-resource JMX metrics, such as allocatedMB, allocatedVCores and
> allocatedContainers. This happens when a node partition is changed from "DEFAULT"
> to another label while applications are running.
> Steps to reproduce
> ==============
> # Configure capacity-scheduler.xml with a node label configuration.
> # Submit one application to the default partition and let it run.
> # While the application is running, add the label "tpcds" to the cluster and
> replace the labels on node1 and node2 with "tpcds".
> # Note down "VCores Used" in the Web UI.
> # When the application finishes, the metrics are wrong (screenshots attached).
> ==============
>
> FiCaSchedulerApp doesn't update queue metrics when CapacityScheduler handles
> the NODE_LABELS_UPDATE event.
> So we should release the container resource from the old partition and add the
> used resource to the new partition, in the same way that queueUsage is updated:
> {code:java}
> public void nodePartitionUpdated(RMContainer rmContainer, String oldPartition,
>     String newPartition) {
>   Resource containerResource = rmContainer.getAllocatedResource();
>   // Move the container's resource from oldPartition to newPartition in the
>   // application attempt's usage...
>   this.attemptResourceUsage.decUsed(oldPartition, containerResource);
>   this.attemptResourceUsage.incUsed(newPartition, containerResource);
>   // ...and in the leaf queue's usage (queueUsage).
>   getCSLeafQueue().decUsedResource(oldPartition, containerResource, this);
>   getCSLeafQueue().incUsedResource(newPartition, containerResource, this);
>   // Update new partition name if container is AM and also update AM resource
>   if (rmContainer.isAMContainer()) {
>     setAppAMNodePartitionName(newPartition);
>     this.attemptResourceUsage.decAMUsed(oldPartition, containerResource);
>     this.attemptResourceUsage.incAMUsed(newPartition, containerResource);
>     getCSLeafQueue().decAMUsedResource(oldPartition, containerResource, this);
>     getCSLeafQueue().incAMUsedResource(newPartition, containerResource, this);
>   }
> }
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
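The following toy Java model illustrates the proposed accounting: when a running container's node is relabelled, its resource is released from the old partition and added to the new one. The class `PartitionMetricsSketch` and all of its methods are hypothetical and are not Hadoop's QueueMetrics API. The model tracks every partition and books the final release against the container's new partition. That is an assumption made for illustration and does not necessarily match how QueueMetrics handles non-default partitions in the affected versions. "" stands in for the DEFAULT partition.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of per-partition "allocated" metrics for one queue.
 * Hypothetical: this is NOT Hadoop's QueueMetrics API.
 */
public class PartitionMetricsSketch {

    /** Running totals for a single partition. */
    private static final class Allocated {
        long mb;
        int vcores;
        int containers;
    }

    private final Map<String, Allocated> byPartition = new HashMap<>();

    // Returns the totals for a partition, creating zeroed totals on first use.
    private Allocated totals(String partition) {
        return byPartition.computeIfAbsent(partition, p -> new Allocated());
    }

    /** Book one container's resource against a partition. */
    public void allocate(String partition, long mb, int vcores) {
        Allocated a = totals(partition);
        a.mb += mb;
        a.vcores += vcores;
        a.containers += 1;
    }

    /** Remove one container's resource from a partition. */
    public void release(String partition, long mb, int vcores) {
        Allocated a = totals(partition);
        a.mb -= mb;
        a.vcores -= vcores;
        a.containers -= 1;
    }

    /** Proposed handling of a relabel: move a running container's resource. */
    public void nodePartitionUpdated(String oldPartition, String newPartition,
                                     long mb, int vcores) {
        release(oldPartition, mb, vcores);
        allocate(newPartition, mb, vcores);
    }

    public long allocatedMB(String partition) { return totals(partition).mb; }
    public int allocatedVCores(String partition) { return totals(partition).vcores; }
    public int allocatedContainers(String partition) { return totals(partition).containers; }

    public static void main(String[] args) {
        String defaultPartition = ""; // assumption: "" stands for DEFAULT
        String tpcds = "tpcds";

        // Without the fix: allocated on DEFAULT, node relabelled, released on tpcds.
        PartitionMetricsSketch withoutFix = new PartitionMetricsSketch();
        withoutFix.allocate(defaultPartition, 1024, 1);
        withoutFix.release(tpcds, 1024, 1);
        System.out.println("without fix: DEFAULT allocatedMB="
            + withoutFix.allocatedMB(defaultPartition)
            + ", tpcds allocatedMB=" + withoutFix.allocatedMB(tpcds));
        // prints "without fix: DEFAULT allocatedMB=1024, tpcds allocatedMB=-1024"

        // With the fix: the relabel moves the resource before the release.
        PartitionMetricsSketch withFix = new PartitionMetricsSketch();
        withFix.allocate(defaultPartition, 1024, 1);
        withFix.nodePartitionUpdated(defaultPartition, tpcds, 1024, 1);
        withFix.release(tpcds, 1024, 1);
        System.out.println("with fix: DEFAULT allocatedMB="
            + withFix.allocatedMB(defaultPartition)
            + ", tpcds allocatedMB=" + withFix.allocatedMB(tpcds));
        // prints "with fix: DEFAULT allocatedMB=0, tpcds allocatedMB=0"
    }
}
```

In this model, skipping the move leaves DEFAULT permanently inflated after the application finishes, which mirrors the symptom in the steps above. Moving the resource at relabel time makes the allocation and the release meet in the same partition.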