[ https://issues.apache.org/jira/browse/YARN-9088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17062390#comment-17062390 ]
Anuj commented on YARN-9088:
----------------------------

We are facing a similar issue in our setup: the global view of pending and available resources gets messed up.

> Non-exclusive labels break QueueMetrics
> ---------------------------------------
>
>                 Key: YARN-9088
>                 URL: https://issues.apache.org/jira/browse/YARN-9088
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, resourcemanager
>    Affects Versions: 2.8.5
>            Reporter: Brandon Scheller
>            Priority: Major
>              Labels: metrics, nodelabel
>
> QueueMetrics are broken (random/negative values) when non-exclusive labels
> are being used and unlabeled containers run on labeled nodes.
> This is caused by the change in the patch here:
> https://issues.apache.org/jira/browse/YARN-6467
> It assumes that a container's label will be the same as the label of the
> node that it is running on.
> If you look within the patch, metrics are sometimes updated using
> request.getNodeLabelExpression() and sometimes using node.getPartition().
> This means that in the case where the node is labeled while the container
> request isn't, these metrics only get updated when referring to the default
> queue. This stops metrics from balancing out and results in incorrect and
> negative values in QueueMetrics.
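
To make the imbalance in the quoted description concrete, here is a minimal toy model in Java. It is not Hadoop code: the class `PartitionMetricsSketch`, its methods, and the partition name `labelX` are all hypothetical. Which side of the real patch keys on `request.getNodeLabelExpression()` and which on `node.getPartition()` is an assumption made only for illustration. The sketch shows that when an increment and its matching decrement use different partition keys, neither counter returns to zero.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: not Hadoop code. All names here are hypothetical.
class PartitionMetricsSketch {
    // Pending-container count per partition; "" stands for the default (unlabeled) partition.
    private final Map<String, Integer> pending = new HashMap<>();

    void incrPending(String partition, int containers) {
        pending.merge(partition, containers, Integer::sum);
    }

    void decrPending(String partition, int containers) {
        pending.merge(partition, -containers, Integer::sum);
    }

    int getPending(String partition) {
        return pending.getOrDefault(partition, 0);
    }

    // Simulates one unlabeled request that is satisfied on a node in partition "labelX".
    static PartitionMetricsSketch simulate() {
        PartitionMetricsSketch metrics = new PartitionMetricsSketch();
        String requestLabel = "";        // stands in for request.getNodeLabelExpression()
        String nodePartition = "labelX"; // stands in for node.getPartition()

        // Assumption: the request is counted under the request's own label...
        metrics.incrPending(requestLabel, 1);
        // ...but the allocation is subtracted under the node's partition.
        metrics.decrPending(nodePartition, 1);
        return metrics;
    }

    public static void main(String[] args) {
        PartitionMetricsSketch m = simulate();
        System.out.println("pending[default] = " + m.getPending(""));       // stays at 1
        System.out.println("pending[labelX]  = " + m.getPending("labelX")); // drops to -1
    }
}
```

In this sketch, using the same key on both sides (either the request label or the node partition, consistently) leaves both counters at zero, which is the balancing behaviour the description says is lost.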