[ https://issues.apache.org/jira/browse/YARN-9088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17679248#comment-17679248 ]
C.J. Collier commented on YARN-9088:
------------------------------------

I'll review the changes and see if I can pick up where karthikpal left off.

Here is a list of the files changed in that other patch, ordered by number of changes to the file.

hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/\
  scheduler/QueueMetrics.java
  scheduler/AppSchedulingInfo.java
  scheduler/TestQueueMetrics.java
  scheduler/capacity/CSQueueMetrics.java
  scheduler/common/fica/FiCaSchedulerApp.java
  scheduler/fair/FSAppAttempt.java
  scheduler/capacity/LeafQueue.java
  scheduler/SchedulerApplicationAttempt.java
  scheduler/capacity/CSQueueUtils.java
  scheduler/capacity/TestNodeLabelContainerAllocation.java
  scheduler/TestSchedulerApplicationAttempt.java
  scheduler/capacity/TestCapacityScheduler.java
  monitor/invariants/TestMetricsInvariantChecker.java
  scheduler/fair/FairScheduler.java

> Non-exclusive labels break QueueMetrics
> ---------------------------------------
>
>                 Key: YARN-9088
>                 URL: https://issues.apache.org/jira/browse/YARN-9088
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, resourcemanager
>   Affects Versions: 2.8.5
>            Reporter: Brandon Scheller
>            Priority: Major
>              Labels: metrics, nodelabel
>
> QueueMetrics are broken (random/negative values) when non-exclusive labels
> are being used and unlabeled containers run on labeled nodes.
> This is caused by the change in the patch here:
> https://issues.apache.org/jira/browse/YARN-6467
> It assumes that a container's label will be the same as the node's label that
> it is running on.
> If you look within the patch, sometimes metrics are updated using the
> request.getNodeLabelExpression(). And sometimes they are updated using
> node.getPartition().
> This means that in the case where the node is labeled while the container
> request isn't, these metrics only get updated when referring to the default
> queue.
> This stops metrics from balancing out and results in incorrect and
> negative values in QueueMetrics.
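
To make the failure mode in the report concrete, here is a toy model of per-partition bookkeeping. It is only a sketch and not the real QueueMetrics code: the class PartitionMetricsSketch, its methods, and the choice of which path uses which key are all hypothetical. The only thing taken from the report is that some updates are keyed by request.getNodeLabelExpression() and others by node.getPartition(). With an unlabeled request ("") running on a node in partition "x", the increment and the decrement land on different counters, so neither returns to zero and one goes negative.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only; NOT the real QueueMetrics class. All names are hypothetical. */
public class PartitionMetricsSketch {
    // Allocated-container count per partition; "" stands for the default partition.
    private final Map<String, Long> allocated = new HashMap<>();

    private void add(String partition, long delta) {
        allocated.merge(partition, delta, Long::sum);
    }

    public long get(String partition) {
        return allocated.getOrDefault(partition, 0L);
    }

    // Assumed mismatch: allocation is keyed by the request's label expression
    // (nodePartition is deliberately ignored here)...
    public void allocateMismatched(String requestLabel, String nodePartition) {
        add(requestLabel, 1);
    }

    // ...while release is keyed by the partition of the node that ran the container
    // (requestLabel is deliberately ignored here).
    public void releaseMismatched(String requestLabel, String nodePartition) {
        add(nodePartition, -1);
    }

    // Consistent variant: both paths use the same key. Using the node partition
    // is an arbitrary choice for this sketch, not a statement about the actual fix.
    public void allocateConsistent(String requestLabel, String nodePartition) {
        add(nodePartition, 1);
    }

    public void releaseConsistent(String requestLabel, String nodePartition) {
        add(nodePartition, -1);
    }

    public static void main(String[] args) {
        PartitionMetricsSketch broken = new PartitionMetricsSketch();
        broken.allocateMismatched("", "x");   // unlabeled request on a node labeled "x"
        broken.releaseMismatched("", "x");
        System.out.println("mismatched: default=" + broken.get("") + " x=" + broken.get("x"));

        PartitionMetricsSketch ok = new PartitionMetricsSketch();
        ok.allocateConsistent("", "x");
        ok.releaseConsistent("", "x");
        System.out.println("consistent: default=" + ok.get("") + " x=" + ok.get("x"));
    }
}
```

In the mismatched run the default partition is left at 1 and "x" at -1 after a single allocate/release pair; in the consistent run both are 0. Whichever key a real patch settles on, the point of the sketch is only that both sides of each update need to agree.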