Tamas Domok created YARN-11490:
----------------------------------

             Summary: JMX QueueMetrics breaks after mutable config validation in CS
                 Key: YARN-11490
                 URL: https://issues.apache.org/jira/browse/YARN-11490
             Project: Hadoop YARN
          Issue Type: Bug
          Components: capacityscheduler
    Affects Versions: 3.4.0
            Reporter: Tamas Domok
            Assignee: Tamas Domok

Reproduction steps:

1. Submit a long-running job
{code}
hadoop-3.4.0-SNAPSHOT/bin/yarn jar hadoop-3.4.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.4.0-SNAPSHOT-tests.jar sleep -m 1 -r 1 -rt 1200000 -mt 20
{code}

2. Verify that there is one running app
{code}
$ curl http://localhost:8088/ws/v1/cluster/metrics | jq
{code}

3. Verify that the JMX endpoint reports 1 running app as well
{code}
$ curl http://localhost:8088/jmx | jq
{code}

4. Validate the configuration
{code}
$ curl -X POST -H 'Content-Type: application/json' -d @defaultqueue.json localhost:8088/ws/v1/cluster/scheduler-conf/validate
$ cat defaultqueue.json
{"update-queue":{"queue-name":"root.default","params":{"entry":{"key":"maximum-applications","value":"100"}}},"subClusterId":"","global":null,"global-updates":null}
{code}

5. Repeat steps 2 and 3. The cluster metrics should still work, but the JMX endpoint will show 0 running apps. That is the bug.

It is caused by YARN-11211. Reverting that patch (or removing only the _QueueMetrics.clearQueueMetrics();_ line) fixes the issue, but I think that would re-introduce the memory leak.

It looks like the QUEUE_METRICS hash map is "add-only": prior to YARN-11211, clearQueueMetrics() was only called from the ResourceManager.reinitialize() method (transitionToActive/transitionToStandby). Constantly adding and removing queues with unique names would cause a leak as well, because nothing is ever removed from QUEUE_METRICS, so the validation API is not the only path with this problem.
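
To illustrate the trade-off described above, here is a minimal sketch of a static, name-keyed metrics registry. It is a simplified model written for this report and is not the Hadoop QueueMetrics code: the class name QueueMetricsSketch, the nested Metrics type, and the REGISTRY, forQueue and clear names are all hypothetical. It assumes that reported values are read from whatever object the registry currently holds. The sketch only shows the two failure modes side by side: clearing the registry makes a later lookup return a fresh object with zeroed counters while the running app is still counted on the old one, and never clearing it lets uniquely named queues accumulate.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of a static metrics registry; not Hadoop code. */
class QueueMetricsSketch {

    /** Per-queue counters; stands in for a single metrics object. */
    static final class Metrics {
        private int appsRunning;

        void runApp() {
            appsRunning++;
        }

        int getAppsRunning() {
            return appsRunning;
        }
    }

    /** Static registry keyed by queue name (assumption: entries are only ever added). */
    static final Map<String, Metrics> REGISTRY = new HashMap<>();

    /** Returns the cached metrics for a queue, creating them on first use. */
    static Metrics forQueue(String queueName) {
        return REGISTRY.computeIfAbsent(queueName, name -> new Metrics());
    }

    /** Drops every entry, so later lookups build fresh objects with zeroed counters. */
    static void clear() {
        REGISTRY.clear();
    }

    public static void main(String[] args) {
        // The live scheduler registers the queue and starts one app.
        Metrics live = forQueue("root.default");
        live.runApp();
        System.out.println("registry view before clear: "
            + forQueue("root.default").getAppsRunning());

        // In this model, a validation pass clears the shared registry
        // and then registers the same queue name again.
        clear();
        Metrics fresh = forQueue("root.default");
        System.out.println("registry view after clear:  " + fresh.getAppsRunning());
        System.out.println("live scheduler object:      " + live.getAppsRunning());

        // Without any removal, queues with unique names pile up in the map.
        for (int i = 0; i < 1000; i++) {
            forQueue("root.tmp-" + i);
        }
        System.out.println("registry size: " + REGISTRY.size());
    }
}
```

Under these assumptions, a fix would need per-queue removal when a queue is deleted, rather than a global clear that also affects the running scheduler's entries. Whether that matches the actual QueueMetrics internals needs to be checked against the code.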