Tamas Domok created YARN-11490:
----------------------------------

             Summary: JMX QueueMetrics breaks after mutable config validation in CS
                 Key: YARN-11490
                 URL: https://issues.apache.org/jira/browse/YARN-11490
             Project: Hadoop YARN
          Issue Type: Bug
          Components: capacityscheduler
    Affects Versions: 3.4.0
            Reporter: Tamas Domok
            Assignee: Tamas Domok


Reproduction steps:

1. Submit a long-running job
{code}
hadoop-3.4.0-SNAPSHOT/bin/yarn jar \
  hadoop-3.4.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.4.0-SNAPSHOT-tests.jar \
  sleep -m 1 -r 1 -rt 1200000 -mt 20
{code}

2. Verify that there is one running app
{code}
$ curl http://localhost:8088/ws/v1/cluster/metrics | jq
{code}

3. Verify that the JMX endpoint reports 1 running app as well
{code}
$ curl http://localhost:8088/jmx | jq
{code}

4. Validate the configuration
{code}
$ curl -X POST -H 'Content-Type: application/json' -d @defaultqueue.json \
    localhost:8088/ws/v1/cluster/scheduler-conf/validate

$ cat defaultqueue.json
{"update-queue":{"queue-name":"root.default","params":{"entry":{"key":"maximum-applications","value":"100"}}},"subClusterId":"","global":null,"global-updates":null}
{code}

5. Repeat steps 2 and 3. The cluster metrics endpoint should still report 1 
running app, but the JMX endpoint now reports 0 running apps. That is the bug.


The regression is caused by YARN-11211. Reverting that patch (or removing only 
the _QueueMetrics.clearQueueMetrics();_ line) fixes the issue, but I think that 
would reintroduce the memory leak.


It looks like the QUEUE_METRICS hash map is "add-only": prior to YARN-11211, 
clearQueueMetrics() was called only from the ResourceManager.reinitialize() 
method (transitionToActive/transitionToStandby). Repeatedly adding and removing 
queues with unique names would cause a leak as well, because nothing ever 
removes entries from QUEUE_METRICS. So the problem is not limited to the 
validation API.
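
To make the two failure modes above concrete, here is a minimal Java sketch of a static, name-keyed metrics cache. It is a simplified model, not the Hadoop code: the class QueueMetricsSketch, the Metrics type, the REPORTED map (standing in for whatever the JMX endpoint reads) and the remove() method are all hypothetical names introduced for illustration. The assumption that a cache miss creates and publishes a fresh, zeroed metrics object is one plausible explanation of the symptom, not a confirmed description of the actual code path.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified model of a static, name-keyed metrics cache. Not Hadoop code. */
public class QueueMetricsSketch {

    /** Hypothetical stand-in for a per-queue metrics object. */
    static final class Metrics {
        int appsRunning;
    }

    // Stand-in for the QUEUE_METRICS cache described in the report.
    static final Map<String, Metrics> CACHE = new HashMap<>();

    // Assumption: stand-in for what a reporter such as the JMX endpoint reads.
    static final Map<String, Metrics> REPORTED = new HashMap<>();

    /** Returns the cached metrics for a queue; creates and publishes them on a miss. */
    static Metrics forQueue(String name) {
        Metrics m = CACHE.get(name);
        if (m == null) {
            m = new Metrics();
            CACHE.put(name, m);
            REPORTED.put(name, m); // a new object replaces the published one
        }
        return m;
    }

    /** Models clearQueueMetrics(): empties the cache but not the published entries. */
    static void clear() {
        CACHE.clear();
    }

    /** Hypothetical per-queue removal, which the report says does not exist. */
    static void remove(String name) {
        CACHE.remove(name);
        REPORTED.remove(name);
    }

    public static void main(String[] args) {
        // The running scheduler holds this object and counts one app on it.
        Metrics live = forQueue("root.default");
        live.appsRunning = 1;

        clear();                  // validation clears the cache
        forQueue("root.default"); // validation re-creates metrics for the same queue

        System.out.println("scheduler view: " + live.appsRunning);
        System.out.println("reported view:  "
            + REPORTED.get("root.default").appsRunning);

        // Without remove(), queues with unique names only ever grow the cache.
        for (int i = 0; i < 1000; i++) {
            forQueue("root.tmp" + i);
        }
        System.out.println("cache size: " + CACHE.size());
    }
}
```

In this model the scheduler's own object still counts one running app while the published object counts zero, which matches the symptom in step 5. A per-queue remove() called when a queue is deleted would bound the cache without detaching the live objects, but whether that is the right fix for the CapacityScheduler is left open here.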



