[ https://issues.apache.org/jira/browse/YARN-11152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shilun Fan updated YARN-11152:
------------------------------
    Target Version/s: 3.4.0
   Affects Version/s: 3.4.0

> QueueMetrics is leaking memory when creating a new queue during reinitialisation
> --------------------------------------------------------------------------------
>
>                 Key: YARN-11152
>                 URL: https://issues.apache.org/jira/browse/YARN-11152
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>    Affects Versions: 3.4.0
>            Reporter: András Győri
>            Assignee: András Győri
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Capacity Scheduler handles reinitialisation by reparsing the entire queue
> hierarchy, then reinitialising the old queue hierarchy by taking the newly
> parsed queues into account. After this, the newly parsed queues are discarded
> and garbage-collected.
> However, since YARN-6492 we store a parent queue in QueueMetrics. This is
> problematic because, at the point where the parent is stored, it could still
> be a newly parsed parent queue (one that should be discarded after the
> reinitialisation). As a result, QueueMetrics can reference parent members of
> an entirely different queue hierarchy than the one currently in use. This can
> lead to subtle problems as well as a memory leak, because a single parent
> reference keeps the whole parsed queue hierarchy alive.
> The problem arose when we programmatically added queues one after another
> via the mutation API, which kept hundreds of queue hierarchies alive at the
> same time, crippling the GC and the whole RM.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
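The retention path described in YARN-11152 can be illustrated with a deliberately simplified Java model. Everything below is hypothetical: `SketchQueue`, `SketchMetrics`, `parse`, `reinitialize`, `strayRoots` and `simulate` are illustrative names, not Hadoop's `CSQueue` or `QueueMetrics` API. The model assumes that a queue which is new in the parsed hierarchy is adopted into the live hierarchy during reinitialisation, while its metrics keep the parent captured at parse time. The `fixParent` branch shows one plausible remedy (re-pointing the metrics' parent reference at the live hierarchy). It is not necessarily the change that was merged for this issue.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class QueueMetricsLeakSketch {

    /** Hypothetical stand-in for QueueMetrics: holds a parent-queue reference (cf. YARN-6492). */
    static final class SketchMetrics {
        SketchQueue parentQueue;
    }

    /** Hypothetical stand-in for a Capacity Scheduler queue. */
    static final class SketchQueue {
        final String name;
        SketchQueue parent;
        final List<SketchQueue> children = new ArrayList<>();
        final SketchMetrics metrics = new SketchMetrics();

        SketchQueue(String name, SketchQueue parent) {
            this.name = name;
            this.parent = parent;
            this.metrics.parentQueue = parent; // parent captured at parse time
            if (parent != null) {
                parent.children.add(this);
            }
        }

        SketchQueue child(String childName) {
            for (SketchQueue c : children) {
                if (c.name.equals(childName)) {
                    return c;
                }
            }
            return null;
        }
    }

    /** Hypothetical "parse": builds a fresh root with one leaf per name. */
    static SketchQueue parse(List<String> leaves) {
        SketchQueue root = new SketchQueue("root", null);
        for (String leaf : leaves) {
            new SketchQueue(leaf, root);
        }
        return root;
    }

    /**
     * Reinitialises the live root from a freshly parsed root. Queues that exist only
     * in the parsed tree are adopted into the live tree. Without fixParent, the adopted
     * queue's metrics still point at the parsed root, which should become garbage.
     */
    static void reinitialize(SketchQueue liveRoot, SketchQueue parsedRoot, boolean fixParent) {
        for (SketchQueue parsedChild : parsedRoot.children) {
            if (liveRoot.child(parsedChild.name) == null) {
                parsedChild.parent = liveRoot;
                liveRoot.children.add(parsedChild);
                if (fixParent) {
                    parsedChild.metrics.parentQueue = liveRoot; // re-point to the live hierarchy
                }
            }
        }
    }

    /** Counts parsed roots (other than liveRoot) still reachable through child metrics. */
    static int strayRoots(SketchQueue liveRoot) {
        Set<SketchQueue> stray = new HashSet<>(); // identity semantics: equals() not overridden
        for (SketchQueue c : liveRoot.children) {
            SketchQueue p = c.metrics.parentQueue;
            if (p != null && p != liveRoot) {
                stray.add(p);
            }
        }
        return stray.size();
    }

    /** Adds one queue per reinitialisation, mimicking repeated mutation-API calls. */
    static int simulate(int additions, boolean fixParent) {
        List<String> leaves = new ArrayList<>();
        SketchQueue liveRoot = parse(leaves);
        for (int i = 0; i < additions; i++) {
            leaves.add("q" + i);
            reinitialize(liveRoot, parse(leaves), fixParent);
        }
        return strayRoots(liveRoot);
    }

    public static void main(String[] args) {
        System.out.println("stray roots (buggy): " + simulate(100, false));
        System.out.println("stray roots (fixed): " + simulate(100, true));
    }
}
```

In this model, 100 single-queue additions leave 100 discarded parsed roots reachable from the live hierarchy on the buggy path, each holding its own copy of the parsed queues, and 0 on the fixed path. That is the "one parent reference keeps the whole hierarchy alive" effect, counted by reachability rather than by forcing a GC run.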