[ https://issues.apache.org/jira/browse/YARN-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550894#comment-14550894 ]
zhihai xu commented on YARN-3619:
---------------------------------

I uploaded a patch, YARN-3619.000.patch, for review. It adds a configuration, NM_CONTAINER_METRICS_UNREGISTER_DELAY_MS, that sets how long to wait after a container finishes before its container metrics are unregistered. I used a configurable delay instead of scheduling a thread to do the unregistration from getMetrics, because that approach could leak memory.

As far as I can tell, getMetrics is called from two places: MetricsSystemImpl#sampleMetrics and MetricsSourceAdapter#getMBeanInfo. sampleMetrics is not called if MetricsSystemImpl has no sinks, and getMBeanInfo may never be called after registration if JMXJsonServlet#doGet is not invoked (that is, if no JMX client sends an HTTP GET request). So getMetrics may never be called after registration. In that case an unregistration scheduled from getMetrics would never run, and the container's metrics would never be released.

> ContainerMetrics unregisters during getMetrics and leads to ConcurrentModificationException
> -------------------------------------------------------------------------------------------
>
>                 Key: YARN-3619
>                 URL: https://issues.apache.org/jira/browse/YARN-3619
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.7.0
>            Reporter: Jason Lowe
>            Assignee: zhihai xu
>         Attachments: YARN-3619.000.patch, test.patch
>
>
> ContainerMetrics is able to unregister itself during the getMetrics method,
> but that method can be called by MetricsSystemImpl.sampleMetrics, which is
> trying to iterate the sources. This leads to a
> ConcurrentModificationException log like this:
> {noformat}
> 2015-05-11 14:00:20,360 [Timer for 'NodeManager' metrics system] WARN impl.MetricsSystemImpl: java.util.ConcurrentModificationException
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
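A minimal Java sketch of the two behaviors discussed in this issue. It uses a hypothetical, simplified registry (DelayedUnregisterSketch, register, unregister, sampleAll are all illustrative names), not Hadoop's MetricsSystemImpl or ContainerMetrics. The first method reproduces the reported failure mode: a source unregisters itself from inside its own callback while the registry is iterating its sources, which makes the fail-fast iterator throw ConcurrentModificationException. The second method schedules the unregistration on a timer after a fixed delay once the container finishes, so it runs whether or not a sampling pass or JMX request ever calls getMetrics. The delayMs parameter only stands in for NM_CONTAINER_METRICS_UNREGISTER_DELAY_MS; how the patch itself schedules the unregistration is an assumption here, not something this comment shows.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified registry of metrics sources (not Hadoop's MetricsSystemImpl).
public class DelayedUnregisterSketch {
    private final Object lock = new Object();
    // Insertion-ordered map of source name -> its getMetrics callback.
    private final Map<String, Runnable> sources = new LinkedHashMap<>();

    void register(String name, Runnable getMetrics) {
        synchronized (lock) {
            sources.put(name, getMetrics);
        }
    }

    void unregister(String name) {
        synchronized (lock) {
            sources.remove(name);
        }
    }

    // One sampling pass: call every registered source's callback while iterating
    // the map, and return the names that were sampled.
    List<String> sampleAll() {
        synchronized (lock) {
            List<String> sampled = new ArrayList<>();
            for (Map.Entry<String, Runnable> e : sources.entrySet()) {
                e.getValue().run();
                sampled.add(e.getKey());
            }
            return sampled;
        }
    }

    // Problem pattern: a source unregisters itself inside its own callback,
    // i.e. while sampleAll() is still iterating the map.
    static boolean selfUnregisterThrows() {
        DelayedUnregisterSketch reg = new DelayedUnregisterSketch();
        reg.register("container_1", () -> reg.unregister("container_1"));
        reg.register("container_2", () -> { });
        try {
            reg.sampleAll();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Alternative pattern: when a container finishes, schedule its unregistration
    // on a timer after delayMs, whether or not any sampling pass ever runs.
    // Returns the result of a sampling pass taken after the unregistration.
    static List<String> delayedUnregister(long delayMs) {
        DelayedUnregisterSketch reg = new DelayedUnregisterSketch();
        reg.register("container_1", () -> { });
        reg.register("container_2", () -> { });
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try {
            // container_1 has just finished; its metrics stay registered for delayMs.
            ScheduledFuture<?> pending = timer.schedule(
                () -> reg.unregister("container_1"), delayMs, TimeUnit.MILLISECONDS);
            pending.get(); // block until the delayed unregistration has run
            return reg.sampleAll();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            timer.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("self-unregister throws CME: " + selfUnregisterThrows());
        System.out.println("sampled after delayed unregister: " + delayedUnregister(100));
    }
}
```

The lock in this sketch is reentrant, so it does not prevent the first failure: the removal happens on the same thread that is iterating the map. In the second method the timer thread has to take the same lock, so the unregistration waits for any in-progress sampling pass to finish instead of modifying the map mid-iteration.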