Junping Du created YARN-5190:
--------------------------------

             Summary: Race condition in registering container metrics cause 
uncaught exception in ContainerMonitorImpl
                 Key: YARN-5190
                 URL: https://issues.apache.org/jira/browse/YARN-5190
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Junping Du
            Assignee: Junping Du
            Priority: Critical


The exception stack trace is as follows:
{noformat}
2016-05-22 01:50:04,554 [Container Monitor] ERROR org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[Container Monitor,5,main] threw an Exception.
org.apache.hadoop.metrics2.MetricsException: Metrics source ContainerResource_container_1463840817638_14484_01_000010 already exists!
        at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:135)
        at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:112)
        at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.forContainer(ContainerMetrics.java:212)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.forContainer(ContainerMetrics.java:198)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:385)
{noformat}

After YARN-4906, there are multiple places that get the ContainerMetrics for a 
particular container, which can cause a race condition when different threads 
register the same container metrics with DefaultMetricsSystem. Because the 
resulting MetricsException is not handled properly, it can bring down the 
ContainerMonitorImpl daemon thread or even the whole NM.
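
To illustrate the race described above, here is a minimal, self-contained sketch. It is not the actual YARN code and not necessarily how this issue will be fixed. StrictRegistry, Metrics and ContainerMetricsRaceSketch are hypothetical stand-ins: StrictRegistry only mimics a metrics system that throws when a source name is registered twice, and it throws IllegalStateException rather than the real MetricsException. The sketch assumes one possible mitigation, which is to make lookup-or-register atomic per container ID via ConcurrentHashMap.computeIfAbsent, so that concurrent callers never register the same source twice.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ContainerMetricsRaceSketch {

    /** Hypothetical stand-in for a metrics system that rejects duplicate source names. */
    static final class StrictRegistry {
        private final Map<String, Object> sources = new ConcurrentHashMap<>();
        final AtomicInteger registrations = new AtomicInteger();

        void register(String name, Object source) {
            // Mimics the "already exists!" failure seen in the stack trace.
            if (sources.putIfAbsent(name, source) != null) {
                throw new IllegalStateException("Metrics source " + name + " already exists!");
            }
            registrations.incrementAndGet();
        }
    }

    /** Hypothetical per-container metrics holder. */
    static final class Metrics {
        final String containerId;

        Metrics(String containerId) {
            this.containerId = containerId;
        }
    }

    private final StrictRegistry registry;
    private final ConcurrentHashMap<String, Metrics> cache = new ConcurrentHashMap<>();

    ContainerMetricsRaceSketch(StrictRegistry registry) {
        this.registry = registry;
    }

    /**
     * Returns the metrics for a container, registering them on first use.
     * computeIfAbsent applies the mapping function at most once per key,
     * so register() is called only once even under concurrent access.
     */
    Metrics forContainer(String containerId) {
        return cache.computeIfAbsent(containerId, id -> {
            Metrics m = new Metrics(id);
            registry.register("ContainerResource_" + id, m);
            return m;
        });
    }

    public static void main(String[] args) throws Exception {
        StrictRegistry registry = new StrictRegistry();
        ContainerMetricsRaceSketch sketch = new ContainerMetricsRaceSketch(registry);
        int threads = 8;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        AtomicInteger failures = new AtomicInteger();

        // All threads ask for the same (made-up) container ID at once.
        for (int i = 0; i < threads; i++) {
            pool.submit(() -> {
                try {
                    start.await();
                    sketch.forContainer("container_example_01");
                } catch (Exception e) {
                    failures.incrementAndGet();
                }
            });
        }
        start.countDown();
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);

        System.out.println("registrations=" + registry.registrations.get()
            + " failures=" + failures.get());
    }
}
```

A plain check-then-register sequence (look up, and register if missing) would leave a window in which two threads both see "missing" and both call register(), which is the failure mode reported here. Catching the exception at the call site is another option, but it only hides the duplicate registration rather than preventing it.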



