[ https://issues.apache.org/jira/browse/YARN-8035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shane Kumpf updated YARN-8035:
------------------------------
    Attachment: YARN-8035.002.patch

> Uncaught exception in ContainersMonitorImpl during relaunch due to the process ID changing
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-8035
>                 URL: https://issues.apache.org/jira/browse/YARN-8035
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Shane Kumpf
>            Assignee: Shane Kumpf
>            Priority: Major
>         Attachments: YARN-8035.001.patch, YARN-8035.002.patch
>
>
> In the case of a container relaunch event, the container ID is reused but a
> new process is spawned. For resource monitoring, {{ContainersMonitorImpl}}
> obtains the new PID after the relaunch and initializes the process tree
> monitoring. As part of this initialization, a tag called {{ContainerPid}},
> whose value is the PID for the container, is populated for the metrics
> associated with the container. If the prior container failed after its
> process started, the original PID will already be populated for the
> container, resulting in the {{MetricsException}} below.
> {code:java}
> 2018-03-16 11:59:02,563 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Uncaught exception in ContainersMonitorImpl while monitoring resource of container_1521201379995_0001_01_000002
> org.apache.hadoop.metrics2.MetricsException: Tag ContainerPid already exists!
>         at org.apache.hadoop.metrics2.lib.MetricsRegistry.checkTagName(MetricsRegistry.java:433)
>         at org.apache.hadoop.metrics2.lib.MetricsRegistry.tag(MetricsRegistry.java:394)
>         at org.apache.hadoop.metrics2.lib.MetricsRegistry.tag(MetricsRegistry.java:400)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerMetrics.recordProcessId(ContainerMetrics.java:277)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.initializeProcessTrees(ContainersMonitorImpl.java:559)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:448)
> {code}
> {{MetricsRegistry}} provides a {{tag}} method that allows for updating the
> value of an existing tag. Updating the value ensures that the PID associated
> with the container is that of the currently running process, which appears
> to be an appropriate fix. However, it's unclear how this tag might be used
> by other systems. I'm not finding any usage in Hadoop itself.
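The failure and the proposed fix can be illustrated with a minimal, self-contained sketch. It is not Hadoop code and not the attached patch: {{TagRegistrySketch}} is a hypothetical stand-in that only models a registry rejecting a duplicate tag name unless the caller explicitly asks to override it, which is assumed here to be the kind of update path the description refers to. The PID values are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Simplified, hypothetical stand-in for a metrics tag registry.
 * This is NOT Hadoop's MetricsRegistry; it only models the behaviour
 * "reject a duplicate tag name unless the caller asks to override".
 */
public class TagRegistrySketch {
  private final Map<String, String> tags = new LinkedHashMap<>();

  /** Adds a tag; an existing name is replaced only when override is true. */
  public synchronized TagRegistrySketch tag(String name, String value,
      boolean override) {
    if (!override && tags.containsKey(name)) {
      throw new IllegalStateException("Tag " + name + " already exists!");
    }
    tags.put(name, value);
    return this;
  }

  /** Returns the current value of a tag, or null if it was never set. */
  public synchronized String get(String name) {
    return tags.get(name);
  }

  public static void main(String[] args) {
    TagRegistrySketch registry = new TagRegistrySketch();

    // First launch: the tag does not exist yet, so this succeeds.
    registry.tag("ContainerPid", "4242", false);

    // Relaunch without override: same container ID, new PID, duplicate tag.
    try {
      registry.tag("ContainerPid", "4310", false);
    } catch (IllegalStateException e) {
      System.out.println("Without override: " + e.getMessage());
    }

    // Relaunch with override: the tag now carries the new PID.
    registry.tag("ContainerPid", "4310", true);
    System.out.println("ContainerPid = " + registry.get("ContainerPid"));
  }
}
```

With the override path, the tag always reflects the most recently launched process, at the cost of silently discarding the earlier PID. Whether that matters depends on how downstream consumers use the tag, which is the open question raised in the description.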