[ https://issues.apache.org/jira/browse/YARN-9486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eric Yang updated YARN-9486:
----------------------------
    Attachment: YARN-9486.005.patch

> Docker container exited with failure does not get clean up correctly
> --------------------------------------------------------------------
>
> Key: YARN-9486
> URL: https://issues.apache.org/jira/browse/YARN-9486
> Project: Hadoop YARN
> Issue Type: Sub-task
> Affects Versions: 3.2.0
> Reporter: Eric Yang
> Assignee: Eric Yang
> Priority: Major
> Attachments: YARN-9486.001.patch, YARN-9486.002.patch,
> YARN-9486.003.patch, YARN-9486.004.patch, YARN-9486.005.patch
>
>
> When docker container encounters error and exit prematurely
> (EXITED_WITH_FAILURE), ContainerCleanup does not remove container, instead we
> get messages that look like this:
> {code}
> java.io.IOException: Could not find
> nmPrivate/application_1555111445937_0008/container_1555111445937_0008_01_000007//container_1555111445937_0008_01_000007.pid
> in any of the directories
> 2019-04-15 20:42:16,454 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
> Container container_1555111445937_0008_01_000007 transitioned from
> RELAUNCHING to EXITED_WITH_FAILURE
> 2019-04-15 20:42:16,455 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerCleanup:
> Cleaning up container container_1555111445937_0008_01_000007
> 2019-04-15 20:42:16,455 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerCleanup:
> Container container_1555111445937_0008_01_000007 not launched. No cleanup
> needed to be done
> 2019-04-15 20:42:16,455 WARN
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=hbase
> OPERATION=Container Finished - Failed TARGET=ContainerImpl
> RESULT=FAILURE DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE
> APPID=application_1555111445937_0008
> CONTAINERID=container_1555111445937_0008_01_000007
> 2019-04-15 20:42:16,458 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
> Container container_1555111445937_0008_01_000007 transitioned from
> EXITED_WITH_FAILURE to DONE
> 2019-04-15 20:42:16,458 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl:
> Removing container_1555111445937_0008_01_000007 from application
> application_1555111445937_0008
> 2019-04-15 20:42:16,458 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
> Stopping resource-monitoring for container_1555111445937_0008_01_000007
> 2019-04-15 20:42:16,458 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
> Considering container container_1555111445937_0008_01_000007 for
> log-aggregation
> 2019-04-15 20:42:16,804 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
> Getting container-status for container_1555111445937_0008_01_000007
> 2019-04-15 20:42:16,804 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
> Getting localization status for container_1555111445937_0008_01_000007
> 2019-04-15 20:42:16,804 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
> Returning ContainerStatus: [ContainerId:
> container_1555111445937_0008_01_000007, ExecutionType: GUARANTEED, State:
> COMPLETE, Capability: <memory:1024, vCores:1>, Diagnostics: ..., ExitStatus:
> -1, IP: null, Host: null, ExposedPorts: , ContainerSubState: DONE]
> 2019-04-15 20:42:18,464 INFO
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed
> completed containers from NM context: [container_1555111445937_0008_01_000007]
> 2019-04-15 20:43:50,476 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
> Stopping container with container Id: container_1555111445937_0008_01_000007
> {code}
> There is no docker rm command performed.
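
For anyone checking whether a NodeManager log shows the same symptom, the sequence above (a transition to EXITED_WITH_FAILURE followed by ContainerCleanup reporting "not launched. No cleanup needed to be done") can be searched for with a small script. The following is only a sketch, not part of the patch: the function name find_skipped_cleanups is made up for illustration, the message wording is copied from the excerpt above and may differ in other Hadoop versions, and it assumes each log entry sits on a single unwrapped line.

```python
import re

# Message patterns copied from the log excerpt in this report.
# Assumption: other Hadoop versions may word these messages differently.
TRANSITION = re.compile(
    r"Container (container_\w+) transitioned from (\w+) to EXITED_WITH_FAILURE"
)
SKIPPED = re.compile(
    r"Container (container_\w+) not launched\. No cleanup needed to be done"
)


def find_skipped_cleanups(lines):
    """Return IDs of containers that entered EXITED_WITH_FAILURE and for
    which ContainerCleanup later logged that no cleanup was needed.

    Hypothetical helper for illustration; assumes one log entry per line.
    """
    failed = set()   # containers seen transitioning to EXITED_WITH_FAILURE
    skipped = []     # failed containers whose cleanup was skipped, in order
    for line in lines:
        match = TRANSITION.search(line)
        if match:
            failed.add(match.group(1))
            continue
        match = SKIPPED.search(line)
        if match and match.group(1) in failed and match.group(1) not in skipped:
            skipped.append(match.group(1))
    return skipped


if __name__ == "__main__":
    # Two entries from the report, joined onto single lines.
    sample = [
        "2019-04-15 20:42:16,454 INFO org.apache.hadoop.yarn.server.nodemanager."
        "containermanager.container.ContainerImpl: Container "
        "container_1555111445937_0008_01_000007 transitioned from RELAUNCHING "
        "to EXITED_WITH_FAILURE",
        "2019-04-15 20:42:16,455 INFO org.apache.hadoop.yarn.server.nodemanager."
        "containermanager.launcher.ContainerCleanup: Container "
        "container_1555111445937_0008_01_000007 not launched. No cleanup "
        "needed to be done",
    ]
    print(find_skipped_cleanups(sample))
```

Any container ID the script reports is a candidate for a leftover Docker container on that node; whether one actually remains still has to be confirmed on the host itself.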