[ https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16393346#comment-16393346 ]
Billie Rinaldi commented on YARN-7973:
--------------------------------------

I started taking a look at patch 002. When I ran my first app, I had a configuration problem: I was trying to run a privileged container as a user that wasn't allowed to run privileged containers. The container failed with the appropriate message about the user failing the ACL check, but when it was relaunched the following was logged repeatedly. It seems like we could improve the failure handling in scenarios like this.

{noformat}
2018-03-08 22:02:53,791 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Getting container-status for container_1520546307703_0001_01_000002
2018-03-08 22:02:53,791 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Returning ContainerStatus: [ContainerId: container_1520546307703_0001_01_000002, ExecutionType: GUARANTEED, State: RUNNING, Capability: <memory:1024, vCores:1>, Diagnostics: [2018-03-08 22:02:53.397]Exception from container-launch.
Container id: container_1520546307703_0001_01_000002
Exit code: -1
Exception message: <unknown>
Shell output: <unknown>
[2018-03-08 22:02:53.500]Diagnostic message from attempt 0 : [2018-03-08 22:02:53.500]
[2018-03-08 22:02:53.501]Container exited with a non-zero exit code -1.
, ExitStatus: -1, IP: null, Host: null, ContainerSubState: SCHEDULED]
{noformat}

> Support ContainerRelaunch for Docker containers
> -----------------------------------------------
>
>                 Key: YARN-7973
>                 URL: https://issues.apache.org/jira/browse/YARN-7973
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Shane Kumpf
>            Assignee: Shane Kumpf
>            Priority: Major
>         Attachments: YARN-7973.001.patch, YARN-7973.002.patch
>
>
> Prior to YARN-5366, {{container-executor}} would remove the Docker container
> when it exited. The removal is now handled by the
> {{DockerLinuxContainerRuntime}}.
> {{ContainerRelaunch}} is intended to reuse
> the workdir from the previous attempt, and does not call {{cleanupContainer}}
> prior to {{launchContainer}}. The container ID is reused as well. As a
> result, the previous Docker container still exists, resulting in an error
> from Docker indicating that a container by that name already exists.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)