[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380322#comment-16380322
 ] 

Shane Kumpf commented on YARN-7973:
-----------------------------------

Attaching a patch that adds a new {{relaunchContainer}} method to the 
ContainerExecutors and ContainerRuntimes. For all but the 
{{DockerLinuxContainerRuntime}}, {{relaunchContainer}} simply calls 
{{launchContainer}}, to mimic the existing behavior. In the case of 
{{DockerLinuxContainerRuntime}}, relaunch will instead call {{docker start}} on 
the existing container. {{docker start}} requires the same general flow 
as {{docker run}}: get the PID, then wait for the process to exit. As a 
result, the two paths are the same through container-executor (c-e), which 
appears to work well.
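
A minimal sketch of the dispatch described above, not the patch itself. The 
names {{SketchRuntime}}, {{ProcessRuntime}} and {{DockerRuntime}} are 
hypothetical stand-ins for the real ContainerRuntime classes, and the recorded 
command strings are simplified placeholders for what would actually be handed 
to container-executor. It only illustrates the idea that relaunch defaults to 
launch, while the Docker-like runtime overrides it to start the existing 
container.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RelaunchSketch {

    /** Hypothetical stand-in for a container runtime (assumption, not the Hadoop API). */
    interface SketchRuntime {
        void launchContainer(String containerId);

        // Default: a relaunch is just another launch, mimicking the existing behavior.
        default void relaunchContainer(String containerId) {
            launchContainer(containerId);
        }
    }

    /** Process-based runtime: inherits the default relaunch. */
    static class ProcessRuntime implements SketchRuntime {
        final List<String> commands = new ArrayList<>();

        @Override
        public void launchContainer(String containerId) {
            commands.add("launch " + containerId);
        }
    }

    /** Docker-like runtime: relaunch starts the container left over from the last attempt. */
    static class DockerRuntime implements SketchRuntime {
        final List<String> commands = new ArrayList<>();
        private final Set<String> existing = new HashSet<>();

        @Override
        public void launchContainer(String containerId) {
            // Simulates the name collision: a second "run" with the same name fails.
            if (!existing.add(containerId)) {
                throw new IllegalStateException(
                    "container name already in use: " + containerId);
            }
            commands.add("docker run --name " + containerId);
        }

        @Override
        public void relaunchContainer(String containerId) {
            // Simplification for this sketch: refuse to start a container never created.
            if (!existing.contains(containerId)) {
                throw new IllegalStateException("no such container: " + containerId);
            }
            commands.add("docker start " + containerId);
        }
    }

    public static void main(String[] args) {
        ProcessRuntime process = new ProcessRuntime();
        process.launchContainer("container_01");
        process.relaunchContainer("container_01");
        System.out.println(process.commands);

        DockerRuntime docker = new DockerRuntime();
        docker.launchContainer("container_01");
        docker.relaunchContainer("container_01");
        System.out.println(docker.commands);
    }
}
```

In this sketch, calling {{launchContainer}} twice with the same ID on the 
Docker-like runtime throws, which mirrors the name collision that motivated 
the issue.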

I've tested this against distributed shell, MR PI, MR sleep and several YARN 
Native Services apps, both process-based and Docker-based, and tried to inject 
failures where appropriate. The testing looks good. I think we have an 
opportunity to clean up some exception logging from the privileged executor, 
but I'll open a new issue to look into that cleanup.

> Support ContainerRelaunch for Docker containers
> -----------------------------------------------
>
>                 Key: YARN-7973
>                 URL: https://issues.apache.org/jira/browse/YARN-7973
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Shane Kumpf
>            Assignee: Shane Kumpf
>            Priority: Major
>         Attachments: YARN-7973.001.patch
>
>
> Prior to YARN-5366, {{container-executor}} would remove the Docker container 
> when it exited. The removal is now handled by the 
> {{DockerLinuxContainerRuntime}}. {{ContainerRelaunch}} is intended to reuse 
> the workdir from the previous attempt, and does not call {{cleanupContainer}} 
> prior to {{launchContainer}}. The container ID is reused as well. As a 
> result, the previous Docker container still exists, resulting in an error 
> from Docker indicating that a container by that name already exists.




