[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414336#comment-16414336
 ] 

Shane Kumpf commented on YARN-7973:
-----------------------------------

{quote}Sorry, I am not clear about the design of container relaunch feature. In 
what scenario is container relaunch used?
{quote}
Please see the existing {{ContainerRelaunch}} feature (YARN-3998) to better 
understand the initial design. This JIRA is for properly handling that feature 
with the Docker runtime. The {{ContainerRetryPolicy}} used by Native Services 
results in the use of this feature.
{quote}what would happen if the intermediate state of the container is 
preventing relaunch to run successfully?
{quote}
It is going to depend on your configuration. By default, Native Services 
relaunches every 30 seconds until the app lifetime is exceeded. This is the 
behavior with or without this patch. With a retry count set, the container will 
fail after relaunching the specified number of times.
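
To make the two behaviors concrete, here is a minimal sketch of the retry decision. It is a self-contained model, not the YARN API: the class {{RelaunchPolicySketch}}, its methods, and the {{RETRY_FOREVER}} sentinel are hypothetical names chosen for illustration, and every attempt is assumed to fail immediately.

```java
/** Hypothetical, simplified model of the relaunch decision; not the YARN API. */
final class RelaunchPolicySketch {
    /** Sentinel used by this sketch to mean "no retry limit". */
    static final int RETRY_FOREVER = -1;

    /**
     * Decides whether a failed container should be relaunched.
     * relaunchesSoFar counts the relaunches already performed.
     */
    static boolean shouldRelaunch(int maxRetries, int relaunchesSoFar,
                                  long elapsedMs, long appLifetimeMs) {
        if (elapsedMs >= appLifetimeMs) {
            return false; // app lifetime exceeded: stop relaunching
        }
        if (maxRetries == RETRY_FOREVER) {
            return true; // no retry count set: keep relaunching
        }
        return relaunchesSoFar < maxRetries; // retry count set: stop once exhausted
    }

    /**
     * Counts relaunches over the app lifetime with a fixed retry interval,
     * assuming the first failure happens at time 0 and every relaunch fails at once.
     */
    static int countRelaunches(int maxRetries, long retryIntervalMs, long appLifetimeMs) {
        int relaunches = 0;
        long elapsedMs = retryIntervalMs; // first relaunch is one interval after the failure
        while (shouldRelaunch(maxRetries, relaunches, elapsedMs, appLifetimeMs)) {
            relaunches++;
            elapsedMs += retryIntervalMs;
        }
        return relaunches;
    }
}
```

With a 30 second interval, {{countRelaunches(RelaunchPolicySketch.RETRY_FOREVER, 30_000L, 300_000L)}} keeps going until the lifetime runs out, while {{countRelaunches(3, 30_000L, 300_000L)}} stops after three relaunches.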

How relaunch is used is up to the application/AM, so we can't just look at how 
Native Services uses it; we need to fix relaunch for the Docker case.

As previously mentioned, IMO, we have two options:
 1) The approach taken here to call "docker start" on the existing container.
 2) Delete and launch a new Docker container with the same container ID name.

Given the design behind YARN-3998, option 1 appears to be the most appropriate. 
It may allow some applications to recover existing data, which I believe to be 
desirable.
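
The difference between the two options can be sketched as the Docker CLI commands each one would issue on relaunch. This is an illustration only, not the code in the patch: {{DockerRelaunchSketch}}, {{Strategy}}, and {{planRelaunch}} are hypothetical names, the commands are only built as argument lists and never executed, and a real launch would pass many more options to {{docker run}}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the two relaunch options; not the code from the patch. */
final class DockerRelaunchSketch {
    enum Strategy {
        START_EXISTING, // option 1: "docker start" the existing container
        REMOVE_AND_RUN  // option 2: delete it, then launch a new one with the same name
    }

    /**
     * Returns the Docker CLI invocations (as argument lists) for a relaunch.
     * containerName is the reused YARN container ID; image is a placeholder.
     */
    static List<List<String>> planRelaunch(Strategy strategy, String containerName,
                                           String image, boolean previousContainerExists) {
        List<List<String>> commands = new ArrayList<>();
        if (!previousContainerExists) {
            // Nothing to reuse or remove, so this is an ordinary first launch.
            commands.add(Arrays.asList("docker", "run", "--name", containerName, image));
            return commands;
        }
        if (strategy == Strategy.START_EXISTING) {
            // Keeps the container's filesystem, so existing data may be recovered.
            commands.add(Arrays.asList("docker", "start", containerName));
        } else {
            // Frees the name first; otherwise "docker run" fails on the name conflict.
            commands.add(Arrays.asList("docker", "rm", containerName));
            commands.add(Arrays.asList("docker", "run", "--name", containerName, image));
        }
        return commands;
    }
}
```

Option 1 yields a single {{docker start}} on the existing container, while option 2 yields {{docker rm}} followed by a fresh {{docker run}} under the same name, which discards anything written inside the old container.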

> Support ContainerRelaunch for Docker containers
> -----------------------------------------------
>
>                 Key: YARN-7973
>                 URL: https://issues.apache.org/jira/browse/YARN-7973
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Shane Kumpf
>            Assignee: Shane Kumpf
>            Priority: Major
>         Attachments: YARN-7973.001.patch, YARN-7973.002.patch
>
>
> Prior to YARN-5366, {{container-executor}} would remove the Docker container 
> when it exited. The removal is now handled by the 
> {{DockerLinuxContainerRuntime}}. {{ContainerRelaunch}} is intended to reuse 
> the workdir from the previous attempt, and does not call {{cleanupContainer}} 
> prior to {{launchContainer}}. The container ID is reused as well. As a 
> result, the previous Docker container still exists, resulting in an error 
> from Docker indicating that a container by that name already exists.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
