[ https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414336#comment-16414336 ]
Shane Kumpf commented on YARN-7973:
-----------------------------------

{quote}Sorry, I am not clear about the design of container relaunch feature. In what scenario is container relaunch used?
{quote}

Please see the existing {{ContainerRelaunch}} feature (YARN-3998) to better understand the initial design. This JIRA is for properly handling that feature with the Docker runtime. The {{ContainerRetryPolicy}} used by Native Services results in the use of this feature.

{quote}what would happen if the intermediate state of the container is preventing relaunch to run successfully?
{quote}

It is going to depend on your configuration. By default, Native Services relaunches every 30 seconds until the app lifetime is exceeded. This is the behavior with or without this patch. With a retry count set, the container will fail after relaunching the specified number of times.

How relaunch is used is up to the application/AM, so we can't just look at how Native Services uses it; we need to fix relaunch for the Docker case. As previously mentioned, IMO, we have two options:

1) The approach taken here: call "docker start" on the existing container.
2) Delete the existing container and launch a new Docker container that uses the same container ID as its name.

Given the design behind YARN-3998, #1 appears to be the most appropriate. It may allow some applications to recover existing data, which I believe to be desirable.

> Support ContainerRelaunch for Docker containers
> -----------------------------------------------
>
>                 Key: YARN-7973
>                 URL: https://issues.apache.org/jira/browse/YARN-7973
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Shane Kumpf
>            Assignee: Shane Kumpf
>            Priority: Major
>         Attachments: YARN-7973.001.patch, YARN-7973.002.patch
>
>
> Prior to YARN-5366, {{container-executor}} would remove the Docker container
> when it exited. The removal is now handled by the
> {{DockerLinuxContainerRuntime}}.
> {{ContainerRelaunch}} is intended to reuse
> the workdir from the previous attempt, and does not call {{cleanupContainer}}
> prior to {{launchContainer}}. The container ID is reused as well. As a
> result, the previous Docker container still exists, resulting in an error
> from Docker indicating that a container by that name already exists.
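The name conflict described in the issue, and the two relaunch options from the comment, can be illustrated with a toy model. This is a minimal Python sketch under stated assumptions and is not the YARN-7973 patch. `SimulatedDocker`, its methods, the `<image>` placeholder and the container ID are all hypothetical. The class mimics only three behaviours of Docker: container names are unique, `docker start` reuses an exited container's own filesystem, and `docker rm` discards it. It makes no attempt to mirror how the real runtime issues Docker commands, and it does not model data in bind-mounted host directories such as the reused workdir, which would survive under either option.

```python
"""Toy model of relaunching a Docker container under a reused name.

Hypothetical sketch, not YARN code: SimulatedDocker stands in for the Docker
daemon and models only name uniqueness, and the fact that 'docker start' keeps
a container's own filesystem while 'docker rm' discards it.
"""


class NameConflict(Exception):
    """Simulated error for creating a container whose name is already taken."""


class SimulatedDocker:
    def __init__(self):
        self.containers = {}  # name -> {"state": str, "files": list of str}
        self.commands = []    # docker CLI argv a real runtime might issue

    def run(self, name):
        self.commands.append(["docker", "run", "--name", name, "<image>"])
        if name in self.containers:
            raise NameConflict(f"container name {name!r} is already in use")
        self.containers[name] = {"state": "running", "files": []}

    def start(self, name):
        self.commands.append(["docker", "start", name])
        self.containers[name]["state"] = "running"  # own filesystem is kept

    def rm(self, name):
        self.commands.append(["docker", "rm", "-f", name])
        del self.containers[name]  # own filesystem is discarded

    def container_exits(self, name):
        self.containers[name]["state"] = "exited"


def relaunch_without_cleanup(docker, cid):
    """The failure in this issue: launch again while the old container exists."""
    docker.run(cid)


def relaunch_option1_start(docker, cid):
    """Option 1: call 'docker start' on the existing, exited container."""
    docker.start(cid)


def relaunch_option2_recreate(docker, cid):
    """Option 2: delete the old container, then create one with the same name."""
    docker.rm(cid)
    docker.run(cid)


def first_attempt_then_exit(cid):
    """First attempt writes some data inside the container, then exits."""
    docker = SimulatedDocker()
    docker.run(cid)
    docker.containers[cid]["files"].append("app-state.db")
    docker.container_exits(cid)
    return docker


if __name__ == "__main__":
    cid = "container_1520000000000_0001_01_000002"  # made-up container ID

    d0 = first_attempt_then_exit(cid)
    try:
        relaunch_without_cleanup(d0, cid)
    except NameConflict as err:
        print("no cleanup:", err)

    d1 = first_attempt_then_exit(cid)
    relaunch_option1_start(d1, cid)
    print("option 1:", d1.containers[cid])

    d2 = first_attempt_then_exit(cid)
    relaunch_option2_recreate(d2, cid)
    print("option 2:", d2.containers[cid])
```

Run as a script, the no-cleanup path raises the simulated name conflict. Option 1 brings the exited container back with `app-state.db` still present, and option 2 comes back with an empty file list. That difference is the data-recovery argument for option 1 made in the comment.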