[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420505#comment-16420505
 ] 

Shane Kumpf commented on YARN-7973:
-----------------------------------

Thanks for trying out the patch [~eyang]!

{quote} Container relaunch is kind of working on my cluster using the example 
above.  If an app is stopped and restarted, new containers are acquired.  
If a container fails, the same one is used for relaunch. {quote}
So it seems that there may be inconsistent use of the container relaunch policy 
in Native Services. That isn't really in scope for this patch, but sounds like 
something we should review in a separate issue. The only change in flow is when 
a container transitions to the relaunching state and Docker is in use, so this 
patch doesn't change how Native Services leverages that transition.

{quote}However, I encountered a problem when flexing containers from 2 to 3, 
then decreasing back to 2.  The flexing command failed to be received by AM with 
the following error message{quote}
I haven't been able to recreate this. Based on the exception type, it looks 
like the Services API may have been down? Can you share the RM and NM logs when 
this happens? I really wouldn't expect this patch to be related to that 
exception as it doesn't touch the Services API.

> Support ContainerRelaunch for Docker containers
> -----------------------------------------------
>
>                 Key: YARN-7973
>                 URL: https://issues.apache.org/jira/browse/YARN-7973
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Shane Kumpf
>            Assignee: Shane Kumpf
>            Priority: Major
>         Attachments: YARN-7973.001.patch, YARN-7973.002.patch
>
>
> Prior to YARN-5366, {{container-executor}} would remove the Docker container 
> when it exited. The removal is now handled by the 
> {{DockerLinuxContainerRuntime}}. {{ContainerRelaunch}} is intended to reuse 
> the workdir from the previous attempt, and does not call {{cleanupContainer}} 
> prior to {{launchContainer}}. The container ID is reused as well. As a 
> result, the previous Docker container still exists, resulting in an error 
> from Docker indicating that a container by that name already exists.
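
The name conflict described above can be illustrated with a small, self-contained Java sketch. Everything in it is hypothetical: {{FakeDocker}}, {{naiveRelaunch}} and {{relaunch}} are toy names invented for this illustration and are not YARN or Docker APIs. It also does not claim to show what the attached patches do. It only models why reusing the container ID without a prior cleanup fails, and one possible way to avoid the failure (starting the existing container instead of creating a new one with the same name).

```java
import java.util.HashMap;
import java.util.Map;

public class RelaunchSketch {

    /** Toy stand-in for the Docker daemon: tracks container state by name. */
    public static class FakeDocker {
        private final Map<String, String> stateByName = new HashMap<>();

        /** Creates and starts a new container; fails if the name is taken. */
        public void run(String name) {
            if (stateByName.containsKey(name)) {
                throw new IllegalStateException(
                    "Conflict: container name \"" + name + "\" is already in use");
            }
            stateByName.put(name, "running");
        }

        /** Marks a container as exited; it is NOT removed. */
        public void exit(String name) {
            stateByName.replace(name, "exited");
        }

        /** Starts an existing container again. */
        public void start(String name) {
            if (!stateByName.containsKey(name)) {
                throw new IllegalStateException("No such container: " + name);
            }
            stateByName.put(name, "running");
        }

        /** Returns the container state, or null if no such container exists. */
        public String state(String name) {
            return stateByName.get(name);
        }
    }

    /** Relaunch that reuses the ID and runs again without any cleanup. */
    public static void naiveRelaunch(FakeDocker docker, String containerId) {
        docker.run(containerId);
    }

    /** Hypothetical alternative: reuse the existing container if present. */
    public static void relaunch(FakeDocker docker, String containerId) {
        if (docker.state(containerId) == null) {
            docker.run(containerId);
        } else {
            docker.start(containerId);
        }
    }

    public static void main(String[] args) {
        FakeDocker docker = new FakeDocker();
        String id = "container_example_01"; // made-up container ID

        docker.run(id);   // first launch
        docker.exit(id);  // container fails; nothing removes it

        try {
            naiveRelaunch(docker, id);
        } catch (IllegalStateException e) {
            System.out.println("naive relaunch failed: " + e.getMessage());
        }

        relaunch(docker, id);
        System.out.println("state after relaunch: " + docker.state(id));
    }
}
```

Removing the old container before running a new one would be another way to avoid the conflict in this toy model; the sketch picks the restart variant only to keep the example short.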



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
