[ https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420505#comment-16420505 ]
Shane Kumpf commented on YARN-7973:
-----------------------------------

Thanks for trying out the patch [~eyang]!

{quote}
Container relaunch is kind of working on my cluster using the example above. If an app is stopped, and restarted, new containers would be acquired. If container fails, and the same one will be used for relaunch.
{quote}

So it seems that there may be inconsistent use of the container relaunch policy in Native Services. That isn't really in scope for this patch, but sounds like something we should review in a separate issue. The only change in flow is when a container transitions to the relaunching state and Docker is in use, so this patch doesn't change how Native Services leverages that transition.

{quote}
However, I encountered a problem where flexing containers from 2 to 3, then decrease back to 2. The flexing command failed to be received by AM with the following error message
{quote}

I haven't been able to recreate this. Based on the exception type, it looks like the Services API may have been down? Can you share the RM and NM logs when this happens? I really wouldn't expect this patch to be related to that exception as it doesn't touch the Services API.

> Support ContainerRelaunch for Docker containers
> -----------------------------------------------
>
>                 Key: YARN-7973
>                 URL: https://issues.apache.org/jira/browse/YARN-7973
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Shane Kumpf
>            Assignee: Shane Kumpf
>            Priority: Major
>         Attachments: YARN-7973.001.patch, YARN-7973.002.patch
>
>
> Prior to YARN-5366, {{container-executor}} would remove the Docker container
> when it exited. The removal is now handled by the
> {{DockerLinuxContainerRuntime}}. {{ContainerRelaunch}} is intended to reuse
> the workdir from the previous attempt, and does not call {{cleanupContainer}}
> prior to {{launchContainer}}. The container ID is reused as well.
> As a result, the previous Docker container still exists, resulting in an error
> from Docker indicating that a container by that name already exists.
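
The failure mode in the issue description can be illustrated with a toy model. This is only a sketch in Python, not Hadoop or Docker code: `FakeDockerDaemon`, `relaunch`, the `remove_stale` flag, and the container ID `container_01` are all hypothetical names invented for the illustration. Removing the stale container before relaunch is shown as one possible remedy under that assumption; it is not taken from the attached patches.

```python
class ContainerNameConflict(Exception):
    """Raised when a container with the requested name already exists."""


class FakeDockerDaemon:
    """Toy stand-in for a Docker daemon; it only tracks container names."""

    def __init__(self):
        self.containers = {}  # name -> state ("running" or "exited")

    def run(self, name):
        # An exited container still occupies its name until it is removed.
        if name in self.containers:
            raise ContainerNameConflict(f"container name {name!r} is already in use")
        self.containers[name] = "running"

    def exit(self, name):
        self.containers[name] = "exited"

    def remove(self, name):
        self.containers.pop(name, None)


def relaunch(daemon, container_id, remove_stale):
    """Relaunch under the same container ID (hypothetical helper).

    With remove_stale=False this mirrors the flow in the description:
    no cleanup happens before the launch, so the old container remains.
    """
    if remove_stale:
        daemon.remove(container_id)
    daemon.run(container_id)


if __name__ == "__main__":
    daemon = FakeDockerDaemon()
    daemon.run("container_01")
    daemon.exit("container_01")  # first attempt fails; container is left behind

    try:
        relaunch(daemon, "container_01", remove_stale=False)
    except ContainerNameConflict as err:
        print("relaunch without cleanup failed:", err)

    relaunch(daemon, "container_01", remove_stale=True)
    print("state after cleanup and relaunch:", daemon.containers["container_01"])
```

In the model, the relaunch without cleanup fails because the exited container still holds the reused ID as its name, which matches the error described above.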