[jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers

2018-05-03 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462772#comment-16462772
 ] 

Eric Yang commented on YARN-7973:
-

Addendum patch 001 committed to branch-3.1. 

> Support ContainerRelaunch for Docker containers
> ---
>
> Key: YARN-7973
> URL: https://issues.apache.org/jira/browse/YARN-7973
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-7973-branch-3.1.addendum.001.patch, 
> YARN-7973.001.patch, YARN-7973.002.patch, YARN-7973.003.patch, 
> YARN-7973.004.patch
>
>
> Prior to YARN-5366, {{container-executor}} would remove the Docker container 
> when it exited. The removal is now handled by the 
> {{DockerLinuxContainerRuntime}}. {{ContainerRelaunch}} is intended to reuse 
> the workdir from the previous attempt, and does not call {{cleanupContainer}} 
> prior to {{launchContainer}}. The container ID is reused as well. As a 
> result, the previous Docker container still exists, resulting in an error 
> from Docker indicating that a container by that name already exists.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers

2018-05-03 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462741#comment-16462741
 ] 

Eric Yang commented on YARN-7973:
-

[~jlowe] Yes, I am working on fixing this with an addendum patch.  Sorry about 
the breakage.

> Support ContainerRelaunch for Docker containers
> ---
>
> Key: YARN-7973
> URL: https://issues.apache.org/jira/browse/YARN-7973
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-7973.001.patch, YARN-7973.002.patch, 
> YARN-7973.003.patch, YARN-7973.004.patch
>
>
> Prior to YARN-5366, {{container-executor}} would remove the Docker container 
> when it exited. The removal is now handled by the 
> {{DockerLinuxContainerRuntime}}. {{ContainerRelaunch}} is intended to reuse 
> the workdir from the previous attempt, and does not call {{cleanupContainer}} 
> prior to {{launchContainer}}. The container ID is reused as well. As a 
> result, the previous Docker container still exists, resulting in an error 
> from Docker indicating that a container by that name already exists.






[jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers

2018-05-03 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462709#comment-16462709
 ] 

Jason Lowe commented on YARN-7973:
--

branch-3.1 is no longer building after this was committed:
{noformat}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on 
project hadoop-yarn-server-nodemanager: Compilation failure
[ERROR] 
/home/jlowe/hadoop/apache/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DockerLinuxContainerRuntime.java:[931,30]
 method getContainerStatus in class 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.docker.DockerCommandExecutor
 cannot be applied to given types;
[ERROR] required: 
java.lang.String,org.apache.hadoop.conf.Configuration,org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor,org.apache.hadoop.yarn.server.nodemanager.Context
[ERROR] found: 
java.lang.String,org.apache.hadoop.conf.Configuration,org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.privileged.PrivilegedOperationExecutor
[ERROR] reason: actual and formal argument lists differ in length
{noformat}

> Support ContainerRelaunch for Docker containers
> ---
>
> Key: YARN-7973
> URL: https://issues.apache.org/jira/browse/YARN-7973
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-7973.001.patch, YARN-7973.002.patch, 
> YARN-7973.003.patch, YARN-7973.004.patch
>
>
> Prior to YARN-5366, {{container-executor}} would remove the Docker container 
> when it exited. The removal is now handled by the 
> {{DockerLinuxContainerRuntime}}. {{ContainerRelaunch}} is intended to reuse 
> the workdir from the previous attempt, and does not call {{cleanupContainer}} 
> prior to {{launchContainer}}. The container ID is reused as well. As a 
> result, the previous Docker container still exists, resulting in an error 
> from Docker indicating that a container by that name already exists.






[jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers

2018-05-03 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462446#comment-16462446
 ] 

Shane Kumpf commented on YARN-7973:
---

[~eyang] - I realized after YARN-8194 that this isn't in branch-3.1. I think it 
should be, as it fixes an issue introduced by YARN-5366 in the case of a 
RELAUNCH. YARN-5366 was committed in 3.1. Without this, RELAUNCH is broken when 
running Docker-based Native Services. It looks like it still applies to 
branch-3.1, but let me know if you'd like me to put up a new patch for branch-3.1.

> Support ContainerRelaunch for Docker containers
> ---
>
> Key: YARN-7973
> URL: https://issues.apache.org/jira/browse/YARN-7973
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
>  Labels: Docker
> Fix For: 3.2.0
>
> Attachments: YARN-7973.001.patch, YARN-7973.002.patch, 
> YARN-7973.003.patch, YARN-7973.004.patch
>
>
> Prior to YARN-5366, {{container-executor}} would remove the Docker container 
> when it exited. The removal is now handled by the 
> {{DockerLinuxContainerRuntime}}. {{ContainerRelaunch}} is intended to reuse 
> the workdir from the previous attempt, and does not call {{cleanupContainer}} 
> prior to {{launchContainer}}. The container ID is reused as well. As a 
> result, the previous Docker container still exists, resulting in an error 
> from Docker indicating that a container by that name already exists.






[jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers

2018-04-11 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433858#comment-16433858
 ] 

Shane Kumpf commented on YARN-7973:
---

Thanks for the commit and review [~eyang]. Also, thanks to [~billie.rinaldi] 
for the review.

> Support ContainerRelaunch for Docker containers
> ---
>
> Key: YARN-7973
> URL: https://issues.apache.org/jira/browse/YARN-7973
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
> Fix For: 3.2.0
>
> Attachments: YARN-7973.001.patch, YARN-7973.002.patch, 
> YARN-7973.003.patch, YARN-7973.004.patch
>
>
> Prior to YARN-5366, {{container-executor}} would remove the Docker container 
> when it exited. The removal is now handled by the 
> {{DockerLinuxContainerRuntime}}. {{ContainerRelaunch}} is intended to reuse 
> the workdir from the previous attempt, and does not call {{cleanupContainer}} 
> prior to {{launchContainer}}. The container ID is reused as well. As a 
> result, the previous Docker container still exists, resulting in an error 
> from Docker indicating that a container by that name already exists.






[jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers

2018-04-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433188#comment-16433188
 ] 

Hudson commented on YARN-7973:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13962 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/13962/])
YARN-7973. Added ContainerRelaunch feature for Docker containers.
(eyang: rev c467f311d0c7155c09052d93fac12045af925583)
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/DockerCommandExecutor.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerRelaunch.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/monitor/TestContainersMonitorResourceChange.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/DockerStartCommand.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DefaultLinuxContainerRuntime.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationConstants.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/docker-util.c
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/TestDockerContainerRuntime.java
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/docker/TestDockerStartCommand.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/DefaultContainerExecutor.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestLinuxContainerExecutor.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DelegatingLinuxContainerRuntime.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/utils/test_docker_util.cc
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/docker-util.h
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/runtime/ContainerRuntime.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/ContainerExecutor.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DockerLinuxContainerRuntime.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/JavaSandboxLinuxContainerRuntime.java
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerRelaunch.java


> Support ContainerRelaunch for Docker containers
> ---
>
> Key: YARN-7973
> URL: https://issues.apache.org/jira/browse/YARN-7973
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: 

[jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers

2018-04-05 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427285#comment-16427285
 ] 

Eric Yang commented on YARN-7973:
-

[~shaneku...@gmail.com] Yes, it makes sense to fix problems like #2 in another 
JIRA as cleanup.  We probably want to get the YARN-7654 changes in first, then 
work on the cleanup.  This will reduce the amount of overlap that we would 
have to deal with.

> Support ContainerRelaunch for Docker containers
> ---
>
> Key: YARN-7973
> URL: https://issues.apache.org/jira/browse/YARN-7973
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
> Attachments: YARN-7973.001.patch, YARN-7973.002.patch, 
> YARN-7973.003.patch, YARN-7973.004.patch
>
>
> Prior to YARN-5366, {{container-executor}} would remove the Docker container 
> when it exited. The removal is now handled by the 
> {{DockerLinuxContainerRuntime}}. {{ContainerRelaunch}} is intended to reuse 
> the workdir from the previous attempt, and does not call {{cleanupContainer}} 
> prior to {{launchContainer}}. The container ID is reused as well. As a 
> result, the previous Docker container still exists, resulting in an error 
> from Docker indicating that a container by that name already exists.






[jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers

2018-04-04 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425928#comment-16425928
 ] 

genericqa commented on YARN-7973:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
19s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
12s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
 7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
12m 18s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 
58s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
0s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
11s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  6m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
10m 13s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
58s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
45s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 
41s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
33s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 93m 35s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-7973 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12917568/YARN-7973.004.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  cc  |
| uname | Linux b8a7e05d5939 4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 
19:38:41 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / b779f4f |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 

[jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers

2018-04-04 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425842#comment-16425842
 ] 

Shane Kumpf commented on YARN-7973:
---

Thanks for the review [~eyang]! I've attached a new patch that incorporates 
those suggestions.

Regarding suggestion #2 above for docker-util.c, we may want to open another 
issue to make the other commands consistent as this pattern is used by rm, 
stop, and kill as well. Let me know if you think that's needed.

> Support ContainerRelaunch for Docker containers
> ---
>
> Key: YARN-7973
> URL: https://issues.apache.org/jira/browse/YARN-7973
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
> Attachments: YARN-7973.001.patch, YARN-7973.002.patch, 
> YARN-7973.003.patch, YARN-7973.004.patch
>
>
> Prior to YARN-5366, {{container-executor}} would remove the Docker container 
> when it exited. The removal is now handled by the 
> {{DockerLinuxContainerRuntime}}. {{ContainerRelaunch}} is intended to reuse 
> the workdir from the previous attempt, and does not call {{cleanupContainer}} 
> prior to {{launchContainer}}. The container ID is reused as well. As a 
> result, the previous Docker container still exists, resulting in an error 
> from Docker indicating that a container by that name already exists.






[jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers

2018-04-03 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424599#comment-16424599
 ] 

Eric Yang commented on YARN-7973:
-

[~shaneku...@gmail.com] Thank you for the patch. I am able to run flex with 
patch 003.  A few small nitpicks:

1.  LaunchType: it would be nice to declare the constants outside of 
LinuxContainerExecutor.  The code may evolve so that the 
hadoop-yarn-services-core project (AM) needs to know those constants as well.  
The ApplicationConstants class in the hadoop-yarn-api project would be a good 
place to define them.

2.  docker-util.c, get_docker_start_command: it would be easier to read 
without nested if conditions.  Use goto for error handling instead:
{code}
  ret = add_to_buffer(out, outlen, DOCKER_START_COMMAND);
  if (ret == 0) {
    ret = add_to_buffer(out, outlen, " ");
    if (ret == 0) {
      ret = add_to_buffer(out, outlen, container_name);
    }
    free(container_name);
    if (ret != 0) {
      return BUFFER_TOO_SMALL;
    }
    return 0;
  }
  free(container_name);
  return BUFFER_TOO_SMALL;
{code}

change to:

{code}
  ret = add_to_buffer(out, outlen, DOCKER_START_COMMAND);
  if (ret != 0) {
    goto free_and_exit;
  }
  ret = add_to_buffer(out, outlen, " ");
  if (ret != 0) {
    goto free_and_exit;
  }
  ret = add_to_buffer(out, outlen, container_name);
  if (ret != 0) {
    goto free_and_exit;
  }
free_and_exit:
  free(container_name);
  return ret;
{code}

3. Instead of returning 0 or checking for == 0, it would be nice to define a 
SUCCESS or SUCCESS_EXIT_CODE constant equal to 0 to improve readability of the 
ContainerLaunch class.
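
To make suggestions 2 and 3 concrete, here is a minimal, self-contained C 
sketch of the single-exit cleanup pattern combined with a named success 
constant. It is not the code from the patch: the add_to_buffer stand-in, the 
numeric values of SUCCESS, BUFFER_TOO_SMALL and OUT_OF_MEMORY, the value of 
DOCKER_START_COMMAND, and the build_start_command helper are all assumptions 
made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed values for illustration only; the real definitions may differ. */
#define SUCCESS 0
#define BUFFER_TOO_SMALL 1
#define OUT_OF_MEMORY 2
#define DOCKER_START_COMMAND "start"

/* Stand-in for add_to_buffer: appends str to the NUL-terminated string in
 * out when the result, including the terminator, fits in outlen bytes. */
static int add_to_buffer(char *out, size_t outlen, const char *str) {
  size_t used = strlen(out);
  size_t need = strlen(str);
  if (used + need + 1 > outlen) {
    return BUFFER_TOO_SMALL;
  }
  memcpy(out + used, str, need + 1);
  return SUCCESS;
}

/* Hypothetical helper: writes "start <name>" into out. The name is copied
 * to the heap so the cleanup label has a resource to release. */
static int build_start_command(const char *name, char *out, size_t outlen) {
  int ret;
  size_t len;
  char *container_name;

  assert(name != NULL && out != NULL && outlen > 0);
  out[0] = '\0';

  len = strlen(name) + 1;
  container_name = malloc(len);
  if (container_name == NULL) {
    return OUT_OF_MEMORY;
  }
  memcpy(container_name, name, len);

  ret = add_to_buffer(out, outlen, DOCKER_START_COMMAND);
  if (ret != SUCCESS) {
    goto free_and_exit;
  }
  ret = add_to_buffer(out, outlen, " ");
  if (ret != SUCCESS) {
    goto free_and_exit;
  }
  ret = add_to_buffer(out, outlen, container_name);

free_and_exit:
  /* Single exit point: the copy is freed once on success and on failure. */
  free(container_name);
  return ret;
}
```

With a 32-byte buffer, build_start_command("container_1", buf, sizeof(buf)) 
leaves "start container_1" in buf and returns SUCCESS; with an 8-byte buffer 
the third append fails and the function returns BUFFER_TOO_SMALL, still 
freeing the copied name exactly once.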


> Support ContainerRelaunch for Docker containers
> ---
>
> Key: YARN-7973
> URL: https://issues.apache.org/jira/browse/YARN-7973
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
> Attachments: YARN-7973.001.patch, YARN-7973.002.patch, 
> YARN-7973.003.patch
>
>
> Prior to YARN-5366, {{container-executor}} would remove the Docker container 
> when it exited. The removal is now handled by the 
> {{DockerLinuxContainerRuntime}}. {{ContainerRelaunch}} is intended to reuse 
> the workdir from the previous attempt, and does not call {{cleanupContainer}} 
> prior to {{launchContainer}}. The container ID is reused as well. As a 
> result, the previous Docker container still exists, resulting in an error 
> from Docker indicating that a container by that name already exists.






[jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers

2018-03-31 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421331#comment-16421331
 ] 

genericqa commented on YARN-7973:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
18s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 25m 
29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
36s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 19s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
24s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
11m 25s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
21s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 
19s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 74m 54s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | YARN-7973 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12917099/YARN-7973.003.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  shadedclient  findbugs  checkstyle  cc  |
| uname | Linux 803d0ebfe4b1 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / acfd764 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/20176/testReport/ |
| Max. process+thread count | 341 (vs. ulimit of 1) |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/20176/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   

[jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers

2018-03-31 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421303#comment-16421303
 ] 

Shane Kumpf commented on YARN-7973:
---

Attaching a new patch that has been rebased onto trunk.

> Support ContainerRelaunch for Docker containers
> ---
>
> Key: YARN-7973
> URL: https://issues.apache.org/jira/browse/YARN-7973
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
> Attachments: YARN-7973.001.patch, YARN-7973.002.patch, 
> YARN-7973.003.patch
>
>
> Prior to YARN-5366, {{container-executor}} would remove the Docker container 
> when it exited. The removal is now handled by the 
> {{DockerLinuxContainerRuntime}}. {{ContainerRelaunch}} is intended to reuse 
> the workdir from the previous attempt, and does not call {{cleanupContainer}} 
> prior to {{launchContainer}}. The container ID is reused as well. As a 
> result, the previous Docker container still exists, resulting in an error 
> from Docker indicating that a container by that name already exists.






[jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers

2018-03-30 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420505#comment-16420505
 ] 

Shane Kumpf commented on YARN-7973:
---

Thanks for trying out the patch [~eyang]!

{quote} Container relaunch is kind of working on my cluster using the example 
above.  If an app is stopped, and restarted, new containers would be acquired.  
If container fails, and the same one will be used for relaunch. {quote}
So it seems that there may be inconsistent use of the container relaunch policy 
in Native Services. That isn't really in scope for this patch, but sounds like 
something we should review in a separate issue. The only change in flow is when 
a container transitions to the relaunching state and Docker is in use, so this 
patch doesn't change how Native Services leverages that transition.

{quote}However, I encountered a problem when flexing containers from 2 to 3 
and then decreasing back to 2.  The flex command was not received by the AM and 
failed with the following error message{quote}
I haven't been able to recreate this. Based on the exception type, it looks 
like the Services API may have been down? Can you share the RM and NM logs when 
this happens? I really wouldn't expect this patch to be related to that 
exception as it doesn't touch the Services API.
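
A quick way to test the "API may have been down" theory is to check whether anything is listening on the API endpoint at all, since {{Connection refused}} is a TCP-level failure. The sketch below is only an illustration: the function name and the default timeout are made up for this example, and the host and port to probe depend on the cluster configuration.

```python
import socket


def api_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    A False result corresponds to failures such as "Connection refused",
    i.e. nothing is accepting connections on that address.
    """
    try:
        # create_connection raises OSError (including ConnectionRefusedError
        # and timeouts) when the connection cannot be established.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, {{api_reachable("rm-host.example.com", 8088)}} would probe a hypothetical ResourceManager host on port 8088, assuming the Services API is served from the RM web application on its usual default port. A False result points at the endpoint rather than at the relaunch logic.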




[jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers

2018-03-26 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414543#comment-16414543
 ] 

Eric Yang commented on YARN-7973:
-

[~shaneku...@gmail.com] Thank you for the example.  Container relaunch is kind 
of working on my cluster using the example above.  If an app is stopped and 
restarted, new containers are acquired.  If a container fails, the same 
one is used for relaunch.  However, I encountered a problem when flexing 
containers from 2 to 3 and then decreasing back to 2.  The flex command was not 
received by the AM and failed with the following error message:

{code}
[hbase@eyang-5 hadoop-3.2.0-SNAPSHOT]$ ./bin/yarn app -flex z1 -component ping 2
2018-03-26 20:37:22,968 ERROR client.ApiServiceClient: Fail to flex application: 
com.sun.jersey.api.client.ClientHandlerException: java.net.ConnectException: Connection refused (Connection refused)
    at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:155)
    at com.sun.jersey.api.client.Client.handle(Client.java:652)
    at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
    at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
    at com.sun.jersey.api.client.WebResource$Builder.put(WebResource.java:539)
    at org.apache.hadoop.yarn.service.client.ApiServiceClient.actionFlex(ApiServiceClient.java:417)
    at org.apache.hadoop.yarn.client.cli.ApplicationCLI.run(ApplicationCLI.java:519)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
    at org.apache.hadoop.yarn.client.cli.ApplicationCLI.main(ApplicationCLI.java:111)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:204)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at java.net.Socket.connect(Socket.java:538)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
    at sun.net.www.http.HttpClient.New(HttpClient.java:339)
    at sun.net.www.http.HttpClient.New(HttpClient.java:357)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1202)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1138)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1032)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:966)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1316)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1291)
    at com.sun.jersey.client.urlconnection.URLConnectionClientHandler$1$1.getOutputStream(URLConnectionClientHandler.java:238)
    at com.sun.jersey.api.client.CommittingOutputStream.commitStream(CommittingOutputStream.java:117)
    at com.sun.jersey.api.client.CommittingOutputStream.write(CommittingOutputStream.java:89)
    at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
    at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
    at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
    at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
    at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
    at java.io.BufferedWriter.flush(BufferedWriter.java:254)
    at com.sun.jersey.core.util.ReaderWriter.writeToAsString(ReaderWriter.java:191)
    at com.sun.jersey.core.provider.AbstractMessageReaderWriterProvider.writeToAsString(AbstractMessageReaderWriterProvider.java:128)
    at com.sun.jersey.core.impl.provider.entity.StringProvider.writeTo(StringProvider.java:88)
    at com.sun.jersey.core.impl.provider.entity.StringProvider.writeTo(StringProvider.java:58)
    at com.sun.jersey.api.client.RequestWriter.writeRequestEntity(RequestWriter.java:300)
    at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(URLConnectionClientHandler.java:217)
    at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:153)
    ... 9 more
{code}

There is no error in AM logs.  The most recent 

[jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers

2018-03-26 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414452#comment-16414452
 ] 

Billie Rinaldi commented on YARN-7973:
--

I believe the Service AM is already utilizing container relaunch; this patch is 
to improve relaunch behavior in the case of Docker containers. In 
AbstractProviderService, setRetryContext is called, which initializes a 
ContainerRetryContext based on configurable parameters.




[jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers

2018-03-26 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414451#comment-16414451
 ] 

Shane Kumpf commented on YARN-7973:
---

[~eyang] Thanks for taking a look. You can use the following yarnfile to 
trigger a relaunch.
{code:java}
{
  "name": "test-centos7",
  "version": "0.1",
  "lifetime": "3600",
  "components": [
    {
      "name": "centos7",
      "number_of_containers": 1,
      "artifact": {
        "id": "library/centos:7",
        "type": "DOCKER"
      },
      "launch_command": "sleep 20; /bad_command",
      "resource": {
        "cpus": 2,
        "memory": "1024"
      }
    }
  ]
}
{code}




[jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers

2018-03-26 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414447#comment-16414447
 ] 

Eric Yang commented on YARN-7973:
-

[~shaneku...@gmail.com] Thanks for the explanation.  Can we modify the YARN Service 
AM to utilize the relaunch feature developed here?  I agree that option 1 is 
preferred for relaunch.  I would like to see a way to trigger the relaunch, such as 
stopping a service and then resuming it.  It is not easy to review the current 
patch when the rest of the connecting pieces are not aligned.




[jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers

2018-03-26 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414336#comment-16414336
 ] 

Shane Kumpf commented on YARN-7973:
---

{quote}Sorry, I am not clear about the design of the container relaunch feature. In 
what scenario is container relaunch used?
{quote}
Please see the existing {{ContainerRelaunch}} feature (YARN-3998) to better 
understand the initial design. This JIRA is for properly handling that feature 
with the Docker runtime. The {{ContainerRetryPolicy}} used by Native Services 
results in the use of this feature.
{quote}what would happen if the intermediate state of the container 
prevents the relaunch from running successfully?
{quote}
It is going to depend on your configuration. By default, Native Services 
relaunches every 30 seconds until the app lifetime is exceeded. This is the 
behavior with or without this patch. With a retry count set, the container will 
fail after relaunching the specified number of times.
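
The retry behavior described here can be illustrated with a small simulation. This is a rough sketch, not YARN code: {{RetryContext}} and {{relaunch_times}} are hypothetical names, the sketch assumes every attempt fails immediately, and it treats a negative {{max_retries}} as "no retry count set".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RetryContext:
    """Hypothetical stand-in for the retry settings an AM passes along."""
    max_retries: int       # negative means no retry count is set (assumption)
    interval_s: int = 30   # seconds between relaunch attempts


def relaunch_times(ctx: RetryContext, lifetime_s: int) -> List[int]:
    """Return the times (in seconds) at which a relaunch is attempted.

    Simplification: each attempt fails immediately, so the next attempt is
    scheduled exactly one interval later. Relaunching stops when the app
    lifetime is exceeded or the retry count is used up.
    """
    times: List[int] = []
    t = ctx.interval_s
    while t < lifetime_s and (ctx.max_retries < 0 or len(times) < ctx.max_retries):
        times.append(t)
        t += ctx.interval_s
    return times
```

With no retry count and a 120 second lifetime the sketch yields relaunches at 30, 60 and 90 seconds; with a retry count of 2 it stops after the second relaunch regardless of the lifetime.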

How relaunch is used is up to the application/AM, so we can't just look at how 
Native Services is using it. We need to fix relaunch for the Docker case.

As previously mentioned, IMO, we have two options:
 1) The approach taken here to call "docker start" on the existing container.
 2) Delete and launch a new Docker container with the same container ID name.

Given the design behind YARN-3998, #1 appears to be most appropriate. This may 
allow some applications to recover existing data, which I believe to be 
desirable.
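
The two options can be contrasted in terms of the Docker commands they imply. The sketch below is illustrative only: YARN issues Docker commands through its own runtime code rather than through plain argument lists like these, and {{relaunch_commands}} is a hypothetical helper name.

```python
from typing import List


def relaunch_commands(container_name: str, image: str,
                      reuse_existing: bool = True) -> List[List[str]]:
    """Return the Docker CLI invocations implied by each relaunch option.

    reuse_existing=True  -> option 1: start the existing, stopped container,
                            keeping its writable layer and any data in it.
    reuse_existing=False -> option 2: remove the old container, then run a
                            fresh one under the same name.
    """
    if reuse_existing:
        return [["docker", "start", container_name]]
    return [
        ["docker", "rm", container_name],
        ["docker", "run", "--name", container_name, image],
    ]
```

Option 1 needs a single command and leaves the container's filesystem as the previous attempt left it, which is what makes recovering existing data possible. Option 2 always starts from a clean image.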




[jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers

2018-03-26 Thread Eric Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414306#comment-16414306
 ] 

Eric Yang commented on YARN-7973:
-

[~shaneku...@gmail.com] Sorry, I am not clear about the design of the container 
relaunch feature.  In what scenario is container relaunch used?

At this time, if a container finishes or errors out, a new instance of the 
component is launched to replace the existing one.  The new container will have 
a new container id.  In the current patch, I don't see code that relaunches a 
container on the same node as the previous instance of that container.  In 
addition, what would happen if the intermediate state of the container 
prevents the relaunch from running successfully?




[jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers

2018-03-17 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16403775#comment-16403775
 ] 

Shane Kumpf commented on YARN-7973:
---

Opened YARN-8045, YARN-8043, and YARN-8044 for the items above. Let me know if 
you have any other concerns here.




[jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers

2018-03-16 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16402005#comment-16402005
 ] 

Billie Rinaldi commented on YARN-7973:
--

Sounds good. Thanks for looking into this, [~shaneku...@gmail.com].




[jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers

2018-03-16 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16401818#comment-16401818
 ] 

Shane Kumpf commented on YARN-7973:
---

[~billie.rinaldi] - I looked into the issue you reported. The behavior you see 
occurs with or without this patch.

What you see above repeated over and over is the Diagnostics field being 
returned during the ContainerStatus calls. Pulling out only the Diagnostics 
field from above you get:
{code:java}
Diagnostics: [2018-03-08 22:02:53.397]Exception from container-launch.
Container id: container_1520546307703_0001_01_02
Exit code: -1
Exception message: 
Shell output: 

[2018-03-08 22:02:53.500]Diagnostic message from attempt 0 : [2018-03-08 
22:02:53.500]
[2018-03-08 22:02:53.501]Container exited with a non-zero exit code -1.
,{code}
You will see this repeated once per second until the relaunch occurs again (30 
seconds by default with native services). Once the relaunch occurs, you will 
see the exception that the relaunch failed, as the container isn't in a 
startable state. I could be convinced to call launchContainer in this case to 
produce the original error if you feel that is most appropriate, but I think 
there are alternative improvements to make here:
 * The logs are hard to follow with the diagnostics embedded in the log entry 
when returning the ContainerStatus. It looks like exceptions are repeated over 
and over, as you saw. We should consider moving this to debug logging.
 * Populate diagnostics with a better error in this case. The 
{{ContainerExecutionException}} thrown as part of this ACL check does not 
become part of the Diagnostics field.
 * Native Services currently uses {{ContainerRetryPolicy.RETRY_ON_ALL_ERRORS}} 
which may be too broad. -1 exit codes should likely be hard fails.

I'll open issues on these if that sounds good?
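
The last point can be sketched as a retry decision that treats certain exit codes as hard failures regardless of policy. This is an illustration of the proposal, not existing YARN behavior: the enum member names mirror YARN's {{ContainerRetryPolicy}}, while {{should_relaunch}} and {{HARD_FAIL_EXIT_CODES}} are hypothetical names introduced for this example.

```python
from enum import Enum
from typing import FrozenSet


class RetryPolicy(Enum):
    """Names mirror YARN's ContainerRetryPolicy; the values are arbitrary."""
    NEVER_RETRY = 0
    RETRY_ON_ALL_ERRORS = 1
    RETRY_ON_SPECIFIC_ERROR_CODES = 2


# Proposal from the comment above: a -1 exit code should be a hard fail.
HARD_FAIL_EXIT_CODES: FrozenSet[int] = frozenset({-1})


def should_relaunch(policy: RetryPolicy, exit_code: int,
                    retry_codes: FrozenSet[int] = frozenset()) -> bool:
    """Decide whether a failed container is relaunched."""
    if exit_code == 0:
        return False  # clean exit, nothing to retry
    if exit_code in HARD_FAIL_EXIT_CODES:
        return False  # hard fail, even under RETRY_ON_ALL_ERRORS
    if policy is RetryPolicy.RETRY_ON_ALL_ERRORS:
        return True
    if policy is RetryPolicy.RETRY_ON_SPECIFIC_ERROR_CODES:
        return exit_code in retry_codes
    return False  # NEVER_RETRY
```

Under this sketch, the ACL failure reported above (exit code -1) would fail the container once instead of being relaunched every interval.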




[jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers

2018-03-09 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393385#comment-16393385
 ] 

Shane Kumpf commented on YARN-7973:
---

Thanks for taking a look [~billie.rinaldi]! I believe I know the issue here: 
the container never started, so the {{docker start}} would fail. I need to add 
a check for the non-existent state and just call {{launchContainer}} in that case. 
That said, I would expect a -1 exit code to be a hard fail rather than a relaunch, 
so we should look at that as well.
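
The check described here amounts to a dispatch on the Docker container's state. The following is a minimal sketch under stated assumptions: {{relaunch_action}} is a hypothetical name, the status strings are the ones Docker reports for a container's state, and the choice of which states count as startable is an assumption made for this example rather than something taken from the patch.

```python
from typing import Optional

# Assumption for this sketch: only containers that were created or have
# exited are treated as startable.
STARTABLE_STATES = frozenset({"created", "exited"})


def relaunch_action(docker_status: Optional[str]) -> str:
    """Pick the relaunch action for a container.

    docker_status is None when no Docker container with that name exists,
    for example because the first launch failed before the container was
    created.
    """
    if docker_status is None:
        return "launch"  # fall back to a normal launchContainer
    if docker_status in STARTABLE_STATES:
        return "start"   # reuse the existing container via docker start
    return "fail"        # e.g. running, paused, dead: not startable
```

The first branch is the case from this comment: with no container to start, a plain launch reproduces the original error instead of a misleading start failure.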




[jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers

2018-03-09 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393346#comment-16393346
 ] 

Billie Rinaldi commented on YARN-7973:
--

I started taking a look at patch 002. When I ran my first app, I had a 
configuration problem: I was trying to run a privileged container as a user 
that wasn't allowed to run privileged containers. The container failed with the 
appropriate message about the user failing the ACL check, but when it was 
relaunched the following was logged repeatedly. It seems like we could improve 
the failure handling in scenarios like this.
{noformat}
2018-03-08 22:02:53,791 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
 Getting container-status for container_1520546307703_0001_01_02
2018-03-08 22:02:53,791 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
 Returning ContainerStatus: [ContainerId: 
container_1520546307703_0001_01_02, ExecutionType: GUARANTEED, State: 
RUNNING, Capability: , Diagnostics: [2018-03-08 
22:02:53.397]Exception from container-launch.
Container id: container_1520546307703_0001_01_02
Exit code: -1
Exception message: 
Shell output: 

[2018-03-08 22:02:53.500]Diagnostic message from attempt 0 : [2018-03-08 
22:02:53.500]
[2018-03-08 22:02:53.501]Container exited with a non-zero exit code -1.
, ExitStatus: -1, IP: null, Host: null, ContainerSubState: SCHEDULED]
{noformat}




[jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers

2018-02-28 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380704#comment-16380704
 ] 

genericqa commented on YARN-7973:
-

| (/) *+1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 27s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 6 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 15m 15s | trunk passed |
| +1 | compile | 0m 47s | trunk passed |
| +1 | checkstyle | 0m 18s | trunk passed |
| +1 | mvnsite | 0m 31s | trunk passed |
| +1 | shadedclient | 9m 11s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 45s | trunk passed |
| +1 | javadoc | 0m 18s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 29s | the patch passed |
| +1 | compile | 0m 45s | the patch passed |
| +1 | cc | 0m 45s | the patch passed |
| +1 | javac | 0m 45s | the patch passed |
| +1 | checkstyle | 0m 16s | the patch passed |
| +1 | mvnsite | 0m 28s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 9m 50s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 52s | the patch passed |
| +1 | javadoc | 0m 20s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 19m 39s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | asflicense | 0m 22s | The patch does not generate ASF License warnings. |
| | | 60m 38s | |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7973 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12912456/YARN-7973.002.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle cc |
| uname | Linux 87e1e1718e48 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / edc9f14 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/19835/testReport/ |
| Max. process+thread count | 437 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/19835/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT   

[jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers

2018-02-28 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380588#comment-16380588
 ] 

Shane Kumpf commented on YARN-7973:
---

Attached a new patch to address the checkstyle issues.

> Support ContainerRelaunch for Docker containers
> ---
>
> Key: YARN-7973
> URL: https://issues.apache.org/jira/browse/YARN-7973
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
> Attachments: YARN-7973.001.patch, YARN-7973.002.patch
>
>
> Prior to YARN-5366, {{container-executor}} would remove the Docker container 
> when it exited. The removal is now handled by the 
> {{DockerLinuxContainerRuntime}}. {{ContainerRelaunch}} is intended to reuse 
> the workdir from the previous attempt, and does not call {{cleanupContainer}} 
> prior to {{launchContainer}}. The container ID is reused as well. As a 
> result, the previous Docker container still exists, resulting in an error 
> from Docker indicating that a container by that name already exists.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers

2018-02-28 Thread genericqa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380392#comment-16380392
 ] 

genericqa commented on YARN-7973:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 28s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 6 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 21s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 32s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green}  9m 48s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 22s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green}  0m 46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 46s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 16s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 10 new + 132 unchanged - 0 fixed = 142 total (was 132) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 19s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 20s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 19m 35s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 62m  5s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7973 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12912441/YARN-7973.001.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  cc  |
| uname | Linux f92187acf2bf 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / e015e00 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/19833/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/19833/testReport/ |
| Max. process+thread count | 407 (vs. ulimit 

[jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers

2018-02-28 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380322#comment-16380322
 ] 

Shane Kumpf commented on YARN-7973:
---

Attaching a patch that adds a new {{relaunchContainer}} method to the 
ContainerExecutors and ContainerRuntimes. For all but the 
{{DockerLinuxContainerRuntime}}, {{relaunchContainer}} simply calls 
{{launchContainer}}, to mimic the existing behavior. In the case of 
{{DockerLinuxContainerRuntime}}, relaunch will instead call {{docker start}} on 
the existing container. For {{docker start}}, we require the same general flow 
as {{docker run}} where it is necessary to get the PID and wait for the process 
to exit. As a result, these two paths are the same through c-e 
({{container-executor}}), which appears to work well.
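
As a rough illustration of the delegation described above, here is a minimal Java sketch. It is not the code from the patch: {{SketchRuntime}}, {{SketchProcessRuntime}} and {{SketchDockerRuntime}} are hypothetical stand-ins for the real runtime classes, the use of a default method is an assumption made for brevity, and the recorded strings only label the Docker subcommand involved rather than form a full command line.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for a container runtime; not the real YARN interface. */
interface SketchRuntime {
  void launchContainer(String containerId);

  /** Default relaunch simply launches again, mimicking the existing behavior. */
  default void relaunchContainer(String containerId) {
    launchContainer(containerId);
  }
}

/** Process-based runtime (hypothetical): inherits the default relaunch. */
class SketchProcessRuntime implements SketchRuntime {
  final List<String> actions = new ArrayList<>();

  @Override
  public void launchContainer(String containerId) {
    actions.add("launch " + containerId);
  }
}

/** Docker-style runtime (hypothetical): relaunch restarts the existing container. */
class SketchDockerRuntime implements SketchRuntime {
  final List<String> actions = new ArrayList<>();

  @Override
  public void launchContainer(String containerId) {
    // Stands in for "docker run"; image and options are omitted in this sketch.
    actions.add("run " + containerId);
  }

  @Override
  public void relaunchContainer(String containerId) {
    // Stands in for "docker start" on the container left over from the last attempt.
    actions.add("start " + containerId);
  }
}

class RelaunchSketch {
  public static void main(String[] args) {
    SketchDockerRuntime docker = new SketchDockerRuntime();
    docker.launchContainer("c1");
    docker.relaunchContainer("c1");
    System.out.println(docker.actions);
  }
}
```

In this sketch only the Docker-style runtime overrides the relaunch path, so every other runtime keeps its current behavior without any change.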

I've tested this against distributed shell, MR PI, MR sleep and several YARN 
Native Services apps - both process-based and Docker, and tried to inject 
failures where appropriate. The testing looks good. I think we have an 
opportunity to clean up some exception logging from the privileged executor, 
but I'll open a new issue to look into that cleanup.

> Support ContainerRelaunch for Docker containers
> ---
>
> Key: YARN-7973
> URL: https://issues.apache.org/jira/browse/YARN-7973
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Assignee: Shane Kumpf
>Priority: Major
> Attachments: YARN-7973.001.patch
>
>
> Prior to YARN-5366, {{container-executor}} would remove the Docker container 
> when it exited. The removal is now handled by the 
> {{DockerLinuxContainerRuntime}}. {{ContainerRelaunch}} is intended to reuse 
> the workdir from the previous attempt, and does not call {{cleanupContainer}} 
> prior to {{launchContainer}}. The container ID is reused as well. As a 
> result, the previous Docker container still exists, resulting in an error 
> from Docker indicating that a container by that name already exists.






[jira] [Commented] (YARN-7973) Support ContainerRelaunch for Docker containers

2018-02-26 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16376998#comment-16376998
 ] 

Shane Kumpf commented on YARN-7973:
---

I think we have a couple of options:
 # Restore the previous behavior. Remove the container prior to relaunch and 
launch a new container with the same name.
 # Use {{docker start}} to try to start the existing Docker container.

IMO, #2 is the more appropriate fix given the intent of {{ContainerRelaunch}}. 
This has the added benefit of leaving the root filesystem in the container 
intact, which would enable the application to recover its data during relaunch. 
I've started on a patch to handle this and will take ownership.
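
To make the difference between the two options concrete, here is a small Java sketch that builds the Docker command sequence each option would need. {{RelaunchOptionsSketch}}, its method names, the container name and the image name are all hypothetical and purely illustrative; only the {{docker rm}}, {{docker run --name}} and {{docker start}} subcommands themselves are real.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper contrasting the two relaunch options; not part of YARN. */
class RelaunchOptionsSketch {

  /** Option 1: remove the old container, then create a new one with the same name. */
  static List<List<String>> removeThenRun(String name, String image) {
    return Arrays.asList(
        Arrays.asList("docker", "rm", name),
        Arrays.asList("docker", "run", "--name", name, image));
  }

  /** Option 2: restart the existing container, keeping its root filesystem. */
  static List<List<String>> startExisting(String name) {
    return Arrays.asList(Arrays.asList("docker", "start", name));
  }

  public static void main(String[] args) {
    String name = "container_example_0001"; // illustrative name only
    String image = "example/image:latest";  // illustrative image only
    removeThenRun(name, image).forEach(cmd -> System.out.println(String.join(" ", cmd)));
    startExisting(name).forEach(cmd -> System.out.println(String.join(" ", cmd)));
  }
}
```

Option 1 discards whatever the previous attempt wrote inside the container, while option 2 issues a single command against the container that already exists.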

> Support ContainerRelaunch for Docker containers
> ---
>
> Key: YARN-7973
> URL: https://issues.apache.org/jira/browse/YARN-7973
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Shane Kumpf
>Priority: Major
>
> Prior to YARN-5366, {{container-executor}} would remove the Docker container 
> when it exited. The removal is now handled by the 
> {{DockerLinuxContainerRuntime}}. {{ContainerRelaunch}} is intended to reuse 
> the workdir from the previous attempt, and does not call {{cleanupContainer}} 
> prior to {{launchContainer}}. The container ID is reused as well. As a 
> result, the previous Docker container still exists, resulting in an error 
> from Docker indicating that a container by that name already exists.


