[jira] [Commented] (YARN-7189) Container-executor doesn't remove Docker containers that error out early
[ https://issues.apache.org/jira/browse/YARN-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441108#comment-16441108 ]

Eric Badger commented on YARN-7189:
-----------------------------------

Thanks for the review and commit, [~jlowe]!

> Container-executor doesn't remove Docker containers that error out early
> ------------------------------------------------------------------------
>
>                 Key: YARN-7189
>                 URL: https://issues.apache.org/jira/browse/YARN-7189
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn
>    Affects Versions: 2.9.0, 2.8.3, 3.0.1
>            Reporter: Eric Badger
>            Assignee: Eric Badger
>            Priority: Major
>             Fix For: 2.10.0, 2.9.2, 3.0.3
>
>         Attachments: YARN-7189-b3.0.001.patch, YARN-7189-branch-3.0.001.patch, YARN-7189-branch-3.0.002.patch, YARN-7189-branch-3.0.003.patch, YARN-7189-branch-3.0.004.patch
>
>
> Once the docker run command is executed, the docker container is created
> unless the return code is 125 meaning that the run command itself failed
> (https://docs.docker.com/engine/reference/run/#exit-status). Any error that
> happens after the docker run needs to remove the container during cleanup.
> {noformat:title=container-executor.c:launch_docker_container_as_user}
>   snprintf(docker_command_with_binary, command_size, "%s %s", docker_binary, docker_command);
>   fprintf(LOGFILE, "Launching docker container...\n");
>   FILE* start_docker = popen(docker_command_with_binary, "r");
> {noformat}
> This is fixed by YARN-5366, which changes how we remove containers. However,
> that was committed into 3.1.0. 2.8, 2.9, and 3.0 are all affected

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
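The issue description above says a container exists after docker run unless the command returned 125. As a rough illustration only, the check could look something like the sketch below. The helper name container_may_exist and the DOCKER_RUN_FAILED constant are made up for this example and do not come from container-executor.c or any of the attached patches; the sketch also assumes the status comes from pclose() on the pipe opened for docker run.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/wait.h>

/* Exit code that, per the issue description, means docker run itself failed. */
#define DOCKER_RUN_FAILED 125

/*
 * Hypothetical helper, not part of container-executor.c.
 * `status` is the value returned by pclose() for the `docker run` pipe.
 * Returns 0 only when docker run exited with 125 (no container was created).
 * Returns 1 in every other case, including pclose() failure, so that
 * cleanup errs on the side of attempting a removal.
 */
static int container_may_exist(int status) {
  if (status != -1 && WIFEXITED(status) &&
      WEXITSTATUS(status) == DOCKER_RUN_FAILED) {
    return 0;
  }
  return 1;
}
```

A caller would pass the pclose() result straight in, e.g. container_may_exist(pclose(start_docker)), and schedule a removal whenever the result is 1.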
[jira] [Commented] (YARN-7189) Container-executor doesn't remove Docker containers that error out early
[ https://issues.apache.org/jira/browse/YARN-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16439556#comment-16439556 ]

Jason Lowe commented on YARN-7189:
----------------------------------

Thanks for updating the patch! +1 lgtm. I'll commit this later today if there are no objections.
[jira] [Commented] (YARN-7189) Container-executor doesn't remove Docker containers that error out early
[ https://issues.apache.org/jira/browse/YARN-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438003#comment-16438003 ]

genericqa commented on YARN-7189:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 29s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || branch-3.0 Compile Tests ||
| +1 | mvninstall | 14m 37s | branch-3.0 passed |
| +1 | compile | 0m 43s | branch-3.0 passed |
| +1 | mvnsite | 0m 28s | branch-3.0 passed |
| +1 | shadedclient | 24m 2s | branch has no errors when building and testing our client artifacts. |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 30s | the patch passed |
| +1 | compile | 0m 42s | the patch passed |
| +1 | cc | 0m 42s | the patch passed |
| +1 | javac | 0m 42s | the patch passed |
| +1 | mvnsite | 0m 28s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 9m 49s | patch has no errors when building and testing our client artifacts. |
|| || || || Other Tests ||
| +1 | unit | 17m 22s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | asflicense | 0m 25s | The patch does not generate ASF License warnings. |
| | | 54m 16s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5aaf88d |
| JIRA Issue | YARN-7189 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12919010/YARN-7189-branch-3.0.004.patch |
| Optional Tests | asflicense compile cc mvnsite javac unit |
| uname | Linux 3301d48b9e8d 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-3.0 / dc01e32 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20344/testReport/ |
| Max. process+thread count | 397 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20344/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-7189) Container-executor doesn't remove Docker containers that error out early
[ https://issues.apache.org/jira/browse/YARN-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437990#comment-16437990 ]

genericqa commented on YARN-7189:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 18m 41s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || branch-3.0 Compile Tests ||
| +1 | mvninstall | 19m 4s | branch-3.0 passed |
| +1 | compile | 0m 48s | branch-3.0 passed |
| +1 | mvnsite | 0m 33s | branch-3.0 passed |
| +1 | shadedclient | 29m 13s | branch has no errors when building and testing our client artifacts. |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 30s | the patch passed |
| +1 | compile | 0m 43s | the patch passed |
| +1 | cc | 0m 43s | the patch passed |
| +1 | javac | 0m 43s | the patch passed |
| +1 | mvnsite | 0m 27s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 9m 7s | patch has no errors when building and testing our client artifacts. |
|| || || || Other Tests ||
| +1 | unit | 16m 51s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | asflicense | 0m 18s | The patch does not generate ASF License warnings. |
| | | 76m 14s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5aaf88d |
| JIRA Issue | YARN-7189 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12919001/YARN-7189-branch-3.0.003.patch |
| Optional Tests | asflicense compile cc mvnsite javac unit |
| uname | Linux c10a86f0c0ed 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-3.0 / dc01e32 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20343/testReport/ |
| Max. process+thread count | 410 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20343/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-7189) Container-executor doesn't remove Docker containers that error out early
[ https://issues.apache.org/jira/browse/YARN-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437908#comment-16437908 ]

Eric Badger commented on YARN-7189:
-----------------------------------

Sorry for the patch overload. Found a few nits right after I put up the patch. Patch 004 fixes those.
[jira] [Commented] (YARN-7189) Container-executor doesn't remove Docker containers that error out early
[ https://issues.apache.org/jira/browse/YARN-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437850#comment-16437850 ]

Eric Badger commented on YARN-7189:
-----------------------------------

Thanks for the review, [~jlowe]! Fixed up the patch to use the suggestions that you mentioned above. Again, tested it manually and it works in both the case where the removal works initially and when it fails initially and has to retry.
[jira] [Commented] (YARN-7189) Container-executor doesn't remove Docker containers that error out early
[ https://issues.apache.org/jira/browse/YARN-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16436358#comment-16436358 ]

Jason Lowe commented on YARN-7189:
----------------------------------

Thanks for updating the patch! I agree this looks difficult to unit test directly.

When pclose does not return 0 we cannot blindly assume errno is set. It is only set when pclose returns -1. Otherwise it returns the exit status which could just be a non-zero exit code from docker. That also reminds me that we technically should be using the WIFEXITED and WEXITSTATUS macros to examine the resulting status when it isn't -1 to look for a successful exit from the subcommand.

Nit: it would be nice to be consistent about the spacing between the {{if}} and the parentheses.
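To make the pclose point in the review comment above concrete, here is a minimal sketch of how the return value might be examined. It is not taken from any of the attached patches: the helper name close_and_check and its log parameter are assumptions for illustration. It consults errno only when pclose() returns -1 and otherwise decodes the status with WIFEXITED and WEXITSTATUS.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>

/*
 * Hypothetical helper, not taken from the patch.
 * Closes a stream opened with popen() and reports whether the child
 * exited successfully. Returns 0 on a clean exit with code 0, -1 otherwise.
 */
static int close_and_check(FILE *stream, FILE *log, const char *cmd) {
  int status = pclose(stream);
  if (status == -1) {
    /* pclose() itself failed: the only case in which errno is consulted. */
    fprintf(log, "pclose failed for '%s': %s\n", cmd, strerror(errno));
    return -1;
  }
  if (WIFEXITED(status)) {
    int code = WEXITSTATUS(status);
    if (code != 0) {
      /* The subcommand ran but reported failure; errno says nothing here. */
      fprintf(log, "'%s' exited with code %d\n", cmd, code);
      return -1;
    }
    return 0;
  }
  if (WIFSIGNALED(status)) {
    fprintf(log, "'%s' was killed by signal %d\n", cmd, WTERMSIG(status));
  } else {
    fprintf(log, "'%s' ended with unexpected status %d\n", cmd, status);
  }
  return -1;
}
```

Keeping the three outcomes separate (pclose failure, non-zero exit, death by signal) means a non-zero docker exit code is never reported with a stale strerror(errno) message.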
[jira] [Commented] (YARN-7189) Container-executor doesn't remove Docker containers that error out early
[ https://issues.apache.org/jira/browse/YARN-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16435835#comment-16435835 ]

Eric Badger commented on YARN-7189:
-----------------------------------

The TestDockerContainerRuntime failures are unrelated to this patch and should be fixed by the backporting of YARN-7810 to branch-3.0.

I'm not sure how to write a test for this. I've manually tested this by running a cluster with invalid {{mapreduce.map.java.opts}} and seeing that all containers are cleaned up with the patch (and not cleaned up without it). If anyone has suggestions, I'd be happy to implement.
[jira] [Commented] (YARN-7189) Container-executor doesn't remove Docker containers that error out early
[ https://issues.apache.org/jira/browse/YARN-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16434650#comment-16434650 ]

genericqa commented on YARN-7189:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 41s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || branch-3.0 Compile Tests ||
| +1 | mvninstall | 16m 28s | branch-3.0 passed |
| +1 | compile | 0m 50s | branch-3.0 passed |
| +1 | mvnsite | 0m 33s | branch-3.0 passed |
| +1 | shadedclient | 27m 47s | branch has no errors when building and testing our client artifacts. |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 32s | the patch passed |
| +1 | compile | 0m 46s | the patch passed |
| +1 | cc | 0m 46s | the patch passed |
| +1 | javac | 0m 46s | the patch passed |
| +1 | mvnsite | 0m 29s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 12s | patch has no errors when building and testing our client artifacts. |
|| || || || Other Tests ||
| -1 | unit | 16m 58s | hadoop-yarn-server-nodemanager in the patch failed. |
| +1 | asflicense | 0m 23s | The patch does not generate ASF License warnings. |
| | | 59m 17s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.linux.runtime.TestDockerContainerRuntime |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5aaf88d |
| JIRA Issue | YARN-7189 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12918618/YARN-7189-branch-3.0.002.patch |
| Optional Tests | asflicense compile cc mvnsite javac unit |
| uname | Linux 1757ad56d177 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-3.0 / 5fe2b97 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| unit | https://builds.apache.org/job/PreCommit-YARN-Build/20309/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20309/testReport/ |
| Max. process+thread count | 303 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20309/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-7189) Container-executor doesn't remove Docker containers that error out early
[ https://issues.apache.org/jira/browse/YARN-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16434466#comment-16434466 ]

Eric Badger commented on YARN-7189:
-----------------------------------

[~jlowe], new patch cleans some things up. Looping over pclose() makes things a little bit weird because of the required popen(). Let me know if this is acceptable.
[jira] [Commented] (YARN-7189) Container-executor doesn't remove Docker containers that error out early
[ https://issues.apache.org/jira/browse/YARN-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16434063#comment-16434063 ]

Jason Lowe commented on YARN-7189:
----------------------------------

Thanks for the patch!

The {{i < 5}} check is extraneous and would never be triggered because the body of the loop is checking it and will be the termination condition instead. Actually I think the loop would be simpler if written as a while loop, e.g.: while ((rc = pclose(..)) != 0).

Nit: The {{continue}} in the for loop is extraneous as is the {{goto}}.

It may be useful to log errors from pclose (i.e.: pclose returning -1) along with strerror(errno) when that happens.

Nit: "Could not remove container after 5 tries %s.\n" should be "Could not remove container after 5 tries: %s\n" so the command is clearly separated from the error description and we don't inject a trailing period into the cmdline printed.
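The loop shape discussed in the review above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from any attached patch: run_with_retries, MAX_REMOVE_TRIES, and the delay_secs parameter are hypothetical names, and the sketch uses a single bounded while loop with one termination check rather than the exact while ((rc = pclose(..)) != 0) form, because each attempt needs its own popen(). The try count of 5 and the final message format are taken from the comment.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Try count taken from the log message quoted in the review. */
#define MAX_REMOVE_TRIES 5

/*
 * Hypothetical helper, not the patch itself.
 * Runs `cmd` through popen()/pclose() until it succeeds or the try limit
 * is reached. Returns 0 on success, -1 if every attempt failed.
 */
static int run_with_retries(const char *cmd, FILE *log,
                            unsigned int delay_secs) {
  int tries = 0;
  while (tries < MAX_REMOVE_TRIES) {
    FILE *pipe = popen(cmd, "r");  /* each pclose() needs a fresh popen() */
    tries++;
    if (pipe == NULL) {
      fprintf(log, "popen failed: %s\n", strerror(errno));
    } else {
      int rc = pclose(pipe);
      if (rc == 0) {
        return 0;
      }
      if (rc == -1) {
        /* Only a -1 return means errno describes the failure. */
        fprintf(log, "pclose failed: %s\n", strerror(errno));
      }
    }
    if (tries < MAX_REMOVE_TRIES) {
      sleep(delay_secs);  /* back off before the next attempt */
    }
  }
  fprintf(log, "Could not remove container after %d tries: %s\n",
          MAX_REMOVE_TRIES, cmd);
  return -1;
}
```

The loop has no {{continue}} or {{goto}}, and the final message separates the command from the description with a colon, as the nit requests.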
[jira] [Commented] (YARN-7189) Container-executor doesn't remove Docker containers that error out early
[ https://issues.apache.org/jira/browse/YARN-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16434057#comment-16434057 ]

genericqa commented on YARN-7189:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 56s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || branch-3.0 Compile Tests ||
| +1 | mvninstall | 24m 56s | branch-3.0 passed |
| +1 | compile | 1m 8s | branch-3.0 passed |
| +1 | mvnsite | 0m 41s | branch-3.0 passed |
| +1 | shadedclient | 37m 39s | branch has no errors when building and testing our client artifacts. |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 38s | the patch passed |
| +1 | compile | 0m 58s | the patch passed |
| +1 | cc | 0m 58s | the patch passed |
| +1 | javac | 0m 58s | the patch passed |
| +1 | mvnsite | 0m 36s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 12m 3s | patch has no errors when building and testing our client artifacts. |
|| || || || Other Tests ||
| -1 | unit | 17m 56s | hadoop-yarn-server-nodemanager in the patch failed. |
| +1 | asflicense | 0m 23s | The patch does not generate ASF License warnings. |
| | | 71m 52s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.linux.runtime.TestDockerContainerRuntime |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5aaf88d |
| JIRA Issue | YARN-7189 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12918565/YARN-7189-branch-3.0.001.patch |
| Optional Tests | asflicense compile cc mvnsite javac unit |
| uname | Linux ae80e40479f9 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | branch-3.0 / 7cca348 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| unit | https://builds.apache.org/job/PreCommit-YARN-Build/20303/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20303/testReport/ |
| Max. process+thread count | 302 (vs. ulimit of 1) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20303/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-7189) Container-executor doesn't remove Docker containers that error out early
[ https://issues.apache.org/jira/browse/YARN-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433936#comment-16433936 ]

Eric Badger commented on YARN-7189:
-----------------------------------

Attaching patch with correct naming convention to hopefully get genericqa to run it against branch-3.0.
[jira] [Commented] (YARN-7189) Container-executor doesn't remove Docker containers that error out early
[ https://issues.apache.org/jira/browse/YARN-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433932#comment-16433932 ]

genericqa commented on YARN-7189:
---------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 8s | YARN-7189 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-7189 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12918481/YARN-7189-b3.0.001.patch |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20302/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (YARN-7189) Container-executor doesn't remove Docker containers that error out early
[ https://issues.apache.org/jira/browse/YARN-7189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433137#comment-16433137 ]

Eric Badger commented on YARN-7189:
-----------------------------------

Attaching first patch to fix this issue. There is a race in the removal of the docker container where the pid may not be valid anymore (no such process), but the docker container is still in the running state. Because of that, I have added an exponential backoff of removal in this patch. It will try for 5 iterations of increasing sleep times and eventually give up after the last one.
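The exponential backoff described in this comment might look something like the sketch below. The thread does not state the actual delays or how the patch structures its loop, so the doubling schedule, the millisecond base delay, and the names {{retry_with_backoff}}, {{backoff_ms}}, {{sleep_ms}}, and {{fail_until_zero}} are all assumptions made for illustration. The removal attempt is abstracted as a callback so the retry logic can be read on its own.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define MAX_TRIES 5 /* "5 iterations" per the comment above */

/* One attempt at the operation; returns 0 on success, non-zero on failure. */
typedef int (*attempt_fn)(void *ctx);

/* Assumed schedule: the delay doubles after each failed attempt
 * (attempt is 0-based), i.e. base_ms, 2*base_ms, 4*base_ms, ... */
static long backoff_ms(long base_ms, int attempt) {
  assert(base_ms >= 0 && attempt >= 0 && attempt < MAX_TRIES);
  return base_ms << attempt;
}

/* Sleep for the given number of milliseconds. */
static void sleep_ms(long ms) {
  struct timespec ts;
  ts.tv_sec = ms / 1000;
  ts.tv_nsec = (ms % 1000) * 1000000L;
  nanosleep(&ts, NULL);
}

/* Try fn up to MAX_TRIES times, sleeping longer after each failure.
 * Returns 0 as soon as an attempt succeeds, or -1 after giving up. */
static int retry_with_backoff(attempt_fn fn, void *ctx, long base_ms) {
  for (int attempt = 0; attempt < MAX_TRIES; attempt++) {
    if (fn(ctx) == 0) {
      return 0;
    }
    if (attempt + 1 < MAX_TRIES) { /* no point sleeping after the last try */
      sleep_ms(backoff_ms(base_ms, attempt));
    }
  }
  return -1;
}

/* Stand-in for the container removal: fails until the counter hits zero,
 * imitating a container that is still shutting down for a few attempts. */
static int fail_until_zero(void *ctx) {
  int *remaining = ctx;
  if (*remaining > 0) {
    (*remaining)--;
    return -1;
  }
  return 0;
}
```

With a base delay of 1000 ms, this schedule would wait 1 s, 2 s, 4 s, and 8 s between the five attempts. A container that leaves the running state shortly after its pid disappears would then be removed on an early retry rather than being left behind.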