[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645338#comment-16645338 ] Chandni Singh commented on YARN-7644: - Thanks [~jlowe], [~ebadger] & [~eyang] > NM gets backed up deleting docker containers > > > Key: YARN-7644 > URL: https://issues.apache.org/jira/browse/YARN-7644 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Eric Badger >Assignee: Chandni Singh >Priority: Major > Labels: Docker > Fix For: 3.2.0 > > Attachments: YARN-7644.001.patch, YARN-7644.002.patch, > YARN-7644.003.patch, YARN-7644.004.patch, YARN-7644.005.patch, > YARN-7644.006.patch > > > We are sending a {{docker stop}} to the docker container with a timeout of 10 > seconds when we shut down a container. If the container does not stop after > 10 seconds then we force kill it. However, the {{docker stop}} command is a > blocking call. So in cases where lots of containers don't go down with the > initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to > return. This ties up the ContainerLaunch handler and so these kill events > back up. It also appears to be backing up new container launches as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645150#comment-16645150 ] Hudson commented on YARN-7644: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #15167 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/15167/]) YARN-7644. NM gets backed up deleting docker containers. Contributed by (jlowe: rev 5ce70e1211e624d58e8bb1181aec00729ebdc085) * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainerCleanup.java * (add) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerCleanup.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainersLauncher.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/TestContainersLauncher.java > NM gets backed up deleting docker containers > > > Key: YARN-7644 > URL: https://issues.apache.org/jira/browse/YARN-7644 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Eric Badger >Assignee: Chandni Singh >Priority: Major > Labels: Docker > Fix For: 3.2.0 > > Attachments: YARN-7644.001.patch, YARN-7644.002.patch, > YARN-7644.003.patch, YARN-7644.004.patch, YARN-7644.005.patch, > YARN-7644.006.patch > > > We are sending a {{docker stop}} to the docker container with a timeout of 10 > seconds when we shut down a container. If the container does not stop after > 10 seconds then we force kill it. However, the {{docker stop}} command is a > blocking call. So in cases where lots of containers don't go down with the > initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to > return. This ties up the ContainerLaunch handler and so these kill events > back up. It also appears to be backing up new container launches as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645096#comment-16645096 ] Jason Lowe commented on YARN-7644: -- Thanks for updating the patch! +1 for patch v6 as well. Committing this. > NM gets backed up deleting docker containers > > > Key: YARN-7644 > URL: https://issues.apache.org/jira/browse/YARN-7644 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Eric Badger >Assignee: Chandni Singh >Priority: Major > Labels: Docker > Attachments: YARN-7644.001.patch, YARN-7644.002.patch, > YARN-7644.003.patch, YARN-7644.004.patch, YARN-7644.005.patch, > YARN-7644.006.patch > > > We are sending a {{docker stop}} to the docker container with a timeout of 10 > seconds when we shut down a container. If the container does not stop after > 10 seconds then we force kill it. However, the {{docker stop}} command is a > blocking call. So in cases where lots of containers don't go down with the > initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to > return. This ties up the ContainerLaunch handler and so these kill events > back up. It also appears to be backing up new container launches as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644263#comment-16644263 ] Hadoop QA commented on YARN-7644: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 10s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 11s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 1s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 24s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 4 new + 118 unchanged - 10 fixed = 122 total (was 128) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 2s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 29s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 72m 51s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 | | JIRA Issue | YARN-7644 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12943131/YARN-7644.006.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 61f37bac224f 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 6a39739 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/22123/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/22123/testReport/ | | Max. process+thread count | 307 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U:
[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644179#comment-16644179 ] Chandni Singh commented on YARN-7644: - Introduced a checkstyle warning in patch 5. Fixed it in patch 6. > NM gets backed up deleting docker containers > > > Key: YARN-7644 > URL: https://issues.apache.org/jira/browse/YARN-7644 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Eric Badger >Assignee: Chandni Singh >Priority: Major > Labels: Docker > Attachments: YARN-7644.001.patch, YARN-7644.002.patch, > YARN-7644.003.patch, YARN-7644.004.patch, YARN-7644.005.patch, > YARN-7644.006.patch > > > We are sending a {{docker stop}} to the docker container with a timeout of 10 > seconds when we shut down a container. If the container does not stop after > 10 seconds then we force kill it. However, the {{docker stop}} command is a > blocking call. So in cases where lots of containers don't go down with the > initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to > return. This ties up the ContainerLaunch handler and so these kill events > back up. It also appears to be backing up new container launches as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644174#comment-16644174 ] Hadoop QA commented on YARN-7644: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 1s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 20m 19s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 11s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 54s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 6s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 26s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 5 new + 119 unchanged - 10 fixed = 124 total (was 129) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 57s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 57s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 34s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 74m 13s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 | | JIRA Issue | YARN-7644 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12943109/YARN-7644.005.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux d30456ce4b86 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 5b7ba48 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/22119/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/22119/testReport/ | | Max. process+thread count | 340 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U:
[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644125#comment-16644125 ] Jason Lowe commented on YARN-7644: -- Thanks for updating the patch! +1 for patch 5 pending Jenkins. > NM gets backed up deleting docker containers > > > Key: YARN-7644 > URL: https://issues.apache.org/jira/browse/YARN-7644 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Eric Badger >Assignee: Chandni Singh >Priority: Major > Labels: Docker > Attachments: YARN-7644.001.patch, YARN-7644.002.patch, > YARN-7644.003.patch, YARN-7644.004.patch, YARN-7644.005.patch > > > We are sending a {{docker stop}} to the docker container with a timeout of 10 > seconds when we shut down a container. If the container does not stop after > 10 seconds then we force kill it. However, the {{docker stop}} command is a > blocking call. So in cases where lots of containers don't go down with the > initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to > return. This ties up the ContainerLaunch handler and so these kill events > back up. It also appears to be backing up new container launches as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644085#comment-16644085 ] Chandni Singh commented on YARN-7644: - Made {{reapContainer}} package-private as well in Patch 5. > NM gets backed up deleting docker containers > > > Key: YARN-7644 > URL: https://issues.apache.org/jira/browse/YARN-7644 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Eric Badger >Assignee: Chandni Singh >Priority: Major > Labels: Docker > Attachments: YARN-7644.001.patch, YARN-7644.002.patch, > YARN-7644.003.patch, YARN-7644.004.patch, YARN-7644.005.patch > > > We are sending a {{docker stop}} to the docker container with a timeout of 10 > seconds when we shut down a container. If the container does not stop after > 10 seconds then we force kill it. However, the {{docker stop}} command is a > blocking call. So in cases where lots of containers don't go down with the > initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to > return. This ties up the ContainerLaunch handler and so these kill events > back up. It also appears to be backing up new container launches as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644080#comment-16644080 ] Chandni Singh commented on YARN-7644: - Addressed the review comments in Patch 4. > NM gets backed up deleting docker containers > > > Key: YARN-7644 > URL: https://issues.apache.org/jira/browse/YARN-7644 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Eric Badger >Assignee: Chandni Singh >Priority: Major > Labels: Docker > Attachments: YARN-7644.001.patch, YARN-7644.002.patch, > YARN-7644.003.patch, YARN-7644.004.patch > > > We are sending a {{docker stop}} to the docker container with a timeout of 10 > seconds when we shut down a container. If the container does not stop after > 10 seconds then we force kill it. However, the {{docker stop}} command is a > blocking call. So in cases where lots of containers don't go down with the > initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to > return. This ties up the ContainerLaunch handler and so these kill events > back up. It also appears to be backing up new container launches as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16644044#comment-16644044 ] Jason Lowe commented on YARN-7644: -- Thanks for updating the patch! compareAndSetAlreadyLaunched is too explicit -- it essentially exposes the AtomicBoolean directly which defeats the point of encapsulation. Something like setLaunched() or markLaunched() which returns false if it was already launched would be easier to read and also hide the fact that there's a CAS operation on an AtomicBoolean underneath. In practice the boolean only goes one direction, so no need to expose it completely. EXIT_CODE_FILE_SUFFIX should be package-private instead of protected. getContainerPid method also only needs to be package-private. Nit: ContainerCleanup should cache the pid file path in a local rather than always calling the accessor method. setPidFilePath added but never called. > NM gets backed up deleting docker containers > > > Key: YARN-7644 > URL: https://issues.apache.org/jira/browse/YARN-7644 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Eric Badger >Assignee: Chandni Singh >Priority: Major > Labels: Docker > Attachments: YARN-7644.001.patch, YARN-7644.002.patch, > YARN-7644.003.patch > > > We are sending a {{docker stop}} to the docker container with a timeout of 10 > seconds when we shut down a container. If the container does not stop after > 10 seconds then we force kill it. However, the {{docker stop}} command is a > blocking call. So in cases where lots of containers don't go down with the > initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to > return. This ties up the ContainerLaunch handler and so these kill events > back up. It also appears to be backing up new container launches as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643942#comment-16643942 ] Hadoop QA commented on YARN-7644: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 24s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 12s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 24s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 5 new + 118 unchanged - 10 fixed = 123 total (was 128) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 18s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 50s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 67m 8s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 | | JIRA Issue | YARN-7644 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12943076/YARN-7644.003.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 1a9f38d626b9 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / c3d22d3 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/22115/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/22115/testReport/ | | Max. process+thread count | 415 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U:
[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643801#comment-16643801 ] Chandni Singh commented on YARN-7644: - Addressed [~jlowe]'s review comments in patch 3. * Made {{sleepDelayBeforeSigKill}} final. * Made {{ContainerCleanup}} not accesses the variables in {{ContainerLaunch}} directly. Added access methods in {{ContainerLaunch}}. My preference is to keep {{ContainerCleanup}} outside {{ContainerLaunch}} because: *# {{ContainerLauncher}} needs to be able to access {{ContainerCleanup}} to create an instance of this task. *# {{ContainerLaunch}} is already quite big (approx. 2000 lines). * Did not change the access modifiers of {{pidFilePath}} and {{containerAlreadyLaunched}} in {{ContainerLaunch}} since the other classes - {{ContainerRelaunch}}, {{RecoveredContainerLaunch}}, {{RecoveredPausedContainerLaunch}} which are extensions of {{ContainerLaunch}} access them directly. * Created https://issues.apache.org/jira/browse/YARN-8861 to change the name of variable {{executorLock}} in {{ContainerLaunch}} > NM gets backed up deleting docker containers > > > Key: YARN-7644 > URL: https://issues.apache.org/jira/browse/YARN-7644 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Eric Badger >Assignee: Chandni Singh >Priority: Major > Labels: Docker > Attachments: YARN-7644.001.patch, YARN-7644.002.patch, > YARN-7644.003.patch > > > We are sending a {{docker stop}} to the docker container with a timeout of 10 > seconds when we shut down a container. If the container does not stop after > 10 seconds then we force kill it. However, the {{docker stop}} command is a > blocking call. So in cases where lots of containers don't go down with the > initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to > return. This ties up the ContainerLaunch handler and so these kill events > back up. It also appears to be backing up new container launches as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16643405#comment-16643405 ] Jason Lowe commented on YARN-7644: -- Ah yes, sorry I was confusing reaping a container with killing it. The signals are not blocked so we're good there. My apologies for misreading it. The lock name does imply the lock is meant to be grabbed when calling the executor, maybe "launchLock" would be more appropriate since it's designed to be held during a container launch? Anyway that change does not need to be part of this JIRA. > NM gets backed up deleting docker containers > > > Key: YARN-7644 > URL: https://issues.apache.org/jira/browse/YARN-7644 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Eric Badger >Assignee: Chandni Singh >Priority: Major > Labels: Docker > Attachments: YARN-7644.001.patch, YARN-7644.002.patch > > > We are sending a {{docker stop}} to the docker container with a timeout of 10 > seconds when we shut down a container. If the container does not stop after > 10 seconds then we force kill it. However, the {{docker stop}} command is a > blocking call. So in cases where lots of containers don't go down with the > initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to > return. This ties up the ContainerLaunch handler and so these kill events > back up. It also appears to be backing up new container launches as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16642595#comment-16642595 ] Chandni Singh commented on YARN-7644: - [~jlowe] please see my response below {quote}IIUC the launchContainer method for the executor is a synchronous, blocking call that won't return until the container completes. For example, see DefaultContainerExecutor#launchContainer where it invokes Shell.CommandExecutor#execute. That means the executor lock would be held continuously while the container is running. Therefore I'm not sure how the thread running ContainerLaunch#reapContainer is going to obtain the executor lock to be able to proceed to kill the container. Seems like it would just hang, but maybe I'm missing something. This may be more of an issue with YARN-8160 than this one, as it looks like this mostly just refactored existing code to move it into a ContainerCleanup class. {quote} Before {{reapContainer()}}, container term/kill signal is always sent. This is not blocked. With YARN-8160, we wait for {{launchContainer()}} to complete after the signal is sent and then perform the {{reapContainer()}} {quote}To be honest I'm not quite sure what the purpose of the lock is, since there are many places we invoke the executor without the lock like deactivating and signalling. The use of the lock seems inconsistent if it's supposed to guard when we are invoking the executor. {quote} This is the comment that describes the issue which the change in YARN-8160 fixed: https://issues.apache.org/jira/browse/YARN-8160?focusedCommentId=16570774=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16570774 I will summarize it here. * Container is launched * Re-init of container is requested * Re-init triggers container stop and removes the container * Meanwhile the container launch exits with 255 because the container files are cleaned up by reap container. This is because after the executor exits the launch, it performs docker inspect With the executorLock, we are waiting for the executor.launchContainer to complete after term/kill signal is sent to it. Once the launch is completed, we have the correct exit code from the container. Then the reap is performed. Possibly, the name {{executorLock}} is confusing which I can change? I will address your other comments in the next patch. > NM gets backed up deleting docker containers > > > Key: YARN-7644 > URL: https://issues.apache.org/jira/browse/YARN-7644 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Eric Badger >Assignee: Chandni Singh >Priority: Major > Labels: Docker > Attachments: YARN-7644.001.patch, YARN-7644.002.patch > > > We are sending a {{docker stop}} to the docker container with a timeout of 10 > seconds when we shut down a container. If the container does not stop after > 10 seconds then we force kill it. However, the {{docker stop}} command is a > blocking call. So in cases where lots of containers don't go down with the > initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to > return. This ties up the ContainerLaunch handler and so these kill events > back up. It also appears to be backing up new container launches as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16642576#comment-16642576 ] Jason Lowe commented on YARN-7644: -- Thanks for the patch! I'm a little concerned about the container executor lock that was added in YARN-8160 and used as part of this patch. IIUC the launchContainer method for the executor is a synchronous, blocking call that won't return until the container completes. For example, see DefaultContainerExecutor#launchContainer where it invokes Shell.CommandExecutor#execute. That means the executor lock would be held continuously while the container is running. Therefore I'm not sure how the thread running ContainerLaunch#reapContainer is going to obtain the executor lock to be able to proceed to kill the container. Seems like it would just hang, but maybe I'm missing something. This may be more of an issue with YARN-8160 than this one, as it looks like this mostly just refactored existing code to move it into a ContainerCleanup class. To be honest I'm not quite sure what the purpose of the lock is, since there are many places we invoke the executor without the lock like deactivating and signalling. The use of the lock seems inconsistent if it's supposed to guard when we are invoking the executor. Nit: It feels a little off for ContainerCleanup to reach into fields directly, e.g.: launch.pidFilePath, launch.containerAlreadyLaunched, launch.completed, etc. It made more sense when this code was part of ContainerLaunch because the field was private and no code other than the implementation details of ContainerLaunch needed to know. Now there's another, separate class that is reaching in to grab all these fields. Seems like either ContainerCleanup should be a private static class of ContainerLaunch or there should be accessors so we can keep these fields private. ContainerLaunch.sleepDelayBeforeSigKill should be final like the other properties. The assignment to 250 will always be clobbered by the constructor anyway. > NM gets backed up deleting docker containers > > > Key: YARN-7644 > URL: https://issues.apache.org/jira/browse/YARN-7644 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Eric Badger >Assignee: Chandni Singh >Priority: Major > Labels: Docker > Attachments: YARN-7644.001.patch, YARN-7644.002.patch > > > We are sending a {{docker stop}} to the docker container with a timeout of 10 > seconds when we shut down a container. If the container does not stop after > 10 seconds then we force kill it. However, the {{docker stop}} command is a > blocking call. So in cases where lots of containers don't go down with the > initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to > return. This ties up the ContainerLaunch handler and so these kill events > back up. It also appears to be backing up new container launches as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16642119#comment-16642119 ] Chandni Singh commented on YARN-7644: - [~jlowe] do you have any comments? > NM gets backed up deleting docker containers > > > Key: YARN-7644 > URL: https://issues.apache.org/jira/browse/YARN-7644 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Eric Badger >Assignee: Chandni Singh >Priority: Major > Labels: Docker > Attachments: YARN-7644.001.patch, YARN-7644.002.patch > > > We are sending a {{docker stop}} to the docker container with a timeout of 10 > seconds when we shut down a container. If the container does not stop after > 10 seconds then we force kill it. However, the {{docker stop}} command is a > blocking call. So in cases where lots of containers don't go down with the > initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to > return. This ties up the ContainerLaunch handler and so these kill events > back up. It also appears to be backing up new container launches as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640439#comment-16640439 ] Hadoop QA commented on YARN-7644: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 17m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 57s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 50s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 20s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 4 new + 118 unchanged - 10 fixed = 122 total (was 128) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 42s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 33s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 65m 6s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 | | JIRA Issue | YARN-7644 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12942305/YARN-7644.002.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 1b1a542dc063 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / cdf5d58 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | findbugs | v3.1.0-RC1 | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/22074/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/22074/testReport/ | | Max. process+thread count | 415 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U:
[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639025#comment-16639025 ] Eric Yang commented on YARN-7644: - Trunk build issues seem to have been resolved, triggering pre-commit test for patch 002. > NM gets backed up deleting docker containers > > > Key: YARN-7644 > URL: https://issues.apache.org/jira/browse/YARN-7644 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Eric Badger >Assignee: Chandni Singh >Priority: Major > Labels: Docker > Attachments: YARN-7644.001.patch, YARN-7644.002.patch > > > We are sending a {{docker stop}} to the docker container with a timeout of 10 > seconds when we shut down a container. If the container does not stop after > 10 seconds then we force kill it. However, the {{docker stop}} command is a > blocking call. So in cases where lots of containers don't go down with the > initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to > return. This ties up the ContainerLaunch handler and so these kill events > back up. It also appears to be backing up new container launches as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16638840#comment-16638840 ] Eric Badger commented on YARN-7644: --- Personally, I think that {{org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher}} is the more appropriate location for the new class because it is related to the container launch cycle of events. Based on the name of the class ({{CleanupContainer}}, it probably should be in the deletion package. But based on the actual implementation of what it actually does, I think it belongs in launcher. I think there are pros and cons to each, and I agree that it gets a little messy since we have to involve a deletion task to actually remove the docker containers, but I think that is the deviation and that we should maintain course in this case. Overall, I think the patch looks good. +1 (non-binding) from me. [~jlowe], do you have any comments? > NM gets backed up deleting docker containers > > > Key: YARN-7644 > URL: https://issues.apache.org/jira/browse/YARN-7644 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Eric Badger >Assignee: Chandni Singh >Priority: Major > Labels: Docker > Attachments: YARN-7644.001.patch, YARN-7644.002.patch > > > We are sending a {{docker stop}} to the docker container with a timeout of 10 > seconds when we shut down a container. If the container does not stop after > 10 seconds then we force kill it. However, the {{docker stop}} command is a > blocking call. So in cases where lots of containers don't go down with the > initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to > return. This ties up the ContainerLaunch handler and so these kill events > back up. It also appears to be backing up new container launches as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16638819#comment-16638819 ] Eric Yang commented on YARN-7644: - [~csingh] ContainerCleanup a a runnable, and it is a utility class to remove container. It would be good to keep this helper class general and can be reused in deletion task in my opinion. Launcher package is all dealing with similar kind of events like launch, relaunch, or pause launch. Deletion task has a package prefix of it's own. Package names appear more organized, if ContainerCleanup is in deletion.task package, even if the utility class is called by launch event failure or completion. That is just my opinion. I will let others provide feedback. > NM gets backed up deleting docker containers > > > Key: YARN-7644 > URL: https://issues.apache.org/jira/browse/YARN-7644 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Eric Badger >Assignee: Chandni Singh >Priority: Major > Labels: Docker > Attachments: YARN-7644.001.patch, YARN-7644.002.patch > > > We are sending a {{docker stop}} to the docker container with a timeout of 10 > seconds when we shut down a container. If the container does not stop after > 10 seconds then we force kill it. However, the {{docker stop}} command is a > blocking call. So in cases where lots of containers don't go down with the > initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to > return. This ties up the ContainerLaunch handler and so these kill events > back up. It also appears to be backing up new container launches as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16637378#comment-16637378 ] Chandni Singh commented on YARN-7644: - Fixed some of the checkstyle warnings in patch 2. The ones for indentation in ContainersLauncher follow the indentation of the existing statements under case {code:java} /hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainersLauncher.java:156: ContainerLaunch existingLaunch = running.remove(containerId);: 'block' child has incorrect indentation level 8, expected level should be 6. [Indentation] {code} {quote}I think ContainerCleanup class belong to org.apache.hadoop.yarn.server.nodemanager.containermanager.deletion.task instead of in org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher for reusing existing package structure. {quote} [~eyang] I think the {{org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher}} is better for ContainerCleanup. This package contains all the tasks- launch, relaunch which are created by {{ContainersLauncher}} and are not used outside this package. > NM gets backed up deleting docker containers > > > Key: YARN-7644 > URL: https://issues.apache.org/jira/browse/YARN-7644 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Eric Badger >Assignee: Chandni Singh >Priority: Major > Labels: Docker > Attachments: YARN-7644.001.patch, YARN-7644.002.patch > > > We are sending a {{docker stop}} to the docker container with a timeout of 10 > seconds when we shut down a container. If the container does not stop after > 10 seconds then we force kill it. However, the {{docker stop}} command is a > blocking call. So in cases where lots of containers don't go down with the > initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to > return. This ties up the ContainerLaunch handler and so these kill events > back up. It also appears to be backing up new container launches as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16637117#comment-16637117 ] Eric Yang commented on YARN-7644: - [~csingh] The mvn install on root seems like a release branch error. We can trigger test again after trunk has been fixed. I think ContainerCleanup class belong to org.apache.hadoop.yarn.server.nodemanager.containermanager.deletion.task instead of in org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher for reusing existing package structure. Some checkstyle can be clean up. [~ebadger] [~jlowe] Do you have any concern in the async task for reapContainer? > NM gets backed up deleting docker containers > > > Key: YARN-7644 > URL: https://issues.apache.org/jira/browse/YARN-7644 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Eric Badger >Assignee: Chandni Singh >Priority: Major > Labels: Docker > Attachments: YARN-7644.001.patch > > > We are sending a {{docker stop}} to the docker container with a timeout of 10 > seconds when we shut down a container. If the container does not stop after > 10 seconds then we force kill it. However, the {{docker stop}} command is a > blocking call. So in cases where lots of containers don't go down with the > initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to > return. This ties up the ContainerLaunch handler and so these kill events > back up. It also appears to be backing up new container launches as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16636160#comment-16636160 ] Hadoop QA commented on YARN-7644: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 12s{color} | {color:red} root in trunk failed. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 1m 7s{color} | {color:red} branch has errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 35s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 21s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 8 new + 118 unchanged - 10 fixed = 126 total (was 128) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 0m 11s{color} | {color:red} patch has errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 56s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 18m 46s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 28m 10s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.TestNMProxy | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 | | JIRA Issue | YARN-7644 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12942175/YARN-7644.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux f8bd1de54c7e 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / e8b8604 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | mvninstall | https://builds.apache.org/job/PreCommit-YARN-Build/22032/artifact/out/branch-mvninstall-root.txt | | findbugs | v3.1.0-RC1 | | mvninstall | https://builds.apache.org/job/PreCommit-YARN-Build/22032/artifact/out/patch-mvninstall-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | checkstyle |
[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16636142#comment-16636142 ] Eric Badger commented on YARN-7644: --- {quote} mvninstall fails with Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M1:enforce . {quote} Same for me. I saw something on the mailing list as well. This seems to be a problem that was brought forth when branch 3-2 was cut > NM gets backed up deleting docker containers > > > Key: YARN-7644 > URL: https://issues.apache.org/jira/browse/YARN-7644 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Eric Badger >Assignee: Chandni Singh >Priority: Major > Labels: Docker > Attachments: YARN-7644.001.patch > > > We are sending a {{docker stop}} to the docker container with a timeout of 10 > seconds when we shut down a container. If the container does not stop after > 10 seconds then we force kill it. However, the {{docker stop}} command is a > blocking call. So in cases where lots of containers don't go down with the > initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to > return. This ties up the ContainerLaunch handler and so these kill events > back up. It also appears to be backing up new container launches as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16636071#comment-16636071 ] Chandni Singh commented on YARN-7644: - mvninstall fails with {{Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M1:enforce}} . I didn't change any dependencies so I don't think this is related to my change. > NM gets backed up deleting docker containers > > > Key: YARN-7644 > URL: https://issues.apache.org/jira/browse/YARN-7644 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Eric Badger >Assignee: Chandni Singh >Priority: Major > Labels: Docker > Attachments: YARN-7644.001.patch > > > We are sending a {{docker stop}} to the docker container with a timeout of 10 > seconds when we shut down a container. If the container does not stop after > 10 seconds then we force kill it. However, the {{docker stop}} command is a > blocking call. So in cases where lots of containers don't go down with the > initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to > return. This ties up the ContainerLaunch handler and so these kill events > back up. It also appears to be backing up new container launches as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16636066#comment-16636066 ] Hadoop QA commented on YARN-7644: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 14s{color} | {color:red} root in trunk failed. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 1m 17s{color} | {color:red} branch has errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 40s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 1s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 25s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 8 new + 118 unchanged - 10 fixed = 126 total (was 128) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} shadedclient {color} | {color:red} 0m 12s{color} | {color:red} patch has errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 18m 38s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 29m 10s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.TestNMProxy | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:4b8c2b1 | | JIRA Issue | YARN-7644 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12942175/YARN-7644.001.patch | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 310400cd5f18 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / e8b8604 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_181 | | mvninstall | https://builds.apache.org/job/PreCommit-YARN-Build/22031/artifact/out/branch-mvninstall-root.txt | | findbugs | v3.1.0-RC1 | | mvninstall | https://builds.apache.org/job/PreCommit-YARN-Build/22031/artifact/out/patch-mvninstall-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt | | checkstyle |
[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16636014#comment-16636014 ] Chandni Singh commented on YARN-7644: - I have yet to test the patch 1 with Docker containers. Didn't see any issues with {{DefaultContainerExecutor}}. [~ebadger] [~eyang] please take a look when you get a chance. > NM gets backed up deleting docker containers > > > Key: YARN-7644 > URL: https://issues.apache.org/jira/browse/YARN-7644 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Eric Badger >Assignee: Chandni Singh >Priority: Major > Labels: Docker > Attachments: YARN-7644.001.patch > > > We are sending a {{docker stop}} to the docker container with a timeout of 10 > seconds when we shut down a container. If the container does not stop after > 10 seconds then we force kill it. However, the {{docker stop}} command is a > blocking call. So in cases where lots of containers don't go down with the > initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to > return. This ties up the ContainerLaunch handler and so these kill events > back up. It also appears to be backing up new container launches as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629097#comment-16629097 ] Eric Badger commented on YARN-7644: --- bq. With this Jira, I can focus on CLEANUP_CONTAINER and CLEANUP_CONTAINER_FOR_REINIT events to be performed in a non-blocking way. That sounds like the correct approach to me > NM gets backed up deleting docker containers > > > Key: YARN-7644 > URL: https://issues.apache.org/jira/browse/YARN-7644 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Eric Badger >Assignee: Chandni Singh >Priority: Major > Labels: Docker > > We are sending a {{docker stop}} to the docker container with a timeout of 10 > seconds when we shut down a container. If the container does not stop after > 10 seconds then we force kill it. However, the {{docker stop}} command is a > blocking call. So in cases where lots of containers don't go down with the > initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to > return. This ties up the ContainerLaunch handler and so these kill events > back up. It also appears to be backing up new container launches as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626455#comment-16626455 ] Chandni Singh commented on YARN-7644: - For {{LAUNCH_CONTAINER}}, {{RELAUNCH_CONTAINER}}, {{RECOVER_CONTAINER}}, and {{RECOVER_PAUSED_CONTAINER}}, the {{ContainersLauncher}} service creates tasks and submits it to the executor to be performed in a non-blocking way: {code:java} containerLauncher.submit(launch); {code} However, for {{CLEANUP_CONTAINER}}, {{CLEANUP_CONTAINER_FOR_REINIT}}, {{SIGNAL_CONTAINER}}, {{PAUSE_CONTAINER}}, {{RESUME_CONTAINER}}, the actions are performed in a blocking way. {code:java} launcher.cleanupContainer(); {code} With this Jira, I can focus on {{CLEANUP_CONTAINER}} and {{CLEANUP_CONTAINER_FOR_REINIT}} events to be performed in a non-blocking way. Doesn't look the caller ({{ContainerImpl}}) waits anywhere for {{cleanupContainer()}} to be performed synchronously. It is triggered by dispatching {{ContainersLauncherEventType.CLEANUP_CONTAINER}} events. > NM gets backed up deleting docker containers > > > Key: YARN-7644 > URL: https://issues.apache.org/jira/browse/YARN-7644 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Eric Badger >Assignee: Chandni Singh >Priority: Major > Labels: Docker > > We are sending a {{docker stop}} to the docker container with a timeout of 10 > seconds when we shut down a container. If the container does not stop after > 10 seconds then we force kill it. However, the {{docker stop}} command is a > blocking call. So in cases where lots of containers don't go down with the > initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to > return. This ties up the ContainerLaunch handler and so these kill events > back up. It also appears to be backing up new container launches as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7644) NM gets backed up deleting docker containers
[ https://issues.apache.org/jira/browse/YARN-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588083#comment-16588083 ] Chandni Singh commented on YARN-7644: - [~ebadger] I would like to work on this issue. Please re-assign to me if you are not working on it. > NM gets backed up deleting docker containers > > > Key: YARN-7644 > URL: https://issues.apache.org/jira/browse/YARN-7644 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Labels: Docker > > We are sending a {{docker stop}} to the docker container with a timeout of 10 > seconds when we shut down a container. If the container does not stop after > 10 seconds then we force kill it. However, the {{docker stop}} command is a > blocking call. So in cases where lots of containers don't go down with the > initial SIGTERM, we have to wait 10+ seconds for the {{docker stop}} to > return. This ties up the ContainerLaunch handler and so these kill events > back up. It also appears to be backing up new container launches as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org