[jira] [Commented] (YARN-7542) NM recovers some Running Opportunistic Containers as SUSPEND
[ https://issues.apache.org/jira/browse/YARN-7542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16306016#comment-16306016 ]

Arun Suresh commented on YARN-7542:
---

Thanks - let me just give this a quick manual test and I'll commit it.

> NM recovers some Running Opportunistic Containers as SUSPEND
>
> Key: YARN-7542
> URL: https://issues.apache.org/jira/browse/YARN-7542
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Arun Suresh
> Assignee: Sampada Dehankar
> Attachments: YARN-7542.001.patch
>
> Steps to reproduce:
> * Start YARN cluster - Enable Opportunistic containers and set NM queue length to something > 10. Also Enable work preserving restart
> * Start an MR job (without opportunistic containers)
> * Kill the NM and restart it again.
> * In the logs - it shows that some of the containers are in SUSPENDED state - even though they are still running.
> [~sampada15] / [~kartheek], can you take a look at this ?

--
This message was sent by Atlassian JIRA (v6.4.14#64029)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-7542) NM recovers some Running Opportunistic Containers as SUSPEND
[ https://issues.apache.org/jira/browse/YARN-7542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16305999#comment-16305999 ]

Sampada Dehankar commented on YARN-7542:

Created YARN-7691 to track additional test cases for recovery path.
[jira] [Commented] (YARN-7542) NM recovers some Running Opportunistic Containers as SUSPEND
[ https://issues.apache.org/jira/browse/YARN-7542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16305603#comment-16305603 ]

genericqa commented on YARN-7542:
-

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 17s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 14m 49s | trunk passed |
| +1 | compile | 0m 45s | trunk passed |
| +1 | checkstyle | 0m 15s | trunk passed |
| +1 | mvnsite | 0m 31s | trunk passed |
| +1 | shadedclient | 9m 13s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 41s | trunk passed |
| +1 | javadoc | 0m 19s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 32s | the patch passed |
| +1 | compile | 0m 43s | the patch passed |
| +1 | javac | 0m 43s | the patch passed |
| +1 | checkstyle | 0m 15s | the patch passed |
| +1 | mvnsite | 0m 28s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 9m 41s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 51s | the patch passed |
| +1 | javadoc | 0m 20s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 17m 48s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | asflicense | 0m 20s | The patch does not generate ASF License warnings. |
| | | 57m 54s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | YARN-7542 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12903922/YARN-7542.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux e6bd6abb8478 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / d31c9d8 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/19047/testReport/ |
| Max. process+thread count | 408 (vs. ulimit of 5000) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/19047/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT
[jira] [Commented] (YARN-7542) NM recovers some Running Opportunistic Containers as SUSPEND
[ https://issues.apache.org/jira/browse/YARN-7542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16305586#comment-16305586 ]

Arun Suresh commented on YARN-7542:
---

Thanks for investigating and for the patch [~sampada]. Pretty straightforward, so +1.
Testing this specific case looks to be non-trivial, so let's tackle testing {{ContainersLauncher}} properly in a separate JIRA.