[jira] [Commented] (YARN-7340) Missing the time stamp in exception message in Class NoOverCommitPolicy
[ https://issues.apache.org/jira/browse/YARN-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16486819#comment-16486819 ] genericqa commented on YARN-7340:

(x) -1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 34s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 23m 32s | trunk passed |
| +1 | compile | 0m 37s | trunk passed |
| +1 | checkstyle | 0m 28s | trunk passed |
| +1 | mvnsite | 0m 39s | trunk passed |
| +1 | shadedclient | 10m 25s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 2s | trunk passed |
| +1 | javadoc | 0m 24s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 40s | the patch passed |
| +1 | compile | 0m 36s | the patch passed |
| +1 | javac | 0m 36s | the patch passed |
| -0 | checkstyle | 0m 26s | hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 | mvnsite | 0m 37s | the patch passed |
| +1 | whitespace | 0m 1s | The patch has no whitespace issues. |
| +1 | shadedclient | 9m 59s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 1m 8s | the patch passed |
| +1 | javadoc | 0m 22s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 68m 15s | hadoop-yarn-server-resourcemanager in the patch passed. |
| +1 | asflicense | 0m 25s | The patch does not generate ASF License warnings. |
| | | 120m 0s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd |
| JIRA Issue | YARN-7340 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12924687/YARN-7340.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 5efeefc932e0 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 5a91406 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC1 |
| checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/20837/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20837/testReport/ |
| Max. process+thread count | 88
[jira] [Resolved] (YARN-7998) RM crashes with NPE during recovering if ACL configuration was changed
[ https://issues.apache.org/jira/browse/YARN-7998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Oleksandr Shevchenko resolved YARN-7998. Resolution: Fixed

> RM crashes with NPE during recovering if ACL configuration was changed
> ----------------------------------------------------------------------
>
> Key: YARN-7998
> URL: https://issues.apache.org/jira/browse/YARN-7998
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler, resourcemanager
> Affects Versions: 3.0.0
> Reporter: Oleksandr Shevchenko
> Assignee: Oleksandr Shevchenko
> Priority: Major
> Attachments: YARN-7998.000.patch, YARN-7998.001.patch, YARN-7998.002.patch, YARN-7998.003.patch
>
> The RM crashes with an NPE during failover because the ACL configuration was changed; as a result, the application owner no longer has the rights to submit the application to its queue.
> Scenario:
> # Submit an application.
> # Change the ACL configuration of the queue that accepted the application so that the owner of the application no longer has the rights to submit it.
> # Restart the RM.
> As a result, we get an NPE:
> {code}
> 2018-02-27 18:14:00,968 INFO org.apache.hadoop.service.AbstractService: Service ResourceManager failed in state STARTED; cause: java.lang.NullPointerException
> java.lang.NullPointerException
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.addApplicationAttempt(FairScheduler.java:738)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1286)
> at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:116)
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1098)
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1044)
> at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385
> {code}

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
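The NPE arises because the recovery path assumes the ACL/queue resolution still succeeds for the recovered attempt. A minimal, hypothetical sketch of the defensive pattern (all names are illustrative, not YARN's actual API): surface the failed resolution as a rejection rather than dereferencing a null queue.

```java
// Hypothetical sketch: reject a recovered attempt with a clear error
// instead of letting a null queue cause an NPE (illustrative names only).
import java.util.HashMap;
import java.util.Map;

public class RecoveryGuard {
    // Toy ACL table: queue -> the single user allowed to submit.
    static final Map<String, String> queueAcls = new HashMap<>();

    // Returns the queue if the user may still submit to it, else null.
    static String resolveQueue(String queue, String user) {
        String allowed = queueAcls.get(queue);
        return user.equals(allowed) ? queue : null;
    }

    // Guarded recovery: a null queue becomes a rejection, not an NPE.
    static String recoverAttempt(String queue, String user) {
        String resolved = resolveQueue(queue, user);
        if (resolved == null) {
            return "REJECTED: user " + user + " no longer allowed in " + queue;
        }
        return "RECOVERED in " + resolved;
    }

    public static void main(String[] args) {
        queueAcls.put("root.prod", "alice");
        System.out.println(recoverAttempt("root.prod", "alice"));
        System.out.println(recoverAttempt("root.prod", "bob"));
    }
}
```

The sketch only illustrates the null-check placement; YARN-7913 addressed the actual recovery behaviour.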
[jira] [Commented] (YARN-7998) RM crashes with NPE during recovering if ACL configuration was changed
[ https://issues.apache.org/jira/browse/YARN-7998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16486841#comment-16486841 ] Oleksandr Shevchenko commented on YARN-7998: Thank you [~wilfreds]. Closed as the duplicate of YARN-7913.
[jira] [Updated] (YARN-7786) NullPointerException while launching ApplicationMaster
[ https://issues.apache.org/jira/browse/YARN-7786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-7786: Priority: Major (was: Minor)

> NullPointerException while launching ApplicationMaster
> ------------------------------------------------------
>
> Key: YARN-7786
> URL: https://issues.apache.org/jira/browse/YARN-7786
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.9.0, 3.0.0-beta1
> Reporter: lujie
> Assignee: lujie
> Priority: Major
> Fix For: 2.10.0, 3.2.0, 3.1.1, 2.9.2, 3.0.3
> Attachments: YARN-7786.patch, YARN-7786_1.patch, YARN-7786_2.patch, YARN-7786_3.patch, YARN-7786_4.patch, YARN-7786_5.patch, YARN-7786_6.patch, resourcemanager.log
>
> If a kill command is sent to the job before the ApplicationMaster is launched, a NullPointerException appears:
> {code}
> 2017-11-25 21:27:25,333 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Error launching appattempt_1511616410268_0001_01. Got exception: java.lang.NullPointerException
> at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.setupTokens(AMLauncher.java:205)
> at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.createAMContainerLaunchContext(AMLauncher.java:193)
> at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:112)
> at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:304)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}
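The stack trace above shows the NPE surfacing inside token setup after the application was killed. A hedged, hypothetical sketch of the fail-fast pattern such a fix typically applies (the field name and exception choice are illustrative, not AMLauncher's actual code):

```java
// Hypothetical sketch: detect the already-killed case explicitly and
// raise a descriptive error, instead of letting a later dereference of
// missing security material throw a bare NullPointerException.
public class TokenGuard {
    static byte[] setupTokens(byte[] clientTokenMasterKey, String attemptId) {
        if (clientTokenMasterKey == null) {
            throw new IllegalStateException(
                "Application " + attemptId + " was likely killed before AM "
                + "launch; client token master key is missing");
        }
        // Proceed with token setup (simplified to a defensive copy here).
        return clientTokenMasterKey.clone();
    }

    public static void main(String[] args) {
        try {
            setupTokens(null, "appattempt_1511616410268_0001_01");
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```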
[jira] [Updated] (YARN-6948) Invalid event: ATTEMPT_ADDED at FINAL_SAVING
[ https://issues.apache.org/jira/browse/YARN-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-6948: Priority: Major (was: Minor)

> Invalid event: ATTEMPT_ADDED at FINAL_SAVING
> --------------------------------------------
>
> Key: YARN-6948
> URL: https://issues.apache.org/jira/browse/YARN-6948
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 2.8.0, 3.0.0-alpha4
> Reporter: lujie
> Assignee: lujie
> Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4
> Attachments: YARN-6948_1.patch, YARN-6948_2.patch, yarn-6948.png, yarn-6948.txt
>
> After sending a kill command to a running job, the logs show the following exception:
> {code:java}
> 2017-08-03 01:35:20,485 ERROR org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Can't handle this event at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: ATTEMPT_ADDED at FINAL_SAVING
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:757)
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:834)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:815)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
> at java.lang.Thread.run(Thread.java:745)
> {code}
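The "Invalid event: X at Y" errors above come from a table-driven state machine: any (state, event) pair without a registered transition throws. A toy sketch illustrating the mechanism and the usual fix pattern (a deliberately simplified stand-in, not YARN's StateMachineFactory):

```java
// Toy state machine: an unregistered (state, event) pair throws, just
// as ATTEMPT_ADDED at FINAL_SAVING does; registering the pair (here as
// an ignoring self-transition) is the usual fix pattern.
import java.util.HashMap;
import java.util.Map;

public class ToyStateMachine {
    final Map<String, String> transitions = new HashMap<>();
    String state;

    ToyStateMachine(String initial) { state = initial; }

    void addTransition(String from, String event, String to) {
        transitions.put(from + "/" + event, to);
    }

    void handle(String event) {
        String next = transitions.get(state + "/" + event);
        if (next == null) {
            throw new IllegalStateException(
                "Invalid event: " + event + " at " + state);
        }
        state = next;
    }

    public static void main(String[] args) {
        ToyStateMachine sm = new ToyStateMachine("FINAL_SAVING");
        try {
            sm.handle("ATTEMPT_ADDED");       // unregistered -> throws
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        // Fix pattern: register a self-transition that absorbs the
        // late event without changing state.
        sm.addTransition("FINAL_SAVING", "ATTEMPT_ADDED", "FINAL_SAVING");
        sm.handle("ATTEMPT_ADDED");           // now handled quietly
        System.out.println("state=" + sm.state);
    }
}
```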
[jira] [Reopened] (YARN-7998) RM crashes with NPE during recovering if ACL configuration was changed
[ https://issues.apache.org/jira/browse/YARN-7998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Oleksandr Shevchenko reopened YARN-7998:
[jira] [Resolved] (YARN-7998) RM crashes with NPE during recovering if ACL configuration was changed
[ https://issues.apache.org/jira/browse/YARN-7998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Oleksandr Shevchenko resolved YARN-7998. Resolution: Duplicate
[jira] [Updated] (YARN-7663) RMAppImpl:Invalid event: START at KILLED
[ https://issues.apache.org/jira/browse/YARN-7663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lujie updated YARN-7663: Priority: Major (was: Minor)

> RMAppImpl: Invalid event: START at KILLED
> -----------------------------------------
>
> Key: YARN-7663
> URL: https://issues.apache.org/jira/browse/YARN-7663
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.8.0
> Reporter: lujie
> Assignee: lujie
> Priority: Major
> Labels: patch
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4
> Attachments: YARN-7663_1.patch, YARN-7663_2.patch, YARN-7663_3.patch, YARN-7663_4.patch, YARN-7663_5.patch, YARN-7663_6.patch, YARN-7663_7.patch
>
> Sending a kill to the application produces the following in the RM log:
> {code:java}
> org.apache.hadoop.yarn.state.InvalidStateTransitionException: Invalid event: START at KILLED
> at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:805)
> at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:116)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:901)
> at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:885)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> If a sleep is inserted before the point where the START event is created, the bug reproduces deterministically.
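The "insert a sleep to reproduce deterministically" remark describes a classic dispatch race: the START event is produced asynchronously and can arrive after a KILL has already moved the app to KILLED. A self-contained toy sketch of that ordering (illustrative only; a real RMAppImpl handler throws InvalidStateTransitionException rather than printing):

```java
// Toy repro of the race: delaying the START dispatch guarantees the
// KILL wins, so START always arrives in the KILLED state.
public class RaceRepro {
    static volatile String appState = "NEW";

    static void dispatch(String event) {
        if (event.equals("START") && appState.equals("KILLED")) {
            // A real handler would throw InvalidStateTransitionException.
            System.out.println("Invalid event: START at KILLED");
            return;
        }
        appState = event.equals("KILL") ? "KILLED" : "RUNNING";
    }

    public static void main(String[] args) throws InterruptedException {
        Thread starter = new Thread(() -> {
            try { Thread.sleep(100); } catch (InterruptedException ignored) { }
            dispatch("START");   // the inserted sleep makes the kill win
        });
        starter.start();
        dispatch("KILL");        // kill arrives first, state -> KILLED
        starter.join();
        System.out.println("final state: " + appState);
    }
}
```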
[jira] [Commented] (YARN-8273) Log aggregation does not warn if HDFS quota in target directory is exceeded
[ https://issues.apache.org/jira/browse/YARN-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16486937#comment-16486937 ] Gergo Repas commented on YARN-8273: --- [~rkanter] - Thanks for the review and committing the change! [~snemeth] - Thanks for the review!

> Log aggregation does not warn if HDFS quota in target directory is exceeded
> ---------------------------------------------------------------------------
>
> Key: YARN-8273
> URL: https://issues.apache.org/jira/browse/YARN-8273
> Project: Hadoop YARN
> Issue Type: Bug
> Components: log-aggregation
> Affects Versions: 3.1.0
> Reporter: Gergo Repas
> Assignee: Gergo Repas
> Priority: Major
> Fix For: 3.2.0
> Attachments: YARN-8273.000.patch, YARN-8273.001.patch, YARN-8273.002.patch, YARN-8273.003.patch, YARN-8273.004.patch, YARN-8273.005.patch, YARN-8273.006.patch
>
> If an HDFS space quota is set on a target directory for log aggregation and the quota is already exceeded when log aggregation is attempted, zero-byte log files are written to the HDFS directory, yet the NodeManager logs do not reflect the failure to write the files (there are no ERROR or WARN messages to this effect).
> An improvement is worth investigating to alert users to this scenario, since otherwise the logs for a YARN application may be missing both on HDFS and locally (after local log cleanup) without the user being informed.
> Steps to reproduce:
> * Set a small HDFS space quota on /tmp/logs/username/logs (e.g. 2MB)
> * Write files to HDFS such that /tmp/logs/username/logs is almost 2MB full
> * Run a Spark or MR job in the cluster
> * Observe that zero-byte files are written to HDFS after job completion
> * Observe that YARN container logs are also not present on the NM hosts (or are deleted after yarn.nodemanager.delete.debug-delay-sec)
> * Observe that no ERROR or WARN messages are logged in the NM role log
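The gap described above is the absence of a post-write sanity check. A hedged sketch of that check, with the remote file system simulated (the method names are illustrative, not the actual log-aggregation code):

```java
// Sketch of the missing check: after uploading an aggregated log,
// compare the bytes found remotely against the bytes expected and
// warn on a shortfall (a quota-exceeded write can leave a zero-byte
// file). File-system access is simulated; names are illustrative.
public class UploadCheck {
    // Simulates fetching the remote file's length after the write.
    static long remoteLength(boolean quotaExceeded, long expected) {
        return quotaExceeded ? 0L : expected;
    }

    static String verifyUpload(long expectedBytes, boolean quotaExceeded) {
        long actual = remoteLength(quotaExceeded, expectedBytes);
        if (actual < expectedBytes) {
            return "WARN: aggregated log truncated: expected "
                + expectedBytes + " bytes, found " + actual
                + " (possible HDFS space quota exceeded)";
        }
        return "OK";
    }

    public static void main(String[] args) {
        System.out.println(verifyUpload(4096, false));
        System.out.println(verifyUpload(4096, true));
    }
}
```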
[jira] [Commented] (YARN-4599) Set OOM control for memory cgroups
[ https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16486991#comment-16486991 ] genericqa commented on YARN-4599:

(x) -1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 33s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 7 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 17s | Maven dependency ordering for branch |
| +1 | mvninstall | 27m 51s | trunk passed |
| +1 | compile | 30m 10s | trunk passed |
| +1 | checkstyle | 3m 16s | trunk passed |
| +1 | mvnsite | 19m 8s | trunk passed |
| +1 | shadedclient | 32m 53s | branch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . |
| +1 | findbugs | 3m 47s | trunk passed |
| +1 | javadoc | 5m 32s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 18s | Maven dependency ordering for patch |
| +1 | mvninstall | 28m 58s | the patch passed |
| +1 | compile | 29m 41s | the patch passed |
| +1 | cc | 29m 41s | the patch passed |
| +1 | javac | 29m 41s | the patch passed |
| +1 | checkstyle | 3m 18s | root: The patch generated 0 new + 235 unchanged - 1 fixed = 235 total (was 236) |
| +1 | mvnsite | 20m 28s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 2s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 10m 9s | patch has no errors when building and testing our client artifacts. |
| 0 | findbugs | 0m 0s | Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site . |
| +1 | findbugs | 4m 18s | the patch passed |
| +1 | javadoc | 6m 2s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 166m 35s | root in the patch failed. |
| +1 | asflicense | 0m 40s | The patch does not generate ASF License warnings. |
| | | 373m 17s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy |
| | hadoop.hdfs.client.impl.TestBlockReaderLocal |
| | hadoop.yarn.server.timelineservice.storage.TestHBaseT
[jira] [Created] (YARN-8345) NodeHealthCheckerService to differentiate between reason for UnusableNodes for client to act suitably on it
Kartik Bhatia created YARN-8345: --- Summary: NodeHealthCheckerService to differentiate between reason for UnusableNodes for client to act suitably on it Key: YARN-8345 URL: https://issues.apache.org/jira/browse/YARN-8345 Project: Hadoop YARN Issue Type: New Feature Components: nodemanager Reporter: Kartik Bhatia

+*Current scenario:*+ NodeHealthCheckerService marks a node unhealthy based on two things:
# the external health script
# directory status

If a directory is marked as full (per the DiskCheck configuration in yarn-site), the NodeManager marks the node unhealthy. Once a node is marked unhealthy, MapReduce relaunches all the map tasks that ran on that node, so even successful tasks are rerun.

+*Problem:*+ There is no distinction between the disk limit at which container launches should stop on a node and the limit beyond which reducers can no longer read data from it.

For example, consider a 3 TB disk. If the maximum disk utilisation percentage is set to 95% (since launching a container requires roughly 0.15 TB for jobs in our cluster) and a few nodes sit at, say, 96% utilisation, the threshold is breached and those nodes are marked unhealthy by the NodeManager. All successful mappers are then relaunched on other nodes, even though the remaining 4% of space is good enough for reducers to read the existing data. This causes unnecessary delay in our jobs. (Relaunched mappers can preempt reducers when space is tight, and there are also issues with calculating headroom in the Capacity Scheduler.)

+*Correction:*+ We need a state (say UNUSABLE_WRITE) that lets MapReduce know the node is still good for reading data, so successful mappers are not relaunched. This would prevent the delay.
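The proposed distinction above amounts to a two-threshold classification. A minimal sketch of that idea, assuming hypothetical write/read limits and using the proposal's UNUSABLE_WRITE name (illustrative only, not NodeHealthCheckerService code):

```java
// Sketch of the two-threshold proposal: above the write limit the
// disk stops accepting new containers (UNUSABLE_WRITE) but remains
// readable; only above the read limit is the node fully UNHEALTHY.
// Threshold values and state names are illustrative.
public class DiskState {
    static String classify(double utilizationPct,
                           double writeLimitPct, double readLimitPct) {
        if (utilizationPct > readLimitPct)  return "UNHEALTHY";
        if (utilizationPct > writeLimitPct) return "UNUSABLE_WRITE";
        return "HEALTHY";
    }

    public static void main(String[] args) {
        // 96% used: too full for new containers, still fine for reads,
        // so successful map output need not be recomputed.
        System.out.println(classify(96.0, 95.0, 99.0));
        System.out.println(classify(99.5, 95.0, 99.0));
        System.out.println(classify(80.0, 95.0, 99.0));
    }
}
```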
[jira] [Commented] (YARN-8320) Support CPU isolation for latency-sensitive (LS) service
[ https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487066#comment-16487066 ] Jiandan Yang commented on YARN-8320: - [~cheersyang] and I discussed the design together offline. More details have been added in the v2 design doc.

> Support CPU isolation for latency-sensitive (LS) service
> --------------------------------------------------------
>
> Key: YARN-8320
> URL: https://issues.apache.org/jira/browse/YARN-8320
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: nodemanager
> Reporter: Jiandan Yang
> Priority: Major
> Attachments: CPU-isolation-for-latency-sensitive-services-v1.pdf, YARN-8320.001.patch
>
> Currently the NodeManager uses "cpu.cfs_period_us", "cpu.cfs_quota_us" and "cpu.shares" to isolate CPU resources. However:
> * Linux Completely Fair Scheduling (CFS) is a throughput-oriented scheduler with no support for differentiated latency.
> * The request latency of services running in containers can fluctuate sharply when all containers share CPUs, which latency-sensitive services cannot afford in our production environment.
> So we need more fine-grained CPU isolation. Here we propose a solution that uses the cgroup cpuset subsystem to bind containers to different processors, inspired by the isolation technique in the [Borg system|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf].
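The cpuset mechanism the proposal relies on is just a per-cgroup file listing allowed processors. A minimal sketch of pinning two containers to disjoint cores, using a temp directory to stand in for the cgroup mount (the path layout and container names are illustrative; the actual design is in the attached docs):

```java
// Sketch of cpuset-based pinning: each container's cgroup directory
// gets an explicit processor list written to its cpuset.cpus file
// (cgroup v1 layout assumed; paths here are simulated in a temp dir).
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CpusetSketch {
    // Formats a CPU list such as "0-3" or "5" for cpuset.cpus.
    static String cpuList(int first, int last) {
        return first == last ? String.valueOf(first) : first + "-" + last;
    }

    static void pin(Path cgroupDir, int first, int last) throws IOException {
        Files.createDirectories(cgroupDir);
        Files.writeString(cgroupDir.resolve("cpuset.cpus"),
                          cpuList(first, last));
    }

    public static void main(String[] args) throws IOException {
        // Simulates /sys/fs/cgroup/cpuset/yarn/<container> in a temp dir.
        Path root = Files.createTempDirectory("cpuset-demo");
        pin(root.resolve("container_01"), 0, 3);  // LS service: cores 0-3
        pin(root.resolve("container_02"), 4, 7);  // batch job: cores 4-7
        System.out.println(Files.readString(
            root.resolve("container_01").resolve("cpuset.cpus")));
    }
}
```

Binding latency-sensitive containers to dedicated cores this way removes CFS time-slicing interference at the cost of reduced packing flexibility.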
[jira] [Updated] (YARN-8320) Support CPU isolation for latency-sensitive (LS) service
[ https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiandan Yang updated YARN-8320: Attachment: CPU-isolation-for-latency-sensitive-services-v2.pdf
[jira] [Commented] (YARN-8337) Deadlock In Federation Router
[ https://issues.apache.org/jira/browse/YARN-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487083#comment-16487083 ] genericqa commented on YARN-8337: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 43s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 29m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 13m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 5m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 19m 13s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 4s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 17s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 7m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 10s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 13s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}100m 55s{color} | {color:red} hadoop-yarn in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 14s{color} | {color:green} hadoop-yarn-server-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 39s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}206m 31s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.TestContainerManager | | | hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageDomain | | | hadoop.yarn.server.timelineservice.reader.TestTimelineReaderWebServicesHBaseStorage | | | hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageApps | | | hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageSchema | | | hadoop.yarn.server.timelineservice.stor
[jira] [Commented] (YARN-8319) More YARN pages need to honor yarn.resourcemanager.display.per-user-apps
[ https://issues.apache.org/jira/browse/YARN-8319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487084#comment-16487084 ] Sunil Govindan commented on YARN-8319: -- Updated patch with test case. [~rohithsharma] pls help to check. > More YARN pages need to honor yarn.resourcemanager.display.per-user-apps > > > Key: YARN-8319 > URL: https://issues.apache.org/jira/browse/YARN-8319 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Reporter: Vinod Kumar Vavilapalli >Assignee: Sunil Govindan >Priority: Major > Attachments: YARN-8319.001.patch, YARN-8319.002.patch, > YARN-8319.003.patch > > > When this config is on > - Per queue page on UI2 should filter app list by user > -- TODO: Verify the same with UI1 Per-queue page > - ATSv2 with UI2 should filter list of all users' flows and flow activities > - Per Node pages > -- Listing of apps and containers on a per-node basis should filter apps and > containers by user. > To this end, because this is no longer just for resourcemanager, we should > also deprecate {{yarn.resourcemanager.display.per-user-apps}} in favor of > {{yarn.webapp.filter-app-list-by-user}} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8319) More YARN pages need to honor yarn.resourcemanager.display.per-user-apps
[ https://issues.apache.org/jira/browse/YARN-8319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil Govindan updated YARN-8319: - Attachment: YARN-8319.003.patch > More YARN pages need to honor yarn.resourcemanager.display.per-user-apps > > > Key: YARN-8319 > URL: https://issues.apache.org/jira/browse/YARN-8319 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Reporter: Vinod Kumar Vavilapalli >Assignee: Sunil Govindan >Priority: Major > Attachments: YARN-8319.001.patch, YARN-8319.002.patch, > YARN-8319.003.patch > > > When this config is on > - Per queue page on UI2 should filter app list by user > -- TODO: Verify the same with UI1 Per-queue page > - ATSv2 with UI2 should filter list of all users' flows and flow activities > - Per Node pages > -- Listing of apps and containers on a per-node basis should filter apps and > containers by user. > To this end, because this is no longer just for resourcemanager, we should > also deprecate {{yarn.resourcemanager.display.per-user-apps}} in favor of > {{yarn.webapp.filter-app-list-by-user}} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8337) Deadlock In Federation Router
[ https://issues.apache.org/jira/browse/YARN-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487101#comment-16487101 ] Jianchao Jia commented on YARN-8337: [~giovanni.fumarola] Thanks for your reply. I have updated the test; please review again. BTW, I don't think these failed junit tests are related to this patch. > Deadlock In Federation Router > - > > Key: YARN-8337 > URL: https://issues.apache.org/jira/browse/YARN-8337 > Project: Hadoop YARN > Issue Type: Bug > Components: federation, router >Reporter: Jianchao Jia >Priority: Major > Attachments: YARN-8337.001.patch, YARN-8337.002.patch > > > We use MySQL InnoDB as the state store engine. In the router log we found a > deadlock error like the one below: > {code:java} > [2018-05-21T15:41:40.383+08:00] [ERROR] [IPC Server handler 25 on 8050] : > Unable to insert the newly generated application > application_1526295230627_127402 > com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock > found when trying to get lock; try restarting transaction > at sun.reflect.GeneratedConstructorAccessor107.newInstance(Unknown > Source) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at com.mysql.jdbc.Util.handleNewInstance(Util.java:425) > at com.mysql.jdbc.Util.getInstance(Util.java:408) > at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:952) > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3973) > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3909) > at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2527) > at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2680) > at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2484) > at > com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858) > at > com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2079) > at > 
com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2013) > at > com.mysql.jdbc.PreparedStatement.executeLargeUpdate(PreparedStatement.java:5104) > at > com.mysql.jdbc.CallableStatement.executeLargeUpdate(CallableStatement.java:2418) > at > com.mysql.jdbc.CallableStatement.executeUpdate(CallableStatement.java:887) > at > com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61) > at > com.zaxxer.hikari.pool.HikariProxyCallableStatement.executeUpdate(HikariProxyCallableStatement.java) > at > org.apache.hadoop.yarn.server.federation.store.impl.SQLFederationStateStore.addApplicationHomeSubCluster(SQLFederationStateStore.java:547) > {code} > Use "show engine innodb status;" command to find what happens > {code:java} > 2018-05-21 15:41:40 7f4685870700 > *** (1) TRANSACTION: > TRANSACTION 241131538, ACTIVE 0 sec inserting, thread declared inside InnoDB > 4999 > mysql tables in use 2, locked 2 > LOCK WAIT 4 lock struct(s), heap size 1184, 2 row lock(s) > MySQL thread id 7602335, OS thread handle 0x7f46858f2700, query id 2919792534 > 192.168.1.138 federation executing > INSERT INTO applicationsHomeSubCluster > (applicationId,homeSubCluster) > (SELECT applicationId_IN, homeSubCluster_IN > FROM applicationsHomeSubCluster > WHERE applicationId = applicationId_IN > HAVING COUNT(*) = 0 ) > *** (1) WAITING FOR THIS LOCK TO BE GRANTED: > RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table > `guldan_federationstatestore`.`applicationshomesubcluster` trx id 241131538 > lock_mode X locks gap before rec insert intention waiting > Record lock, heap no 23 PHYSICAL RECORD: n_fields 4; compact format; info > bits 0 > 0: len 30; hex 6170706c69636174696f6e5f313532363239353233303632375f31323734; > asc application_1526295230627_1274; (total 31 bytes); > 1: len 6; hex 0ba5f32d; asc -;; > 2: len 7; hex dd00280110; asc ( ;; > 3: len 13; hex 686f70655f636c757374657231; asc hope_cluster1;; > *** (2) TRANSACTION: > 
TRANSACTION 241131539, ACTIVE 0 sec inserting, thread declared inside InnoDB > 4999 > mysql tables in use 2, locked 2 > 4 lock struct(s), heap size 1184, 2 row lock(s) > MySQL thread id 7600638, OS thread handle 0x7f4685870700, query id 2919792535 > 192.168.1.138 federation executing > INSERT INTO applicationsHomeSubCluster > (applicationId,homeSubCluster) > (SELECT applicationId_IN, homeSubCluster_IN > FROM applicationsHomeSubCluster > WHERE applicationId = applicationId_IN > HAVING COUNT(*) = 0 ) > *** (2) HOLDS THE LOCK(S): > RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table > `guldan_federationstatestore
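The InnoDB status above shows two transactions each holding a gap lock on the same primary-key range and waiting for the other's insert-intention lock while running the conditional INSERT ... SELECT concurrently, which is the classic deadlock pattern. MySQL resolves it by rolling back one victim, so independent of the actual fix in the patch, a common client-side defensive measure is to retry the rolled-back unit of work. The sketch below is a generic, hypothetical retry helper, not code from the patch; `SQLTransactionRollbackException` is the `java.sql` type the stack trace above shows the MySQL driver raising.

```java
import java.sql.SQLTransactionRollbackException;
import java.util.concurrent.Callable;

/** Illustrative sketch: retry a unit of work that may be an InnoDB deadlock victim. */
public class DeadlockRetry {

  /**
   * Runs {@code work} up to {@code maxAttempts} times, retrying only when a
   * SQLTransactionRollbackException (deadlock rollback) is thrown.
   */
  public static <T> T withRetry(Callable<T> work, int maxAttempts)
      throws Exception {
    SQLTransactionRollbackException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return work.call();
      } catch (SQLTransactionRollbackException e) {
        last = e;  // deadlock victim: the rollback already happened, safe to retry
      }
    }
    throw last;
  }

  public static void main(String[] args) throws Exception {
    // Simulate work that deadlocks twice, then succeeds on the third try.
    int[] calls = {0};
    String result = withRetry(() -> {
      if (++calls[0] < 3) {
        throw new SQLTransactionRollbackException("Deadlock found");
      }
      return "inserted";
    }, 5);
    System.out.println(result + " after " + calls[0] + " attempts");
  }
}
```

Retrying is only safe because the deadlock victim's transaction was fully rolled back; a fix inside the stored procedure itself (e.g. relying on the primary-key constraint rather than the HAVING COUNT(*) = 0 guard) would remove the gap-lock contention at the source.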
[jira] [Commented] (YARN-6919) Add default volume mount list
[ https://issues.apache.org/jira/browse/YARN-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487156#comment-16487156 ] Shane Kumpf commented on YARN-6919: --- [~ebadger] - the patch doesn't apply cleanly on 3.1. Do you think we need this in 3.1? If so, could you provide a patch? Thanks! > Add default volume mount list > - > > Key: YARN-6919 > URL: https://issues.apache.org/jira/browse/YARN-6919 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Eric Badger >Assignee: Eric Badger >Priority: Major > Labels: Docker > Attachments: YARN-6919.001.patch, YARN-6919.002.patch > > > Piggybacking on YARN-5534, we should create a default list that bind mounts > selected volumes into all docker containers. This list will be empty by > default -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"
Rohith Sharma K S created YARN-8346: --- Summary: Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full" Key: YARN-8346 URL: https://issues.apache.org/jira/browse/YARN-8346 Project: Hadoop YARN Issue Type: Bug Reporter: Rohith Sharma K S It is observed that during a rolling upgrade from 2.8.4 to the 3.1 release, all the running containers are killed and a second attempt is launched for the application. The diagnostic message is "Opportunistic container queue is full", which is the reason the containers were killed. In the NM log, I see the entries below after a container is recovered. {noformat} 2018-05-23 17:18:50,655 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler: Opportunistic container [container_e06_1527075664705_0001_01_01] will not be queued at the NMsince max queue length [0] has been reached {noformat} The following steps were executed for the rolling upgrade # Install a 2.8.4 cluster and launch an MR job with distributed cache enabled. # Stop the 2.8.4 RM. Start the 3.1.0 RM with the same configuration. # Stop 2.8.4 NMs batch by batch. Start 3.1.0 NMs batch by batch. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"
[ https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487175#comment-16487175 ] Rohith Sharma K S commented on YARN-8346: - In ContainerScheduler#enqueueContainer, the execution type is not set for a container recovered from 2.8.4, which results in falling into the else branch with a zero queue length. This sends a kill event for the container, so the running containers are killed.
{code}
private boolean enqueueContainer(Container container) {
  boolean isGuaranteedContainer = container.getContainerTokenIdentifier().
      getExecutionType() == ExecutionType.GUARANTEED;
  boolean isQueued;
  if (isGuaranteedContainer) {
    queuedGuaranteedContainers.put(container.getContainerId(), container);
    isQueued = true;
  } else {
    if (queuedOpportunisticContainers.size() < maxOppQueueLength) {
      LOG.info("Opportunistic container {} will be queued at the NM.",
          container.getContainerId());
      queuedOpportunisticContainers.put(
          container.getContainerId(), container);
      isQueued = true;
    } else {
      LOG.info("Opportunistic container [{}] will not be queued at the NM"
          + "since max queue length [{}] has been reached",
          container.getContainerId(), maxOppQueueLength);
      container.sendKillEvent(
          ContainerExitStatus.KILLED_BY_CONTAINER_SCHEDULER,
          "Opportunistic container queue is full.");
      isQueued = false;
    }
  }
{code}
Since the opportunistic container feature also exists in 2.9, I think this would be an issue when upgrading to 2.9 as well. cc:/ [~jlowe] [~arun.sur...@gmail.com] > Upgrading to 3.1 kills running containers with error "Opportunistic container > queue is full" > > > Key: YARN-8346 > URL: https://issues.apache.org/jira/browse/YARN-8346 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Rohith Sharma K S >Priority: Major > > It is observed while rolling upgrade from 2.8.4 to 3.1 release, all the > running containers are killed and second attempt is launched for that > application. 
The diagnostics message is "Opportunistic container queue is > full" which is the reason for container killed. > In NM log, I see below logs for after container is recovered. > {noformat} > 2018-05-23 17:18:50,655 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler: > Opportunistic container [container_e06_1527075664705_0001_01_01] will > not be queued at the NMsince max queue length [0] has been reached > {noformat} > Following steps are executed for rolling upgrade > # Install 2.8.4 cluster and launch a MR job with distributed cache enabled. > # Stop 2.8.4 RM. Start 3.1.0 RM with same configuration. > # Stop 2.8.4 NM batch by batch. Start 3.1.0 NM batch by batch. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
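The failure mode analyzed above — a recovered container whose token carries no execution type falling into the opportunistic branch and being killed when the queue length is 0 — suggests defaulting a missing execution type to GUARANTEED during recovery. The sketch below is a hypothetical, self-contained model of that queueing decision, not the actual NodeManager code; the enum values mirror YARN's ExecutionType, and the null-means-GUARANTEED default is an assumed fix, not one confirmed by this thread.

```java
/**
 * Illustrative model of ContainerScheduler's enqueue decision, with a
 * defensive default: a container recovered from an older release may carry
 * no execution type, and treating that as GUARANTEED avoids killing it.
 */
public class EnqueueDecision {

  public enum ExecutionType { GUARANTEED, OPPORTUNISTIC }

  public enum Outcome { QUEUED_GUARANTEED, QUEUED_OPPORTUNISTIC, KILLED }

  public static Outcome decide(ExecutionType type, int oppQueueSize,
                               int maxOppQueueLength) {
    // Hypothesized fix: older NMs (e.g. 2.8.x) did not record an execution
    // type, so a null type from recovery is treated as GUARANTEED.
    if (type == null || type == ExecutionType.GUARANTEED) {
      return Outcome.QUEUED_GUARANTEED;
    }
    if (oppQueueSize < maxOppQueueLength) {
      return Outcome.QUEUED_OPPORTUNISTIC;
    }
    // Matches the reported behavior: max queue length [0] means an
    // opportunistic container is killed immediately.
    return Outcome.KILLED;
  }

  public static void main(String[] args) {
    // A recovered container with no execution type survives the upgrade...
    System.out.println(decide(null, 0, 0));
    // ...while a genuine opportunistic container still obeys the limit.
    System.out.println(decide(ExecutionType.OPPORTUNISTIC, 0, 0));
  }
}
```

An alternative fix would be to stamp the execution type into the recovered container token itself, so every downstream consumer sees a consistent value.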
[jira] [Commented] (YARN-8285) Remove unused environment variables from the Docker runtime
[ https://issues.apache.org/jira/browse/YARN-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487184#comment-16487184 ] Shane Kumpf commented on YARN-8285: --- Thanks to [~ebadger] for the contribution! I committed this to trunk. > Remove unused environment variables from the Docker runtime > --- > > Key: YARN-8285 > URL: https://issues.apache.org/jira/browse/YARN-8285 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Shane Kumpf >Assignee: Eric Badger >Priority: Trivial > Labels: Docker > Attachments: YARN-8285.001.patch > > > YARN-7430 enabled user remapping for Docker containers by default. As a > result, YARN_CONTAINER_RUNTIME_DOCKER_RUN_ENABLE_USER_REMAPPING is no longer > used and can be removed. > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE_FILE was added in the original > implementation, but was never used and can be removed. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8347) [Umbrella] Upgrade efforts to Hadoop 3.x
Sunil Govindan created YARN-8347: Summary: [Umbrella] Upgrade efforts to Hadoop 3.x Key: YARN-8347 URL: https://issues.apache.org/jira/browse/YARN-8347 Project: Hadoop YARN Issue Type: Bug Reporter: Sunil Govindan This is an umbrella ticket to track all related efforts to close the gaps in upgrading to 3.x. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"
[ https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil Govindan updated YARN-8346: - Issue Type: Sub-task (was: Bug) Parent: YARN-8347 > Upgrading to 3.1 kills running containers with error "Opportunistic container > queue is full" > > > Key: YARN-8346 > URL: https://issues.apache.org/jira/browse/YARN-8346 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Rohith Sharma K S >Priority: Major > > It is observed while rolling upgrade from 2.8.4 to 3.1 release, all the > running containers are killed and second attempt is launched for that > application. The diagnostics message is "Opportunistic container queue is > full" which is the reason for container killed. > In NM log, I see below logs for after container is recovered. > {noformat} > 2018-05-23 17:18:50,655 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler: > Opportunistic container [container_e06_1527075664705_0001_01_01] will > not be queued at the NMsince max queue length [0] has been reached > {noformat} > Following steps are executed for rolling upgrade > # Install 2.8.4 cluster and launch a MR job with distributed cache enabled. > # Stop 2.8.4 RM. Start 3.1.0 RM with same configuration. > # Stop 2.8.4 NM batch by batch. Start 3.1.0 NM batch by batch. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8347) [Umbrella] Upgrade efforts to Hadoop 3.x
[ https://issues.apache.org/jira/browse/YARN-8347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487202#comment-16487202 ] Sunil Govindan commented on YARN-8347: -- cc /[~leftnoteasy] > [Umbrella] Upgrade efforts to Hadoop 3.x > > > Key: YARN-8347 > URL: https://issues.apache.org/jira/browse/YARN-8347 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Sunil Govindan >Priority: Major > > This is an umbrella ticket to manage all similar efforts to close gaps for > upgrade efforts to 3.x. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8285) Remove unused environment variables from the Docker runtime
[ https://issues.apache.org/jira/browse/YARN-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487214#comment-16487214 ] Hudson commented on YARN-8285: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14264 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14264/]) YARN-8285. Remove unused environment variables from the Docker runtime. (skumpf: rev 9837ca9cc746573571029f9fb996a1be10b588ab) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/runtime/DockerLinuxContainerRuntime.java > Remove unused environment variables from the Docker runtime > --- > > Key: YARN-8285 > URL: https://issues.apache.org/jira/browse/YARN-8285 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Shane Kumpf >Assignee: Eric Badger >Priority: Trivial > Labels: Docker > Fix For: 3.2.0 > > Attachments: YARN-8285.001.patch > > > YARN-7430 enabled user remapping for Docker containers by default. As a > result, YARN_CONTAINER_RUNTIME_DOCKER_RUN_ENABLE_USER_REMAPPING is no longer > used and can be removed. > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE_FILE was added in the original > implementation, but was never used and can be removed. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8191) Fair scheduler: queue deletion without RM restart
[ https://issues.apache.org/jira/browse/YARN-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gergo Repas updated YARN-8191: -- Attachment: YARN-8191.015.patch > Fair scheduler: queue deletion without RM restart > - > > Key: YARN-8191 > URL: https://issues.apache.org/jira/browse/YARN-8191 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.0.1 >Reporter: Gergo Repas >Assignee: Gergo Repas >Priority: Major > Attachments: Queue Deletion in Fair Scheduler.pdf, > YARN-8191.000.patch, YARN-8191.001.patch, YARN-8191.002.patch, > YARN-8191.003.patch, YARN-8191.004.patch, YARN-8191.005.patch, > YARN-8191.006.patch, YARN-8191.007.patch, YARN-8191.008.patch, > YARN-8191.009.patch, YARN-8191.010.patch, YARN-8191.011.patch, > YARN-8191.012.patch, YARN-8191.013.patch, YARN-8191.014.patch, > YARN-8191.015.patch > > > The Fair Scheduler never cleans up queues even if they are deleted in the > allocation file, or were dynamically created and are never going to be used > again. Queues always remain in memory which leads to two following issues. > # Steady fairshares aren’t calculated correctly due to remaining queues > # WebUI shows deleted queues, which is confusing for users (YARN-4022). > We want to support proper queue deletion without restarting the Resource > Manager: > # Static queues without any entries that are removed from fair-scheduler.xml > should be deleted from memory. > # Dynamic queues without any entries should be deleted. > # RM Web UI should only show the queues defined in the scheduler at that > point in time. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
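The three deletion rules listed in the issue description can be modelled as a pure pruning function: a queue is removable when it is empty (no apps, no children) and is either dynamic or no longer present in fair-scheduler.xml. This is an illustrative sketch, not the patch's implementation; the names `QueuePruner`, `QueueInfo`, and `findDeletable` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Illustrative sketch of the queue-pruning rule described in YARN-8191. */
public class QueuePruner {

  /** Hypothetical minimal view of a scheduler queue. */
  public static class QueueInfo {
    final String name;
    final boolean dynamic;   // created at runtime, not defined in the XML
    final int runningApps;
    final int childCount;

    public QueueInfo(String name, boolean dynamic, int runningApps,
                     int childCount) {
      this.name = name;
      this.dynamic = dynamic;
      this.runningApps = runningApps;
      this.childCount = childCount;
    }
  }

  /**
   * Returns queues safe to delete: empty leaves that are either dynamic or
   * no longer present in the current fair-scheduler.xml configuration.
   */
  public static List<String> findDeletable(List<QueueInfo> inMemory,
                                           Set<String> configured) {
    List<String> deletable = new ArrayList<>();
    for (QueueInfo q : inMemory) {
      boolean empty = q.runningApps == 0 && q.childCount == 0;
      boolean unwanted = q.dynamic || !configured.contains(q.name);
      if (empty && unwanted) {
        deletable.add(q.name);
      }
    }
    return deletable;
  }
}
```

Running such a pruning pass after each allocation-file reload, and again when an application completes, would keep both the steady fairshare computation and the Web UI in sync with the live configuration.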
[jira] [Commented] (YARN-8297) Incorrect ATS Url used for Wire encrypted cluster
[ https://issues.apache.org/jira/browse/YARN-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487248#comment-16487248 ] Hudson commented on YARN-8297: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14265 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14265/]) YARN-8297. Incorrect ATS Url used for Wire encrypted cluster.(addendum). (rohithsharmaks: rev f61e3e752eb1cf4a08030da04bc3d6c5a2b3926d) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/app/initializers/loader.js > Incorrect ATS Url used for Wire encrypted cluster > - > > Key: YARN-8297 > URL: https://issues.apache.org/jira/browse/YARN-8297 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn-ui-v2 >Affects Versions: 3.1.0 >Reporter: Yesha Vora >Assignee: Sunil Govindan >Priority: Blocker > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-8297-addendum.patch, YARN-8297.001.patch > > > "Service" page uses incorrect web url for ATS in wire encrypted env. For ATS > urls, it uses https protocol with http port. > This issue causes all ATS call to fail and UI does not display component > details. > url used: > https://xxx:8198/ws/v2/timeline/apps/application_1526357251888_0022/entities/SERVICE_ATTEMPT?fields=ALL&_=1526415938320 > expected url : > https://xxx:8199/ws/v2/timeline/apps/application_1526357251888_0022/entities/SERVICE_ATTEMPT?fields=ALL&_=1526415938320 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"
[ https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil Govindan updated YARN-8346: - Priority: Blocker (was: Major) > Upgrading to 3.1 kills running containers with error "Opportunistic container > queue is full" > > > Key: YARN-8346 > URL: https://issues.apache.org/jira/browse/YARN-8346 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.1.0, 3.0.2 >Reporter: Rohith Sharma K S >Priority: Blocker > > It is observed while rolling upgrade from 2.8.4 to 3.1 release, all the > running containers are killed and second attempt is launched for that > application. The diagnostics message is "Opportunistic container queue is > full" which is the reason for container killed. > In NM log, I see below logs for after container is recovered. > {noformat} > 2018-05-23 17:18:50,655 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler: > Opportunistic container [container_e06_1527075664705_0001_01_01] will > not be queued at the NMsince max queue length [0] has been reached > {noformat} > Following steps are executed for rolling upgrade > # Install 2.8.4 cluster and launch a MR job with distributed cache enabled. > # Stop 2.8.4 RM. Start 3.1.0 RM with same configuration. > # Stop 2.8.4 NM batch by batch. Start 3.1.0 NM batch by batch. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"
[ https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487249#comment-16487249 ] Sunil Govindan commented on YARN-8346: -- bumping up as Blocker. > Upgrading to 3.1 kills running containers with error "Opportunistic container > queue is full" > > > Key: YARN-8346 > URL: https://issues.apache.org/jira/browse/YARN-8346 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.1.0, 3.0.2 >Reporter: Rohith Sharma K S >Priority: Blocker > > It is observed while rolling upgrade from 2.8.4 to 3.1 release, all the > running containers are killed and second attempt is launched for that > application. The diagnostics message is "Opportunistic container queue is > full" which is the reason for container killed. > In NM log, I see below logs for after container is recovered. > {noformat} > 2018-05-23 17:18:50,655 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler: > Opportunistic container [container_e06_1527075664705_0001_01_01] will > not be queued at the NMsince max queue length [0] has been reached > {noformat} > Following steps are executed for rolling upgrade > # Install 2.8.4 cluster and launch a MR job with distributed cache enabled. > # Stop 2.8.4 RM. Start 3.1.0 RM with same configuration. > # Stop 2.8.4 NM batch by batch. Start 3.1.0 NM batch by batch. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"
[ https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil Govindan updated YARN-8346: - Affects Version/s: 3.1.0 3.0.2 > Upgrading to 3.1 kills running containers with error "Opportunistic container > queue is full" > > > Key: YARN-8346 > URL: https://issues.apache.org/jira/browse/YARN-8346 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.1.0, 3.0.2 >Reporter: Rohith Sharma K S >Priority: Blocker > > It is observed while rolling upgrade from 2.8.4 to 3.1 release, all the > running containers are killed and second attempt is launched for that > application. The diagnostics message is "Opportunistic container queue is > full" which is the reason for container killed. > In NM log, I see below logs for after container is recovered. > {noformat} > 2018-05-23 17:18:50,655 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler: > Opportunistic container [container_e06_1527075664705_0001_01_01] will > not be queued at the NMsince max queue length [0] has been reached > {noformat} > Following steps are executed for rolling upgrade > # Install 2.8.4 cluster and launch a MR job with distributed cache enabled. > # Stop 2.8.4 RM. Start 3.1.0 RM with same configuration. > # Stop 2.8.4 NM batch by batch. Start 3.1.0 NM batch by batch. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"
[ https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil Govindan updated YARN-8346: - Target Version/s: 3.1.1, 3.0.3 > Upgrading to 3.1 kills running containers with error "Opportunistic container > queue is full" > > > Key: YARN-8346 > URL: https://issues.apache.org/jira/browse/YARN-8346 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.1.0, 3.0.2 >Reporter: Rohith Sharma K S >Priority: Blocker > > It is observed while rolling upgrade from 2.8.4 to 3.1 release, all the > running containers are killed and second attempt is launched for that > application. The diagnostics message is "Opportunistic container queue is > full" which is the reason for container killed. > In NM log, I see below logs for after container is recovered. > {noformat} > 2018-05-23 17:18:50,655 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler: > Opportunistic container [container_e06_1527075664705_0001_01_01] will > not be queued at the NMsince max queue length [0] has been reached > {noformat} > Following steps are executed for rolling upgrade > # Install 2.8.4 cluster and launch a MR job with distributed cache enabled. > # Stop 2.8.4 RM. Start 3.1.0 RM with same configuration. > # Stop 2.8.4 NM batch by batch. Start 3.1.0 NM batch by batch. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8285) Remove unused environment variables from the Docker runtime
[ https://issues.apache.org/jira/browse/YARN-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487323#comment-16487323 ] Eric Badger commented on YARN-8285: --- Thanks, [~shaneku...@gmail.com]! > Remove unused environment variables from the Docker runtime > --- > > Key: YARN-8285 > URL: https://issues.apache.org/jira/browse/YARN-8285 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Shane Kumpf >Assignee: Eric Badger >Priority: Trivial > Labels: Docker > Fix For: 3.2.0 > > Attachments: YARN-8285.001.patch > > > YARN-7430 enabled user remapping for Docker containers by default. As a > result, YARN_CONTAINER_RUNTIME_DOCKER_RUN_ENABLE_USER_REMAPPING is no longer > used and can be removed. > YARN_CONTAINER_RUNTIME_DOCKER_IMAGE_FILE was added in the original > implementation, but was never used and can be removed. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8319) More YARN pages need to honor yarn.resourcemanager.display.per-user-apps
[ https://issues.apache.org/jira/browse/YARN-8319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487353#comment-16487353 ] genericqa commented on YARN-8319: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 31s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 23s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 36s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 51s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 52s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 21s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 27s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 1m 25s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 1 new + 302 unchanged - 0 fixed = 303 total (was 302) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 21s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 50s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 46s{color} | {color:green} hadoop-yarn-api in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 15s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 19m 20s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 11s{color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 73m 7s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 48s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}200m 15s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.TestContainerManager | | | hadoop.yarn.server.reso
[jira] [Commented] (YARN-8319) More YARN pages need to honor yarn.resourcemanager.display.per-user-apps
[ https://issues.apache.org/jira/browse/YARN-8319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487446#comment-16487446 ] Sunil Govindan commented on YARN-8319: -- Test failures are not related. [~rohithsharma] could u pls check > More YARN pages need to honor yarn.resourcemanager.display.per-user-apps > > > Key: YARN-8319 > URL: https://issues.apache.org/jira/browse/YARN-8319 > Project: Hadoop YARN > Issue Type: Bug > Components: webapp >Reporter: Vinod Kumar Vavilapalli >Assignee: Sunil Govindan >Priority: Major > Attachments: YARN-8319.001.patch, YARN-8319.002.patch, > YARN-8319.003.patch > > > When this config is on > - Per queue page on UI2 should filter app list by user > -- TODO: Verify the same with UI1 Per-queue page > - ATSv2 with UI2 should filter list of all users' flows and flow activities > - Per Node pages > -- Listing of apps and containers on a per-node basis should filter apps and > containers by user. > To this end, because this is no longer just for resourcemanager, we should > also deprecate {{yarn.resourcemanager.display.per-user-apps}} in favor of > {{yarn.webapp.filter-app-list-by-user}} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8191) Fair scheduler: queue deletion without RM restart
[ https://issues.apache.org/jira/browse/YARN-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487490#comment-16487490 ] genericqa commented on YARN-8191: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 30s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 52s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 37s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 33s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 21s{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 48s{color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}127m 43s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager | | | org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.removeEmptyIncompatibleQueues(String, FSQueueType) has Boolean return type and returns explicit null At QueueManager.java:type and returns explicit null At QueueManager.java:[line 401] | | Failed junit tests | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8191 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12924749/YARN-8191.015.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 9292b981d335 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / f61e3e7 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_
[jira] [Commented] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"
[ https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487535#comment-16487535 ] Eric Yang commented on YARN-8346: - The queue length is based on the yarn.nodemanager.opportunistic-containers-max-queue-length setting. If yarn-site.xml does not specify this, it will have a size of 0. This seems like a bad default value that will cause rolling upgrade to fail. I think a default value of 4 to 8 is probably sensible. The number of CPU cores and the queue length are somewhat related, so it would be good for an external upgrading system like Ambari to set the queue length accordingly. > Upgrading to 3.1 kills running containers with error "Opportunistic container > queue is full" > > > Key: YARN-8346 > URL: https://issues.apache.org/jira/browse/YARN-8346 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.1.0, 3.0.2 >Reporter: Rohith Sharma K S >Priority: Blocker > > It is observed while rolling upgrade from 2.8.4 to 3.1 release, all the > running containers are killed and second attempt is launched for that > application. The diagnostics message is "Opportunistic container queue is > full" which is the reason for container killed. > In NM log, I see below logs for after container is recovered. > {noformat} > 2018-05-23 17:18:50,655 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler: > Opportunistic container [container_e06_1527075664705_0001_01_01] will > not be queued at the NMsince max queue length [0] has been reached > {noformat} > Following steps are executed for rolling upgrade > # Install 2.8.4 cluster and launch a MR job with distributed cache enabled. > # Stop 2.8.4 RM. Start 3.1.0 RM with same configuration. > # Stop 2.8.4 NM batch by batch. Start 3.1.0 NM batch by batch. 
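The workaround Eric Yang's comment implies can be sketched as a yarn-site.xml fragment. The property name is taken from his comment; the value 8 is only the upper end of the range he suggests, not an official default:

```
<!-- Sketch only: 8 follows the 4-8 range suggested in the comment above;
     it is not a shipped Hadoop default. -->
<property>
  <name>yarn.nodemanager.opportunistic-containers-max-queue-length</name>
  <value>8</value>
  <description>Maximum number of OPPORTUNISTIC containers the NodeManager
    will queue; an unset value behaves as 0, which disables queuing
    entirely and triggers the failure discussed in this thread.</description>
</property>
```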
[jira] [Commented] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"
[ https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487557#comment-16487557 ] Jason Lowe commented on YARN-8346: -- IIUC the issue isn't the queue length setting but rather that the containers are recovered with the wrong execution type (opportunistic instead of guaranteed). I believe the bug is here in ContainerTokenIdentifier#getExecutionType:
{code}
public ExecutionType getExecutionType() {
  if (!proto.hasExecutionType()) {
    return null;
  }
  return convertFromProtoFormat(proto.getExecutionType());
}
{code}
Instead of returning NULL for the execution type it should return GUARANTEED. All containers before an execution type was added were effectively guaranteed since that was the only execution type supported. > Upgrading to 3.1 kills running containers with error "Opportunistic container > queue is full" > > > Key: YARN-8346 > URL: https://issues.apache.org/jira/browse/YARN-8346 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.1.0, 3.0.2 >Reporter: Rohith Sharma K S >Priority: Blocker > > It is observed while rolling upgrade from 2.8.4 to 3.1 release, all the > running containers are killed and second attempt is launched for that > application. The diagnostics message is "Opportunistic container queue is > full" which is the reason for container killed. > In NM log, I see below logs for after container is recovered. > {noformat} > 2018-05-23 17:18:50,655 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler: > Opportunistic container [container_e06_1527075664705_0001_01_01] will > not be queued at the NMsince max queue length [0] has been reached > {noformat} > Following steps are executed for rolling upgrade > # Install 2.8.4 cluster and launch a MR job with distributed cache enabled. > # Stop 2.8.4 RM. Start 3.1.0 RM with same configuration. > # Stop 2.8.4 NM batch by batch. Start 3.1.0 NM batch by batch. 
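The defaulting behavior Jason Lowe suggests can be sketched as a self-contained illustration. This is not the real Hadoop class: `TokenStub` is a hypothetical stand-in for the protobuf-backed ContainerTokenIdentifier, and only the "absent field means GUARANTEED" rule is modeled.

```java
// Sketch of the suggested fix: a recovered token from a pre-execution-type
// release carries no execution type; since all such containers were
// guaranteed, the getter should default to GUARANTEED instead of null.
// TokenStub is an invented stand-in, not the real protobuf-backed class.
enum ExecutionType { GUARANTEED, OPPORTUNISTIC }

class TokenStub {
    private final ExecutionType type; // null models a pre-upgrade token

    TokenStub(ExecutionType type) {
        this.type = type;
    }

    boolean hasExecutionType() {
        return type != null;
    }

    // Proposed behavior per the comment above: absent field => GUARANTEED.
    ExecutionType getExecutionType() {
        if (!hasExecutionType()) {
            return ExecutionType.GUARANTEED;
        }
        return type;
    }
}

public class ExecutionTypeDefaultSketch {
    public static void main(String[] args) {
        // Token recovered from a 2.8.x container: no execution type recorded.
        TokenStub legacy = new TokenStub(null);
        // Token minted by a 3.x RM for an opportunistic container.
        TokenStub opportunistic = new TokenStub(ExecutionType.OPPORTUNISTIC);

        System.out.println(legacy.getExecutionType());        // GUARANTEED
        System.out.println(opportunistic.getExecutionType()); // OPPORTUNISTIC
    }
}
```

With this rule, recovered 2.8.x containers would be scheduled as guaranteed work and never hit the opportunistic queue-length check at all.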
[jira] [Commented] (YARN-8191) Fair scheduler: queue deletion without RM restart
[ https://issues.apache.org/jira/browse/YARN-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487573#comment-16487573 ] Gergo Repas commented on YARN-8191: --- Thanks [~haibochen] for the review! {quote} In this patch, getRemovedStaticQueues() and QueueManager.setQueuesToDynamic() together identify the queues that were added in the previous allocation file, but are now removed in the new allocation file, and then mark them as dynamic. What I mean is that, is it possible that some queues that were created dynamically, but are now included in the new allocation file? If so, we need to mark them as static. {quote} This is done in {{QueueManager.ensureQueueExistsAndIsCompatibleAndIsStatic()}} method, it's been taken care of by the {{queue.setDynamic(false);}} line. {quote} The behavior of QueueManager.removeLeafQueue() is still changed with the refactoring. Previously it would return true if there is no incompatible queue found, but it now returns false. We should also return true if removeEmptyIncompatibleQueues(name, FSQueueType.PARENT) returns null. Similarly, in IncompatibleQueueRemovalTask.execute(), the task shall be removed if `removed == null`. {quote} Thanks, I've corrected these conditions. {quote} Let's add some javadoc to newly added QueueManager public methods. {quote} Sure, I added them. {quote} `reloadListener.onCheck();` makes me worried what if the listener is not set. Looking closely at the code, the setReloadListener() is always set right after the AllocationFileLoaderService constructor, so I think we can move reloadListener as a construnctor argument, so that we never worry if listener is null.{quote} The Listener is now a constructor argument in the latest patch. 
> Fair scheduler: queue deletion without RM restart > - > > Key: YARN-8191 > URL: https://issues.apache.org/jira/browse/YARN-8191 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.0.1 >Reporter: Gergo Repas >Assignee: Gergo Repas >Priority: Major > Attachments: Queue Deletion in Fair Scheduler.pdf, > YARN-8191.000.patch, YARN-8191.001.patch, YARN-8191.002.patch, > YARN-8191.003.patch, YARN-8191.004.patch, YARN-8191.005.patch, > YARN-8191.006.patch, YARN-8191.007.patch, YARN-8191.008.patch, > YARN-8191.009.patch, YARN-8191.010.patch, YARN-8191.011.patch, > YARN-8191.012.patch, YARN-8191.013.patch, YARN-8191.014.patch, > YARN-8191.015.patch > > > The Fair Scheduler never cleans up queues even if they are deleted in the > allocation file, or were dynamically created and are never going to be used > again. Queues always remain in memory which leads to two following issues. > # Steady fairshares aren’t calculated correctly due to remaining queues > # WebUI shows deleted queues, which is confusing for users (YARN-4022). > We want to support proper queue deletion without restarting the Resource > Manager: > # Static queues without any entries that are removed from fair-scheduler.xml > should be deleted from memory. > # Dynamic queues without any entries should be deleted. > # RM Web UI should only show the queues defined in the scheduler at that > point in time. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"
[ https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487593#comment-16487593 ] Eric Yang commented on YARN-8346: - [~jlowe] Yes, you are right. Existing workloads are supposed to have execution type GUARANTEED. > Upgrading to 3.1 kills running containers with error "Opportunistic container > queue is full" > > > Key: YARN-8346 > URL: https://issues.apache.org/jira/browse/YARN-8346 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.1.0, 3.0.2 >Reporter: Rohith Sharma K S >Priority: Blocker > > It is observed while rolling upgrade from 2.8.4 to 3.1 release, all the > running containers are killed and second attempt is launched for that > application. The diagnostics message is "Opportunistic container queue is > full" which is the reason for container killed. > In NM log, I see below logs for after container is recovered. > {noformat} > 2018-05-23 17:18:50,655 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler: > Opportunistic container [container_e06_1527075664705_0001_01_01] will > not be queued at the NMsince max queue length [0] has been reached > {noformat} > Following steps are executed for rolling upgrade > # Install 2.8.4 cluster and launch a MR job with distributed cache enabled. > # Stop 2.8.4 RM. Start 3.1.0 RM with same configuration. > # Stop 2.8.4 NM batch by batch. Start 3.1.0 NM batch by batch. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8320) Support CPU isolation for latency-sensitive (LS) service
[ https://issues.apache.org/jira/browse/YARN-8320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487596#comment-16487596 ] Wangda Tan commented on YARN-8320: -- Thanks [~cheersyang] / [~yangjiandan] for the detailed design, very helpful for understanding the context. I took a quick look at the proposal, a couple of questions / comments: 1) It seems that the #vcore must be divisible by #physical-core, otherwise it will cause rounding issues and containers will get fewer/more resources than requested. If the admin enables the feature, YARN should take care of checking this value before starting the NM. 2) I'm still trying to understand the benefit of RESERVED / SHARED mode. If a RESERVED core can be used by ANY container, in my mind, the RESERVED container can be affected by an ad-hoc ANY container. And similarly, if we allow SHARED containers to bind to the same set of cores, considering SHARED containers are running CPU-intensive LS services, they could consume a lot of CPU on these shared cores, which could lead to even worse latency and more contention. 3) Relationship to other features: - Related to NUMA allocation on YARN (YARN-5764), to me the two features are related to each other: allocating reserved cores for the same process on the same or closest NUMA zone(s) gives the best performance, but satisfying one condition can break the other. We should be very careful to make sure the two features can work together. - Related to GPU allocation on YARN: on one machine, GPU performance is sensitive to the topology of GPUs. Communication latency and bandwidth differ a lot when GPUs are connected by NVLink, PCI-E, etc. It might be valuable to think about whether it is possible to have a single framework on the same NM to do resource-specific scheduling and placement. - Related to ResourcePlugin framework: We added the ResourcePlugin framework in YARN-7224, and now GPU/FPGA are using the framework to implement the feature. 
I'm not sure if this feature can benefit from the ResourcePlugin framework, or whether some refactoring of the framework is required. It would be better if we can extract the common parts and workflow. 4) To me, only privileged users and applications should be able to request a non-ANY CPU mode; how can we enforce this (maybe not in phase#1, but we need a plan here)? > Support CPU isolation for latency-sensitive (LS) service > > > Key: YARN-8320 > URL: https://issues.apache.org/jira/browse/YARN-8320 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager >Reporter: Jiandan Yang >Priority: Major > Attachments: CPU-isolation-for-latency-sensitive-services-v1.pdf, > CPU-isolation-for-latency-sensitive-services-v2.pdf, YARN-8320.001.patch > > > Currently NodeManager uses “cpu.cfs_period_us”, “cpu.cfs_quota_us” and > “cpu.shares” to isolate cpu resource. However, > * Linux Completely Fair Scheduling (CFS) is a throughput-oriented scheduler; > no support for differentiated latency > * Request latency of services running on container may be frequent shake > when all containers share cpus, and latency-sensitive services can not afford > in our production environment. > So we need more fine-grained cpu isolation. > Here we propose a solution using cgroup cpuset to binds containers to > different processors, this is inspired by the isolation technique in [Borg > system|http://schd.ws/hosted_files/lcccna2016/a7/CAT%20@%20Scale.pdf]. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
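Wangda's point 1 amounts to a startup-time validation in the NM. A minimal hypothetical sketch of such a check, with invented method and class names since no such check exists in YARN yet:

```java
// Hypothetical validation sketch for the divisibility check discussed above:
// if vcores do not divide evenly across physical cores, cpuset bindings
// would over- or under-allocate by the remainder, so reject the config.
public class VcoreCheckSketch {

    static void validateVcores(int vcores, int physicalCores) {
        if (physicalCores <= 0 || vcores <= 0) {
            throw new IllegalArgumentException(
                "vcores and physical cores must be positive");
        }
        if (vcores % physicalCores != 0) {
            throw new IllegalArgumentException(
                "vcores (" + vcores + ") must be a multiple of physical cores ("
                    + physicalCores + ") when cpuset isolation is enabled");
        }
    }

    public static void main(String[] args) {
        validateVcores(16, 8); // accepted: exactly 2 vcores per core
        try {
            validateVcores(10, 8); // rejected: 1.25 vcores per core
        } catch (IllegalArgumentException expected) {
            System.out.println("rejected: " + expected.getMessage());
        }
    }
}
```

Running the check at NM startup (rather than at allocation time) matches the comment's suggestion that YARN verify the value before starting the NM.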
[jira] [Commented] (YARN-8334) Fix potential connection leak in GPGUtils
[ https://issues.apache.org/jira/browse/YARN-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487606#comment-16487606 ] Íñigo Goiri commented on YARN-8334: --- The approach in [^YARN-8334-YARN-7402.v2.patch] makes sense to me. I'm not sure if there is a way to trigger a warning when this is not closed. [~giovanni.fumarola], is there any related unit test that goes through this code? > Fix potential connection leak in GPGUtils > - > > Key: YARN-8334 > URL: https://issues.apache.org/jira/browse/YARN-8334 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Minor > Attachments: YARN-8334-YARN-7402.v1.patch, > YARN-8334-YARN-7402.v2.patch > > > Missing ClientResponse.close and Client.destroy can lead to a connection leak. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8336) Fix potential connection leak in SchedConfCLI and YarnWebServiceUtils
[ https://issues.apache.org/jira/browse/YARN-8336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487610#comment-16487610 ] Íñigo Goiri commented on YARN-8336: --- Same fix as YARN-8334; LGTM. Is there any unit test covering this code path? The only concern is the inconsistency to check the OK status between these two pieces of code. We may want to open a JIRA to make the WS management consistent in the YARN Router. > Fix potential connection leak in SchedConfCLI and YarnWebServiceUtils > - > > Key: YARN-8336 > URL: https://issues.apache.org/jira/browse/YARN-8336 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8336.v1.patch, YARN-8336.v2.patch > > > Missing ClientResponse.close and Client.destroy can lead to a connection leak. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
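The leak shape both YARN-8334 and YARN-8336 describe is the same: a response and client that are never released. The following self-contained sketch illustrates the close-in-finally discipline the patches apply; `StubClient`/`StubResponse` are invented stand-ins for Jersey's Client and ClientResponse, since only the release pattern matters here.

```java
// Illustration of the connection-leak fix pattern: release the response and
// destroy the client in a finally block so an exception cannot skip cleanup.
// StubClient/StubResponse are hypothetical, not the real Jersey types.
class StubResponse {
    boolean closed;
    void close() { closed = true; }
}

class StubClient {
    boolean destroyed;
    StubResponse get() { return new StubResponse(); }
    void destroy() { destroyed = true; }
}

public class ConnectionLeakSketch {

    // Leaky shape: neither the response nor the client is released.
    static String leaky(StubClient client) {
        StubResponse resp = client.get();
        return "entity"; // resp and client both leak here
    }

    // Fixed shape: cleanup runs even if reading the entity throws.
    static String safe(StubClient client) {
        StubResponse resp = null;
        try {
            resp = client.get();
            return "entity";
        } finally {
            if (resp != null) {
                resp.close();
            }
            client.destroy();
        }
    }

    public static void main(String[] args) {
        StubClient client = new StubClient();
        safe(client);
        System.out.println(client.destroyed); // true
    }
}
```

A unit test of the kind Íñigo asks about could inject such a stub and assert that `close()`/`destroy()` were invoked on every code path.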
[jira] [Commented] (YARN-8191) Fair scheduler: queue deletion without RM restart
[ https://issues.apache.org/jira/browse/YARN-8191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487621#comment-16487621 ] Haibo Chen commented on YARN-8191: -- +1 pending the findbug fix. The TestAMRestart.testPreemptedAMRestartOnRMRestart failed in the last two runs, can you take a look [~grepas]? I think we can fix the findbug issue by returning Optional instead. > Fair scheduler: queue deletion without RM restart > - > > Key: YARN-8191 > URL: https://issues.apache.org/jira/browse/YARN-8191 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 3.0.1 >Reporter: Gergo Repas >Assignee: Gergo Repas >Priority: Major > Attachments: Queue Deletion in Fair Scheduler.pdf, > YARN-8191.000.patch, YARN-8191.001.patch, YARN-8191.002.patch, > YARN-8191.003.patch, YARN-8191.004.patch, YARN-8191.005.patch, > YARN-8191.006.patch, YARN-8191.007.patch, YARN-8191.008.patch, > YARN-8191.009.patch, YARN-8191.010.patch, YARN-8191.011.patch, > YARN-8191.012.patch, YARN-8191.013.patch, YARN-8191.014.patch, > YARN-8191.015.patch > > > The Fair Scheduler never cleans up queues even if they are deleted in the > allocation file, or were dynamically created and are never going to be used > again. Queues always remain in memory which leads to two following issues. > # Steady fairshares aren’t calculated correctly due to remaining queues > # WebUI shows deleted queues, which is confusing for users (YARN-4022). > We want to support proper queue deletion without restarting the Resource > Manager: > # Static queues without any entries that are removed from fair-scheduler.xml > should be deleted from memory. > # Dynamic queues without any entries should be deleted. > # RM Web UI should only show the queues defined in the scheduler at that > point in time. 
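Haibo's suggested FindBugs fix (replacing the three-state Boolean of true/false/null with Optional) looks roughly like this. The method bodies are invented for illustration and are not the real QueueManager logic; only the signature change is the point.

```java
import java.util.Optional;

// Sketch of the suggested fix: a Boolean-returning method that returns an
// explicit null trips FindBugs (the issue flagged on
// removeEmptyIncompatibleQueues); Optional<Boolean> models "no answer" in
// the type. The queue logic below is a hypothetical stand-in.
public class OptionalReturnSketch {

    enum FSQueueType { LEAF, PARENT }

    // Before: null means "queue not found", inviting accidental unboxing NPEs.
    static Boolean removeEmptyIncompatibleQueuesNullable(String name, FSQueueType type) {
        if (name == null) {
            return null; // FindBugs: Boolean return with explicit null
        }
        return name.startsWith("root.");
    }

    // After: absence is explicit, so callers must handle it.
    static Optional<Boolean> removeEmptyIncompatibleQueues(String name, FSQueueType type) {
        if (name == null) {
            return Optional.empty();
        }
        return Optional.of(name.startsWith("root."));
    }

    public static void main(String[] args) {
        System.out.println(removeEmptyIncompatibleQueues(null, FSQueueType.PARENT).isPresent());
        System.out.println(removeEmptyIncompatibleQueues("root.q1", FSQueueType.LEAF).get());
    }
}
```

An `Optional.empty()` result maps naturally onto the review note that callers should treat "no incompatible queue found" as success rather than failure.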
[jira] [Commented] (YARN-8341) Yarn Service: Integration tests
[ https://issues.apache.org/jira/browse/YARN-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487636#comment-16487636 ] Eric Yang commented on YARN-8341: - [~csingh] Thanks for the patch. I think it would be good to create a separate profile that can be activated by properties. This makes it easier to show the properties required to run the integration tests. The separate profile would compile the integration tests and run them against the target cluster. For example: {code} rm.host ... integration-test ... {code} This lets a Jenkins job run the integration tests with minimal CLI settings: {code} mvn integration-test -Drm.host=localhost {code} The user.name property is already predefined by the JVM. It would be good to launch with the current user identity to support both kerberos and non-kerberos tests. > Yarn Service: Integration tests > > > Key: YARN-8341 > URL: https://issues.apache.org/jira/browse/YARN-8341 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Chandni Singh >Assignee: Chandni Singh >Priority: Major > Attachments: YARN-8341.wip.patch > > > In order to test the rest api end-to-end, we can add Integration tests for > Yarn service api. > The integration tests > * belong to junit category {{IntegrationTest}}. > * will only be run when triggered by executing {{mvn > failsafe:integration-test}} > * the surefire plugin for regular tests excludes {{IntegrationTest}} > * RM host, user name, and any additional properties which are needed to > execute the tests against a cluster can be passed as System properties. > For eg. {{mvn failsafe:integration-test -Drm.host=localhost -Duser.name=root}} > We can add more integration tests which can check scalability and performance. > Having these tests here benefits everyone in the community because anyone can > run these tests against their cluster. > Attaching a work in progress patch. 
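JIRA stripped the XML tags out of the first {code} block above. What Eric describes is roughly the following pom.xml fragment — a reconstruction under assumptions, where the profile id and the choice of activation property are illustrative and not taken from any patch:

```xml
<!-- Sketch only: profile id and property name are assumptions. -->
<profile>
  <id>integration-test</id>
  <activation>
    <property>
      <!-- Activate only when the target RM host is supplied on the CLI. -->
      <name>rm.host</name>
    </property>
  </activation>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-failsafe-plugin</artifactId>
        <executions>
          <execution>
            <goals>
              <goal>integration-test</goal>
              <goal>verify</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</profile>
```

With such a profile, running {{mvn verify -Drm.host=localhost}} would activate it and execute the failsafe-bound integration tests against the named cluster.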
[jira] [Updated] (YARN-8337) [FederationStateStore - MySql] Deadlock In addApplicationHome
[ https://issues.apache.org/jira/browse/YARN-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8337: --- Summary: [FederationStateStore - MySql] Deadlock In addApplicationHome (was: Deadlock In Federation Router) > [FederationStateStore - MySql] Deadlock In addApplicationHome > - > > Key: YARN-8337 > URL: https://issues.apache.org/jira/browse/YARN-8337 > Project: Hadoop YARN > Issue Type: Bug > Components: federation, router >Reporter: Jianchao Jia >Priority: Major > Attachments: YARN-8337.001.patch, YARN-8337.002.patch > > > We use mysql innodb as the state store engine,in router log we found dead > lock error like below: > {code:java} > [2018-05-21T15:41:40.383+08:00] [ERROR] [IPC Server handler 25 on 8050] : > Unable to insert the newly generated application > application_1526295230627_127402 > com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock > found when trying to get lock; try restarting transaction > at sun.reflect.GeneratedConstructorAccessor107.newInstance(Unknown > Source) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at com.mysql.jdbc.Util.handleNewInstance(Util.java:425) > at com.mysql.jdbc.Util.getInstance(Util.java:408) > at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:952) > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3973) > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3909) > at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2527) > at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2680) > at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2484) > at > com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858) > at > com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2079) > at > 
com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2013) > at > com.mysql.jdbc.PreparedStatement.executeLargeUpdate(PreparedStatement.java:5104) > at > com.mysql.jdbc.CallableStatement.executeLargeUpdate(CallableStatement.java:2418) > at > com.mysql.jdbc.CallableStatement.executeUpdate(CallableStatement.java:887) > at > com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61) > at > com.zaxxer.hikari.pool.HikariProxyCallableStatement.executeUpdate(HikariProxyCallableStatement.java) > at > org.apache.hadoop.yarn.server.federation.store.impl.SQLFederationStateStore.addApplicationHomeSubCluster(SQLFederationStateStore.java:547) > {code} > Use "show engine innodb status;" command to find what happens > {code:java} > 2018-05-21 15:41:40 7f4685870700 > *** (1) TRANSACTION: > TRANSACTION 241131538, ACTIVE 0 sec inserting, thread declared inside InnoDB > 4999 > mysql tables in use 2, locked 2 > LOCK WAIT 4 lock struct(s), heap size 1184, 2 row lock(s) > MySQL thread id 7602335, OS thread handle 0x7f46858f2700, query id 2919792534 > 192.168.1.138 federation executing > INSERT INTO applicationsHomeSubCluster > (applicationId,homeSubCluster) > (SELECT applicationId_IN, homeSubCluster_IN > FROM applicationsHomeSubCluster > WHERE applicationId = applicationId_IN > HAVING COUNT(*) = 0 ) > *** (1) WAITING FOR THIS LOCK TO BE GRANTED: > RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table > `guldan_federationstatestore`.`applicationshomesubcluster` trx id 241131538 > lock_mode X locks gap before rec insert intention waiting > Record lock, heap no 23 PHYSICAL RECORD: n_fields 4; compact format; info > bits 0 > 0: len 30; hex 6170706c69636174696f6e5f313532363239353233303632375f31323734; > asc application_1526295230627_1274; (total 31 bytes); > 1: len 6; hex 0ba5f32d; asc -;; > 2: len 7; hex dd00280110; asc ( ;; > 3: len 13; hex 686f70655f636c757374657231; asc hope_cluster1;; > *** (2) TRANSACTION: > 
TRANSACTION 241131539, ACTIVE 0 sec inserting, thread declared inside InnoDB > 4999 > mysql tables in use 2, locked 2 > 4 lock struct(s), heap size 1184, 2 row lock(s) > MySQL thread id 7600638, OS thread handle 0x7f4685870700, query id 2919792535 > 192.168.1.138 federation executing > INSERT INTO applicationsHomeSubCluster > (applicationId,homeSubCluster) > (SELECT applicationId_IN, homeSubCluster_IN > FROM applicationsHomeSubCluster > WHERE applicationId = applicationId_IN > HAVING COUNT(*) = 0 ) > *** (2) HOLDS THE LOCK(S): > RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table > `guldan_federationstatestore`.`applications
[jira] [Updated] (YARN-8337) [FederationStateStore - MySql] Deadlock In addApplicationHome
[ https://issues.apache.org/jira/browse/YARN-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8337: --- Issue Type: Sub-task (was: Bug) Parent: YARN-7402 > [FederationStateStore - MySql] Deadlock In addApplicationHome > - > > Key: YARN-8337 > URL: https://issues.apache.org/jira/browse/YARN-8337 > Project: Hadoop YARN > Issue Type: Sub-task > Components: federation, router >Reporter: Jianchao Jia >Priority: Major > Attachments: YARN-8337.001.patch, YARN-8337.002.patch > > > We use mysql innodb as the state store engine,in router log we found dead > lock error like below: > {code:java} > [2018-05-21T15:41:40.383+08:00] [ERROR] [IPC Server handler 25 on 8050] : > Unable to insert the newly generated application > application_1526295230627_127402 > com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock > found when trying to get lock; try restarting transaction > at sun.reflect.GeneratedConstructorAccessor107.newInstance(Unknown > Source) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at com.mysql.jdbc.Util.handleNewInstance(Util.java:425) > at com.mysql.jdbc.Util.getInstance(Util.java:408) > at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:952) > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3973) > at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3909) > at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2527) > at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2680) > at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2484) > at > com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858) > at > com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2079) > at > com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2013) > at > 
com.mysql.jdbc.PreparedStatement.executeLargeUpdate(PreparedStatement.java:5104) > at > com.mysql.jdbc.CallableStatement.executeLargeUpdate(CallableStatement.java:2418) > at > com.mysql.jdbc.CallableStatement.executeUpdate(CallableStatement.java:887) > at > com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61) > at > com.zaxxer.hikari.pool.HikariProxyCallableStatement.executeUpdate(HikariProxyCallableStatement.java) > at > org.apache.hadoop.yarn.server.federation.store.impl.SQLFederationStateStore.addApplicationHomeSubCluster(SQLFederationStateStore.java:547) > {code} > Use "show engine innodb status;" command to find what happens > {code:java} > 2018-05-21 15:41:40 7f4685870700 > *** (1) TRANSACTION: > TRANSACTION 241131538, ACTIVE 0 sec inserting, thread declared inside InnoDB > 4999 > mysql tables in use 2, locked 2 > LOCK WAIT 4 lock struct(s), heap size 1184, 2 row lock(s) > MySQL thread id 7602335, OS thread handle 0x7f46858f2700, query id 2919792534 > 192.168.1.138 federation executing > INSERT INTO applicationsHomeSubCluster > (applicationId,homeSubCluster) > (SELECT applicationId_IN, homeSubCluster_IN > FROM applicationsHomeSubCluster > WHERE applicationId = applicationId_IN > HAVING COUNT(*) = 0 ) > *** (1) WAITING FOR THIS LOCK TO BE GRANTED: > RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table > `guldan_federationstatestore`.`applicationshomesubcluster` trx id 241131538 > lock_mode X locks gap before rec insert intention waiting > Record lock, heap no 23 PHYSICAL RECORD: n_fields 4; compact format; info > bits 0 > 0: len 30; hex 6170706c69636174696f6e5f313532363239353233303632375f31323734; > asc application_1526295230627_1274; (total 31 bytes); > 1: len 6; hex 0ba5f32d; asc -;; > 2: len 7; hex dd00280110; asc ( ;; > 3: len 13; hex 686f70655f636c757374657231; asc hope_cluster1;; > *** (2) TRANSACTION: > TRANSACTION 241131539, ACTIVE 0 sec inserting, thread declared inside InnoDB > 4999 > 
mysql tables in use 2, locked 2 > 4 lock struct(s), heap size 1184, 2 row lock(s) > MySQL thread id 7600638, OS thread handle 0x7f4685870700, query id 2919792535 > 192.168.1.138 federation executing > INSERT INTO applicationsHomeSubCluster > (applicationId,homeSubCluster) > (SELECT applicationId_IN, homeSubCluster_IN > FROM applicationsHomeSubCluster > WHERE applicationId = applicationId_IN > HAVING COUNT(*) = 0 ) > *** (2) HOLDS THE LOCK(S): > RECORD LOCKS space id 113 page no 21208 n bits 296 index `PRIMARY` of table > `guldan_federationstatestore`.`applicationshomesubcluster` trx id 241131539 > lock mode
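The deadlock report above shows two transactions racing the same INSERT ... SELECT ... HAVING COUNT(*) = 0 guard: each acquires a gap lock on the primary-key range during the existence check, then blocks waiting for the other's insert intention lock. A hedged sketch of one common MySQL remedy — an illustration of the general technique, not necessarily the committed YARN-8337 fix — is to let the primary key itself reject duplicates in a single statement:

```sql
-- Sketch only: assumes applicationId is the table's primary key and that
-- applicationId_IN / homeSubCluster_IN are the stored procedure's IN parameters.
INSERT INTO applicationsHomeSubCluster (applicationId, homeSubCluster)
VALUES (applicationId_IN, homeSubCluster_IN)
ON DUPLICATE KEY UPDATE applicationId = applicationId;  -- no-op on conflict
```

Because the uniqueness check and the insert collapse into one statement, neither transaction holds a gap lock while waiting on the other's insert.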
[jira] [Commented] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"
[ https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487699#comment-16487699 ] Yongjun Zhang commented on YARN-8346: - Hi guys, thanks for reporting and working on the issue. I'm preparing the 3.0.3 release; can we prioritize this one if we deem it a blocker? Thanks, > Upgrading to 3.1 kills running containers with error "Opportunistic container > queue is full" > > > Key: YARN-8346 > URL: https://issues.apache.org/jira/browse/YARN-8346 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.1.0, 3.0.2 >Reporter: Rohith Sharma K S >Priority: Blocker > > It is observed while rolling upgrade from 2.8.4 to 3.1 release, all the > running containers are killed and second attempt is launched for that > application. The diagnostics message is "Opportunistic container queue is > full" which is the reason for container killed. > In NM log, I see below logs for after container is recovered. > {noformat} > 2018-05-23 17:18:50,655 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler: > Opportunistic container [container_e06_1527075664705_0001_01_01] will > not be queued at the NMsince max queue length [0] has been reached > {noformat} > Following steps are executed for rolling upgrade > # Install 2.8.4 cluster and launch a MR job with distributed cache enabled. > # Stop 2.8.4 RM. Start 3.1.0 RM with same configuration. > # Stop 2.8.4 NM batch by batch. Start 3.1.0 NM batch by batch.
[jira] [Commented] (YARN-8338) TimelineService V1.5 doesn't come up after HADOOP-15406
[ https://issues.apache.org/jira/browse/YARN-8338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487710#comment-16487710 ] Haibo Chen commented on YARN-8338: -- [~jlowe] [~vinodkv], I did not understand who else was depending on objenesis 2.1, so I wanted to avoid the accidental upgrade of objenesis. The upgrade of de.ruedigermoeller:fst would have upgraded objenesis from 2.1 to 2.5.1 at the time, without the exclusion of objenesis from fst. Based on [this fst comment|https://github.com/RuedigerMoeller/fast-serialization/blob/aceaad0075b2e1ef796597a1098aeb39fbea7fc9/pom.xml#L141], I assumed it was safe to do so, and no issue was found in my testing (as Jason noted, a dependency of mockito-all previously brought in objenesis transitively). [This comment|https://github.com/RuedigerMoeller/fast-serialization/blob/aceaad0075b2e1ef796597a1098aeb39fbea7fc9/pom.xml#L141] says that if android is not used, we don't need objenesis. It'd be nice if we could remove that runtime dependency from application history service, and let hadoop-aws choose whichever version of objenesis it needs. > TimelineService V1.5 doesn't come up after HADOOP-15406 > --- > > Key: YARN-8338 > URL: https://issues.apache.org/jira/browse/YARN-8338 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli >Priority: Critical > Attachments: YARN-8338.txt > > > TimelineService V1.5 fails with the following: > {code} > java.lang.NoClassDefFoundError: org/objenesis/Objenesis > at > org.apache.hadoop.yarn.server.timeline.RollingLevelDBTimelineStore.(RollingLevelDBTimelineStore.java:174) > {code}
[jira] [Commented] (YARN-8334) Fix potential connection leak in GPGUtils
[ https://issues.apache.org/jira/browse/YARN-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487716#comment-16487716 ] Giovanni Matteo Fumarola commented on YARN-8334: Thanks [~elgoiri] and [~botong] for the review. I am not sure if there is a way to trigger a warning when a connection is not closed correctly. *TestPolicyGenerator* exercises this code path. > Fix potential connection leak in GPGUtils > - > > Key: YARN-8334 > URL: https://issues.apache.org/jira/browse/YARN-8334 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Minor > Attachments: YARN-8334-YARN-7402.v1.patch, > YARN-8334-YARN-7402.v2.patch > > > Missing ClientResponse.close and Client.destroy can lead to a connection leak.
[jira] [Commented] (YARN-8336) Fix potential connection leak in SchedConfCLI and YarnWebServiceUtils
[ https://issues.apache.org/jira/browse/YARN-8336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487723#comment-16487723 ] Giovanni Matteo Fumarola commented on YARN-8336: Thanks [~elgoiri] for the review. *TestSchedConfCLI* and *TestLogsCLI* exercise these code paths. I will open a Jira to make WS management consistent across the entire codebase. > Fix potential connection leak in SchedConfCLI and YarnWebServiceUtils > - > > Key: YARN-8336 > URL: https://issues.apache.org/jira/browse/YARN-8336 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8336.v1.patch, YARN-8336.v2.patch > > > Missing ClientResponse.close and Client.destroy can lead to a connection leak.
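The leak that both YARN-8334 and YARN-8336 fix follows the same shape: a response and client are created in a method and never released on the error path. A minimal generic sketch of the close-in-finally discipline — Jersey's ClientResponse.close() and Client.destroy() play the role of close() here, and FakeResponse is an illustrative stand-in, not a Jersey type:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Generic illustration of the leak both JIRAs fix: the response must be closed
// on every path. Jersey's ClientResponse.close() / Client.destroy() play the
// role of close() here; FakeResponse is a stand-in, not a Jersey type.
class LeakFreeCall {
    static final AtomicInteger OPEN_CONNECTIONS = new AtomicInteger();

    static class FakeResponse implements AutoCloseable {
        FakeResponse() { OPEN_CONNECTIONS.incrementAndGet(); }
        String body() { return "ok"; }
        @Override public void close() { OPEN_CONNECTIONS.decrementAndGet(); }
    }

    static String fetch(boolean simulateFailure) {
        FakeResponse response = new FakeResponse();
        try {
            if (simulateFailure) {
                throw new IllegalStateException("simulated request failure");
            }
            return response.body();
        } finally {
            response.close();  // runs on success and on failure: no leaked connection
        }
    }
}
```

The try/finally (or try-with-resources) form is what guarantees the counter returns to zero even when the request throws, which is exactly what the tests cited above verify indirectly.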
[jira] [Updated] (YARN-8344) Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync on Windows
[ https://issues.apache.org/jira/browse/YARN-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8344: --- Summary: Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync on Windows (was: Missing nm.close() in TestNodeManagerResync to fix unit tests on Windows) > Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync > on Windows > > > Key: YARN-8344 > URL: https://issues.apache.org/jira/browse/YARN-8344 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8344.v1.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8338) TimelineService V1.5 doesn't come up after HADOOP-15406
[ https://issues.apache.org/jira/browse/YARN-8338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487738#comment-16487738 ] Jason Lowe commented on YARN-8338: -- bq. This comment says that if android is not used, we don't need objenesis. I think this JIRA proves that the comment is wrong. There's no direct usage of objenesis in the Hadoop code base, but when we try to load fst classes in a static code block it fails trying to lookup objenesis classes. That looks like objenesis classes are required for FST to work. Maybe the methods of those classes aren't called when not running on android, but the classes need to be there. > TimelineService V1.5 doesn't come up after HADOOP-15406 > --- > > Key: YARN-8338 > URL: https://issues.apache.org/jira/browse/YARN-8338 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli >Priority: Critical > Attachments: YARN-8338.txt > > > TimelineService V1.5 fails with the following: > {code} > java.lang.NoClassDefFoundError: org/objenesis/Objenesis > at > org.apache.hadoop.yarn.server.timeline.RollingLevelDBTimelineStore.(RollingLevelDBTimelineStore.java:174) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
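If, as Jason argues, objenesis classes must be on the classpath whenever fst is loaded, the simplest remedy is to declare the dependency explicitly instead of relying on a transitive path — a sketch of the pom.xml change (the version number is illustrative, not necessarily what the committed fix used):

```xml
<!-- Sketch: the version is illustrative, not from the committed patch. -->
<dependency>
  <groupId>org.objenesis</groupId>
  <artifactId>objenesis</artifactId>
  <version>2.6</version>
</dependency>
```

An explicit declaration also makes the NoClassDefFoundError impossible to reintroduce by excluding or upgrading an unrelated transitive dependency such as mockito-all.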
[jira] [Commented] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"
[ https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487741#comment-16487741 ] Jason Lowe commented on YARN-8346: -- I should have a patch up later today. I already verified manually that the change I proposed above fixes the case Rohith reported; I just need to add a unit test. > Upgrading to 3.1 kills running containers with error "Opportunistic container > queue is full" > > > Key: YARN-8346 > URL: https://issues.apache.org/jira/browse/YARN-8346 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.1.0, 3.0.2 >Reporter: Rohith Sharma K S >Priority: Blocker > > It is observed while rolling upgrade from 2.8.4 to 3.1 release, all the > running containers are killed and second attempt is launched for that > application. The diagnostics message is "Opportunistic container queue is > full" which is the reason for container killed. > In NM log, I see below logs for after container is recovered. > {noformat} > 2018-05-23 17:18:50,655 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler: > Opportunistic container [container_e06_1527075664705_0001_01_01] will > not be queued at the NMsince max queue length [0] has been reached > {noformat} > Following steps are executed for rolling upgrade > # Install 2.8.4 cluster and launch a MR job with distributed cache enabled. > # Stop 2.8.4 RM. Start 3.1.0 RM with same configuration. > # Stop 2.8.4 NM batch by batch. Start 3.1.0 NM batch by batch.
[jira] [Commented] (YARN-4599) Set OOM control for memory cgroups
[ https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487742#comment-16487742 ] Miklos Szegedi commented on YARN-4599: -- The unit test issues are not related to the patch. > Set OOM control for memory cgroups > -- > > Key: YARN-4599 > URL: https://issues.apache.org/jira/browse/YARN-4599 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.9.0 >Reporter: Karthik Kambatla >Assignee: Miklos Szegedi >Priority: Major > Labels: oct16-medium > Attachments: Elastic Memory Control in YARN.pdf, YARN-4599.000.patch, > YARN-4599.001.patch, YARN-4599.002.patch, YARN-4599.003.patch, > YARN-4599.004.patch, YARN-4599.005.patch, YARN-4599.006.patch, > YARN-4599.007.patch, YARN-4599.008.patch, YARN-4599.009.patch, > YARN-4599.010.patch, YARN-4599.011.patch, YARN-4599.012.patch, > YARN-4599.013.patch, YARN-4599.014.patch, YARN-4599.015.patch, > YARN-4599.016.patch, YARN-4599.sandflee.patch, yarn-4599-not-so-useful.patch > > > YARN-1856 adds memory cgroups enforcing support. We should also explicitly > set OOM control so that containers are not killed as soon as they go over > their usage. Today, one could set the swappiness to control this, but > clusters with swap turned off exist. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-4599) Set OOM control for memory cgroups
[ https://issues.apache.org/jira/browse/YARN-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487749#comment-16487749 ] Haibo Chen commented on YARN-4599: -- +1 on the latest patch. Will check it in later today if no objections > Set OOM control for memory cgroups > -- > > Key: YARN-4599 > URL: https://issues.apache.org/jira/browse/YARN-4599 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Affects Versions: 2.9.0 >Reporter: Karthik Kambatla >Assignee: Miklos Szegedi >Priority: Major > Labels: oct16-medium > Attachments: Elastic Memory Control in YARN.pdf, YARN-4599.000.patch, > YARN-4599.001.patch, YARN-4599.002.patch, YARN-4599.003.patch, > YARN-4599.004.patch, YARN-4599.005.patch, YARN-4599.006.patch, > YARN-4599.007.patch, YARN-4599.008.patch, YARN-4599.009.patch, > YARN-4599.010.patch, YARN-4599.011.patch, YARN-4599.012.patch, > YARN-4599.013.patch, YARN-4599.014.patch, YARN-4599.015.patch, > YARN-4599.016.patch, YARN-4599.sandflee.patch, yarn-4599-not-so-useful.patch > > > YARN-1856 adds memory cgroups enforcing support. We should also explicitly > set OOM control so that containers are not killed as soon as they go over > their usage. Today, one could set the swappiness to control this, but > clusters with swap turned off exist. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
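For context on the knob YARN-4599 wires up: under cgroups v1, the memory controller's memory.oom_control file lets the OOM killer be disabled per cgroup, so an over-limit container is paused rather than killed immediately. A shell sketch of the mechanism only — the cgroup path is illustrative, it requires root, and this is not the patch's code:

```shell
# Illustrative cgroup v1 paths; requires root and a mounted memory controller.
CG=/sys/fs/cgroup/memory/hadoop-yarn/container_example

echo $((2 * 1024 * 1024 * 1024)) > "$CG/memory.limit_in_bytes"  # 2 GB limit
echo 1 > "$CG/memory.oom_control"   # disable the kernel OOM killer for this cgroup

# With the killer disabled, a task exceeding the limit blocks until memory is
# freed or the limit is raised, so the NM (not the kernel) decides what to kill.
cat "$CG/memory.oom_control"        # reports oom_kill_disable 1 when set
```

This is why the feature matters even with swappiness tuning available: on clusters with swap turned off, oom_control is the only way to get the pause-instead-of-kill behavior.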
[jira] [Assigned] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"
[ https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe reassigned YARN-8346: Assignee: Jason Lowe > Upgrading to 3.1 kills running containers with error "Opportunistic container > queue is full" > > > Key: YARN-8346 > URL: https://issues.apache.org/jira/browse/YARN-8346 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.1.0, 3.0.2 >Reporter: Rohith Sharma K S >Assignee: Jason Lowe >Priority: Blocker > > It is observed while rolling upgrade from 2.8.4 to 3.1 release, all the > running containers are killed and second attempt is launched for that > application. The diagnostics message is "Opportunistic container queue is > full" which is the reason for container killed. > In NM log, I see below logs for after container is recovered. > {noformat} > 2018-05-23 17:18:50,655 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler: > Opportunistic container [container_e06_1527075664705_0001_01_01] will > not be queued at the NMsince max queue length [0] has been reached > {noformat} > Following steps are executed for rolling upgrade > # Install 2.8.4 cluster and launch a MR job with distributed cache enabled. > # Stop 2.8.4 RM. Start 3.1.0 RM with same configuration. > # Stop 2.8.4 NM batch by batch. Start 3.1.0 NM batch by batch. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8344) Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync on Windows
[ https://issues.apache.org/jira/browse/YARN-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri updated YARN-8344: -- Description: Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync on Windows. > Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync > on Windows > > > Key: YARN-8344 > URL: https://issues.apache.org/jira/browse/YARN-8344 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8344.v1.patch > > > Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync > on Windows. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8344) Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync on Windows
[ https://issues.apache.org/jira/browse/YARN-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487762#comment-16487762 ] Íñigo Goiri commented on YARN-8344: --- Why does this fail on Windows specifically? > Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync > on Windows > > > Key: YARN-8344 > URL: https://issues.apache.org/jira/browse/YARN-8344 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8344.v1.patch > > > Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync > on Windows. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8344) Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync on Windows
[ https://issues.apache.org/jira/browse/YARN-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8344: --- Attachment: YARN-8344.v2.patch > Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync > on Windows > > > Key: YARN-8344 > URL: https://issues.apache.org/jira/browse/YARN-8344 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8344.v1.patch, YARN-8344.v2.patch > > > Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync > on Windows. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8334) Fix potential connection leak in GPGUtils
[ https://issues.apache.org/jira/browse/YARN-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487766#comment-16487766 ] Íñigo Goiri commented on YARN-8334: --- The TestPolicyGenerator unit test runs [here|https://builds.apache.org/job/PreCommit-YARN-Build/20833/testReport/org.apache.hadoop.yarn.server.globalpolicygenerator.policygenerator/TestPolicyGenerator/]. +1 Committing. > Fix potential connection leak in GPGUtils > - > > Key: YARN-8334 > URL: https://issues.apache.org/jira/browse/YARN-8334 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Minor > Attachments: YARN-8334-YARN-7402.v1.patch, > YARN-8334-YARN-7402.v2.patch > > > Missing ClientResponse.close and Client.destroy can lead to a connection leak. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8334) Fix potential connection leak in GPGUtils
[ https://issues.apache.org/jira/browse/YARN-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487766#comment-16487766 ] Íñigo Goiri edited comment on YARN-8334 at 5/23/18 6:06 PM: The TestPolicyGenerator unit test runs [here|https://builds.apache.org/job/PreCommit-YARN-Build/20833/testReport/org.apache.hadoop.yarn.server.globalpolicygenerator.policygenerator/TestPolicyGenerator/]. +1 Feel free to commit to the branch. was (Author: elgoiri): The TestPolicyGenerator unit test runs [here|https://builds.apache.org/job/PreCommit-YARN-Build/20833/testReport/org.apache.hadoop.yarn.server.globalpolicygenerator.policygenerator/TestPolicyGenerator/]. +1 Committing. > Fix potential connection leak in GPGUtils > - > > Key: YARN-8334 > URL: https://issues.apache.org/jira/browse/YARN-8334 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Minor > Attachments: YARN-8334-YARN-7402.v1.patch, > YARN-8334-YARN-7402.v2.patch > > > Missing ClientResponse.close and Client.destroy can lead to a connection leak. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8336) Fix potential connection leak in SchedConfCLI and YarnWebServiceUtils
[ https://issues.apache.org/jira/browse/YARN-8336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487775#comment-16487775 ] Íñigo Goiri commented on YARN-8336: --- Both [TestLogsCLI|https://builds.apache.org/job/PreCommit-YARN-Build/20832/testReport/org.apache.hadoop.yarn.client.cli/TestLogsCLI/] and [TestSchedConfCLI|https://builds.apache.org/job/PreCommit-YARN-Build/20832/testReport/org.apache.hadoop.yarn.client.cli/TestSchedConfCLI/] pass. +1 Feel free to commit. > Fix potential connection leak in SchedConfCLI and YarnWebServiceUtils > - > > Key: YARN-8336 > URL: https://issues.apache.org/jira/browse/YARN-8336 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8336.v1.patch, YARN-8336.v2.patch > > > Missing ClientResponse.close and Client.destroy can lead to a connection leak. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8344) Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync on Windows
[ https://issues.apache.org/jira/browse/YARN-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487778#comment-16487778 ] Giovanni Matteo Fumarola commented on YARN-8344: Attached v2 with the fix for the checkstyle warning. If any test in this class fails, all the other tests will fail (same behavior on Windows or Linux). testContainerResourceIncreaseIsSynchronizedWithRMResync fails on Windows - still figuring out the root cause. This patch will fix testKillContainersOnResync. > Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync > on Windows > > > Key: YARN-8344 > URL: https://issues.apache.org/jira/browse/YARN-8344 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8344.v1.patch, YARN-8344.v2.patch > > > Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync > on Windows.
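The missing-close bug in YARN-8344 is the classic "test starts a service but never stops it" shape. A minimal sketch, assuming a stand-in `FakeNodeManager` for the NodeManager service that TestNodeManagerResync starts (the real fix is simply adding `nm.close()` to the test):

```java
public class NmCloseSketch {
    // Stand-in for the NodeManager the test starts; the real class holds
    // ports, threads and local directories that later tests trip over if
    // close() is never called.
    static class FakeNodeManager implements AutoCloseable {
        static int running;              // services left running by tests
        FakeNodeManager() { running++; }
        void doTestWork() { /* exercise resync behaviour here */ }
        @Override public void close() { running--; }
    }

    // Leaky test shape: nothing guarantees close() runs, and an assertion
    // failure or early return skips it entirely.
    static void testWithoutClose() {
        FakeNodeManager nm = new FakeNodeManager();
        nm.doTestWork();
        // nm.close() missing: 'running' stays non-zero and leaks state
        // into every test that follows in the class.
    }

    // Fixed test shape: try-with-resources guarantees close() even when
    // the test body throws.
    static void testWithClose() {
        try (FakeNodeManager nm = new FakeNodeManager()) {
            nm.doTestWork();
        }
    }
}
```

This also explains the "if any test in this class fails, all the other tests fail" behaviour noted above: the unclosed service from the failing test keeps its resources, so every subsequent startup collides with them.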
[jira] [Updated] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"
[ https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-8346: - Attachment: YARN-8346.001.patch > Upgrading to 3.1 kills running containers with error "Opportunistic container > queue is full" > > > Key: YARN-8346 > URL: https://issues.apache.org/jira/browse/YARN-8346 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.1.0, 3.0.2 >Reporter: Rohith Sharma K S >Assignee: Jason Lowe >Priority: Blocker > Attachments: YARN-8346.001.patch > > > It is observed while rolling upgrade from 2.8.4 to 3.1 release, all the > running containers are killed and second attempt is launched for that > application. The diagnostics message is "Opportunistic container queue is > full" which is the reason for container killed. > In NM log, I see below logs for after container is recovered. > {noformat} > 2018-05-23 17:18:50,655 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler: > Opportunistic container [container_e06_1527075664705_0001_01_01] will > not be queued at the NMsince max queue length [0] has been reached > {noformat} > Following steps are executed for rolling upgrade > # Install 2.8.4 cluster and launch a MR job with distributed cache enabled. > # Stop 2.8.4 RM. Start 3.1.0 RM with same configuration. > # Stop 2.8.4 NM batch by batch. Start 3.1.0 NM batch by batch. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8344) Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync
[ https://issues.apache.org/jira/browse/YARN-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8344: --- Summary: Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync (was: Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync on Windows) > Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync > - > > Key: YARN-8344 > URL: https://issues.apache.org/jira/browse/YARN-8344 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8344.v1.patch, YARN-8344.v2.patch > > > Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync > on Windows. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (YARN-8344) Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync
[ https://issues.apache.org/jira/browse/YARN-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487778#comment-16487778 ] Giovanni Matteo Fumarola edited comment on YARN-8344 at 5/23/18 6:22 PM: - Attached v2 with the fix for the checkstyle warning. If any test in this class fails, all the other tests will fail (same behavior on Windows or Linux). testContainerResourceIncreaseIsSynchronizedWithRMResync fails on Windows - due to the length of the log directory. This patch will fix testKillContainersOnResync. java.io.IOException: Cannot launch container using script at path F:/short/hadoop-trunk-win/s/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync/nm0/usercache/nobody/appcache/application_0_/container_0__01_00/default_container_executor.cmd, because it exceeds the maximum supported path length of 260 characters. Consider configuring shorter directories in yarn.nodemanager.local-dirs. I saw a bunch of tests failing on Windows for this reason. I will open a JIRA to track this fix. was (Author: giovanni.fumarola): Attached v2 with the fix for the checkstyle warning. If any test in this class fails, all the other tests will fail (same behavior on Windows or Linux). testContainerResourceIncreaseIsSynchronizedWithRMResync fails on Windows - still figuring out the root cause. This patch will fix testKillContainersOnResync. > Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync > - > > Key: YARN-8344 > URL: https://issues.apache.org/jira/browse/YARN-8344 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8344.v1.patch, YARN-8344.v2.patch > > > Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync > on Windows. 
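The 260-character Windows path limit hit above is usually worked around by pointing the NodeManager's local and log directories at a very short root. A hypothetical yarn-site.xml fragment (the `C:\y\...` directory names are examples, not values from this thread):

```xml
<!-- yarn-site.xml: short roots to stay under the 260-character Windows
     path limit; the directory names below are illustrative examples. -->
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>C:\y\local</value>
</property>
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>C:\y\log</value>
</property>
```

For tests, the equivalent lever is shortening the surefire working directory or the test's target path, since the container script path is built from the local-dirs root plus the application and container IDs.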
[jira] [Commented] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"
[ https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487803#comment-16487803 ] Konstantinos Karanasos commented on YARN-8346: -- Thanks for the patch, [~jlowe]. Indeed you are right – the problem is the lack of execution type. The queue size should remain 0 given that opportunistic containers are disabled in this case. +1 for the patch. > Upgrading to 3.1 kills running containers with error "Opportunistic container > queue is full" > > > Key: YARN-8346 > URL: https://issues.apache.org/jira/browse/YARN-8346 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.1.0, 3.0.2 >Reporter: Rohith Sharma K S >Assignee: Jason Lowe >Priority: Blocker > Attachments: YARN-8346.001.patch > > > It is observed while rolling upgrade from 2.8.4 to 3.1 release, all the > running containers are killed and second attempt is launched for that > application. The diagnostics message is "Opportunistic container queue is > full" which is the reason for container killed. > In NM log, I see below logs for after container is recovered. > {noformat} > 2018-05-23 17:18:50,655 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler: > Opportunistic container [container_e06_1527075664705_0001_01_01] will > not be queued at the NMsince max queue length [0] has been reached > {noformat} > Following steps are executed for rolling upgrade > # Install 2.8.4 cluster and launch a MR job with distributed cache enabled. > # Stop 2.8.4 RM. Start 3.1.0 RM with same configuration. > # Stop 2.8.4 NM batch by batch. Start 3.1.0 NM batch by batch. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
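The diagnosis above — recovered containers carry no execution type, so they fall into the opportunistic queue whose max length is 0 — suggests a defaulting fix. The sketch below is an illustrative guess at that logic, not the actual contents of YARN-8346.001.patch (which is not quoted in this thread): a container record persisted by a 2.8.4 NM, which predates execution types, should be treated as GUARANTEED on recovery.

```java
public class ExecTypeDefaultSketch {
    // Mirrors YARN's two execution types.
    enum ExecutionType { GUARANTEED, OPPORTUNISTIC }

    // A pre-2.9 NM state store has no execution-type field, so the
    // recovered value is null. Treating null as OPPORTUNISTIC routes the
    // container to a queue whose max length is 0 and gets it killed;
    // defaulting to GUARANTEED preserves the old semantics, where every
    // container was guaranteed.
    static ExecutionType effectiveType(ExecutionType fromStateStore) {
        return fromStateStore == null
                ? ExecutionType.GUARANTEED
                : fromStateStore;
    }

    // The kill condition described in the NM log quoted in the issue.
    static boolean wouldBeKilled(ExecutionType type, int maxOppQueueLen) {
        return type == ExecutionType.OPPORTUNISTIC && maxOppQueueLen == 0;
    }
}
```

With the default in place, a container recovered with a null type no longer satisfies the kill condition even when opportunistic containers are disabled (queue length 0).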
[jira] [Commented] (YARN-8108) RM metrics rest API throws GSSException in kerberized environment
[ https://issues.apache.org/jira/browse/YARN-8108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487806#comment-16487806 ] Yongjun Zhang commented on YARN-8108: - Hi [~eyang], It seems the issue also exists in 3.0.2 release. The above discussion indicates that it might take some time for the solution to converge, should we move 3.0.3 out of the target release and list this jira as a known issue for 3.0.3? or we should fix this issue in 3.0.3? Thanks. > RM metrics rest API throws GSSException in kerberized environment > - > > Key: YARN-8108 > URL: https://issues.apache.org/jira/browse/YARN-8108 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Kshitij Badani >Assignee: Eric Yang >Priority: Blocker > Attachments: YARN-8108.001.patch > > > Test is trying to pull up metrics data from SHS after kiniting as 'test_user' > It is throwing GSSException as follows > {code:java} > b2b460b80713|RUNNING: curl --silent -k -X GET -D > /hwqe/hadoopqe/artifacts/tmp-94845 --negotiate -u : > http://rm_host:8088/proxy/application_1518674952153_0070/metrics/json2018-02-15 > 07:15:48,757|INFO|MainThread|machine.py:194 - > run()||GUID=fc5a3266-28f8-4eed-bae2-b2b460b80713|Exit Code: 0 > 2018-02-15 07:15:48,758|INFO|MainThread|spark.py:1757 - > getMetricsJsonData()|metrics: > > > > Error 403 GSSException: Failure unspecified at GSS-API level > (Mechanism level: Request is a replay (34)) > > HTTP ERROR 403 > Problem accessing /proxy/application_1518674952153_0070/metrics/json. > Reason: > GSSException: Failure unspecified at GSS-API level (Mechanism level: > Request is a replay (34)) > > > {code} > Rootcausing : proxyserver on RM can't be supported for Kerberos enabled > cluster because AuthenticationFilter is applied twice in Hadoop code (once in > httpServer2 for RM, and another instance from AmFilterInitializer for proxy > server). 
This will require code changes to hadoop-yarn-server-web-proxy > project
[jira] [Commented] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"
[ https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487810#comment-16487810 ] Yongjun Zhang commented on YARN-8346: - Thanks a lot for the quick turnaround [~jlowe] and [~kkaranasos]. > Upgrading to 3.1 kills running containers with error "Opportunistic container > queue is full" > > > Key: YARN-8346 > URL: https://issues.apache.org/jira/browse/YARN-8346 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: 3.1.0, 3.0.2 >Reporter: Rohith Sharma K S >Assignee: Jason Lowe >Priority: Blocker > Attachments: YARN-8346.001.patch > > > It is observed while rolling upgrade from 2.8.4 to 3.1 release, all the > running containers are killed and second attempt is launched for that > application. The diagnostics message is "Opportunistic container queue is > full" which is the reason for container killed. > In NM log, I see below logs for after container is recovered. > {noformat} > 2018-05-23 17:18:50,655 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.scheduler.ContainerScheduler: > Opportunistic container [container_e06_1527075664705_0001_01_01] will > not be queued at the NMsince max queue length [0] has been reached > {noformat} > Following steps are executed for rolling upgrade > # Install 2.8.4 cluster and launch a MR job with distributed cache enabled. > # Stop 2.8.4 RM. Start 3.1.0 RM with same configuration. > # Stop 2.8.4 NM batch by batch. Start 3.1.0 NM batch by batch. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8108) RM metrics rest API throws GSSException in kerberized environment
[ https://issues.apache.org/jira/browse/YARN-8108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487825#comment-16487825 ] Eric Yang commented on YARN-8108: - [~yzhangal] My preference is to fix this in 3.0.3 release. If consensus is not reached, release manager can push this out of 3.0.3 release, and release note this as an known issue. I am fine with the plan. > RM metrics rest API throws GSSException in kerberized environment > - > > Key: YARN-8108 > URL: https://issues.apache.org/jira/browse/YARN-8108 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Kshitij Badani >Assignee: Eric Yang >Priority: Blocker > Attachments: YARN-8108.001.patch > > > Test is trying to pull up metrics data from SHS after kiniting as 'test_user' > It is throwing GSSException as follows > {code:java} > b2b460b80713|RUNNING: curl --silent -k -X GET -D > /hwqe/hadoopqe/artifacts/tmp-94845 --negotiate -u : > http://rm_host:8088/proxy/application_1518674952153_0070/metrics/json2018-02-15 > 07:15:48,757|INFO|MainThread|machine.py:194 - > run()||GUID=fc5a3266-28f8-4eed-bae2-b2b460b80713|Exit Code: 0 > 2018-02-15 07:15:48,758|INFO|MainThread|spark.py:1757 - > getMetricsJsonData()|metrics: > > > > Error 403 GSSException: Failure unspecified at GSS-API level > (Mechanism level: Request is a replay (34)) > > HTTP ERROR 403 > Problem accessing /proxy/application_1518674952153_0070/metrics/json. > Reason: > GSSException: Failure unspecified at GSS-API level (Mechanism level: > Request is a replay (34)) > > > {code} > Rootcausing : proxyserver on RM can't be supported for Kerberos enabled > cluster because AuthenticationFilter is applied twice in Hadoop code (once in > httpServer2 for RM, and another instance from AmFilterInitializer for proxy > server). 
This will require code changes to hadoop-yarn-server-web-proxy > project
[jira] [Created] (YARN-8348) Incorrect and missing AfterClass in HBase-tests
Giovanni Matteo Fumarola created YARN-8348: -- Summary: Incorrect and missing AfterClass in HBase-tests Key: YARN-8348 URL: https://issues.apache.org/jira/browse/YARN-8348 Project: Hadoop YARN Issue Type: Bug Reporter: Giovanni Matteo Fumarola
[jira] [Comment Edited] (YARN-8336) Fix potential connection leak in SchedConfCLI and YarnWebServiceUtils
[ https://issues.apache.org/jira/browse/YARN-8336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487775#comment-16487775 ] Íñigo Goiri edited comment on YARN-8336 at 5/23/18 6:54 PM: Both [TestLogsCLI|https://builds.apache.org/job/PreCommit-YARN-Build/20832/testReport/org.apache.hadoop.yarn.client.cli/TestLogsCLI/] and [TestSchedConfCLI|https://builds.apache.org/job/PreCommit-YARN-Build/20832/testReport/org.apache.hadoop.yarn.client.cli/TestSchedConfCLI/] pass. +1 Committing to trunk. was (Author: elgoiri): Both [TestLogsCLI|https://builds.apache.org/job/PreCommit-YARN-Build/20832/testReport/org.apache.hadoop.yarn.client.cli/TestLogsCLI/] and [TestSchedConfCLI|https://builds.apache.org/job/PreCommit-YARN-Build/20832/testReport/org.apache.hadoop.yarn.client.cli/TestSchedConfCLI/] pass. +1 Feel free to commit. > Fix potential connection leak in SchedConfCLI and YarnWebServiceUtils > - > > Key: YARN-8336 > URL: https://issues.apache.org/jira/browse/YARN-8336 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8336.v1.patch, YARN-8336.v2.patch > > > Missing ClientResponse.close and Client.destroy can lead to a connection leak. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8348) Incorrect and missing AfterClass in HBase-tests
[ https://issues.apache.org/jira/browse/YARN-8348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8348: --- Attachment: YARN-8348.v1.patch > Incorrect and missing AfterClass in HBase-tests > --- > > Key: YARN-8348 > URL: https://issues.apache.org/jira/browse/YARN-8348 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8348.v1.patch > >
[jira] [Assigned] (YARN-8348) Incorrect and missing AfterClass in HBase-tests
[ https://issues.apache.org/jira/browse/YARN-8348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola reassigned YARN-8348: -- Assignee: Giovanni Matteo Fumarola > Incorrect and missing AfterClass in HBase-tests > --- > > Key: YARN-8348 > URL: https://issues.apache.org/jira/browse/YARN-8348 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8348.v1.patch > >
[jira] [Commented] (YARN-8348) Incorrect and missing AfterClass in HBase-tests
[ https://issues.apache.org/jira/browse/YARN-8348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487848#comment-16487848 ] Giovanni Matteo Fumarola commented on YARN-8348: Before my patch: [ERROR] Errors: [ERROR] TestTimelineReaderWebServicesHBaseStorage.setupBeforeClass:79->AbstractTimelineReaderHBaseTestBase.setup:60 » NoClassDefFound [ERROR] org.apache.hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageApps.org.apache.hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageApps [ERROR] Run 1: TestHBaseTimelineStorageApps.setupBeforeClass:97 » NoClassDefFound org/apache/... [ERROR] Run 2: TestHBaseTimelineStorageApps.tearDownAfterClass:1939 NullPointer [INFO] [ERROR] TestHBaseTimelineStorageDomain.setupBeforeClass:51 » NoClassDefFound org/apach... [ERROR] org.apache.hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageEntities.org.apache.hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageEntities [ERROR] Run 1: TestHBaseTimelineStorageEntities.setupBeforeClass:110 » NoClassDefFound org/ap... [ERROR] Run 2: TestHBaseTimelineStorageEntities.tearDownAfterClass:1882 NullPointer [INFO] [ERROR] TestHBaseTimelineStorageSchema.setupBeforeClass:49 » NoClassDefFound org/apach... [ERROR] org.apache.hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowActivity.org.apache.hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowActivity [ERROR] Run 1: TestHBaseStorageFlowActivity.setupBeforeClass:71 » NoClassDefFound org/apache/... [ERROR] Run 2: TestHBaseStorageFlowActivity.tearDownAfterClass:495 NullPointer [INFO] [ERROR] org.apache.hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowRun.org.apache.hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowRun [ERROR] Run 1: TestHBaseStorageFlowRun.setupBeforeClass:83 » NoClassDefFound org/apache/hadoo... 
[ERROR] Run 2: TestHBaseStorageFlowRun.tearDownAfterClass:1078 NullPointer [INFO] [ERROR] org.apache.hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowRunCompaction.org.apache.hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowRunCompaction [ERROR] Run 1: TestHBaseStorageFlowRunCompaction.setupBeforeClass:82 » NoClassDefFound org/ap... [ERROR] Run 2: TestHBaseStorageFlowRunCompaction.tearDownAfterClass:853 NullPointer After my patch: [ERROR] Errors: [ERROR] TestTimelineReaderWebServicesHBaseStorage.setupBeforeClass:79->AbstractTimelineReaderHBaseTestBase.setup:60 » NoClassDefFound [ERROR] TestHBaseTimelineStorageApps.setupBeforeClass:97 » NoClassDefFound org/apache/... [ERROR] TestHBaseTimelineStorageDomain.setupBeforeClass:52 » NoClassDefFound org/apach... [ERROR] TestHBaseTimelineStorageEntities.setupBeforeClass:110 » NoClassDefFound org/ap... [ERROR] TestHBaseTimelineStorageSchema.setupBeforeClass:50 » NoClassDefFound org/apach... [ERROR] TestHBaseStorageFlowActivity.setupBeforeClass:71 » NoClassDefFound org/apache/... [ERROR] TestHBaseStorageFlowRun.setupBeforeClass:83 » NoClassDefFound org/apache/hadoo... [ERROR] TestHBaseStorageFlowRunCompaction.setupBeforeClass:82 » NoClassDefFound org/ap... > Incorrect and missing AfterClass in HBase-tests > --- > > Key: YARN-8348 > URL: https://issues.apache.org/jira/browse/YARN-8348 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8348.v1.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8348) Incorrect and missing AfterClass in HBase-tests
[ https://issues.apache.org/jira/browse/YARN-8348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8348: --- Description: HBase tests are failing in [linux|https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86/789/testReport/] for 2 reasons: * incorrect > Incorrect and missing AfterClass in HBase-tests > --- > > Key: YARN-8348 > URL: https://issues.apache.org/jira/browse/YARN-8348 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8348.v1.patch > > > HBase tests are failing in > [linux|https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86/789/testReport/] > for 2 reasons: > * incorrect -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8348) Incorrect and missing AfterClass in HBase-tests
[ https://issues.apache.org/jira/browse/YARN-8348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-8348: --- Description: HBase tests are failing in [linux|https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86/789/testReport/] for 2 reasons: * incorrect afterClass; * not defined KeyProviderTokenIssuer. On Windows they are failing for the previous 2 reasons plus * missing afterClass. This Jira tracks the effort to fix part of the HBase tests and reduce the failed tests in Linux. was: HBase tests are failing in [linux|https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86/789/testReport/] for 2 reasons: * incorrect > Incorrect and missing AfterClass in HBase-tests > --- > > Key: YARN-8348 > URL: https://issues.apache.org/jira/browse/YARN-8348 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8348.v1.patch > > > HBase tests are failing in > [linux|https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86/789/testReport/] > for 2 reasons: > * incorrect afterClass; > * not defined KeyProviderTokenIssuer. > On Windows they are failing for the previous 2 reasons plus * missing > afterClass. > This Jira tracks the effort to fix part of the HBase tests and reduce the failed > tests in Linux.
[jira] [Commented] (YARN-8348) Incorrect and missing AfterClass in HBase-tests
[ https://issues.apache.org/jira/browse/YARN-8348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487861#comment-16487861 ] Giovanni Matteo Fumarola commented on YARN-8348: [^YARN-8348.v1.patch] will bring the number of failed tests from 21 down to 16. [Link to failed tests|https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86/789/testReport/] > Incorrect and missing AfterClass in HBase-tests > --- > > Key: YARN-8348 > URL: https://issues.apache.org/jira/browse/YARN-8348 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8348.v1.patch > > > HBase tests are failing in > [linux|https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86/789/testReport/] > for 2 reasons: > * incorrect afterClass; > * not defined KeyProviderTokenIssuer. > On Windows they are failing for the previous 2 reasons plus * missing > afterClass. > This Jira tracks the effort to fix part of the HBase tests and reduce the failed > tests in Linux.
[jira] [Commented] (YARN-8346) Upgrading to 3.1 kills running containers with error "Opportunistic container queue is full"
[ https://issues.apache.org/jira/browse/YARN-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487887#comment-16487887 ] genericqa commented on YARN-8346: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 33s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 30s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 18s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 42s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 21s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 6s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 61m 51s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8346 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12924799/YARN-8346.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 2a58b0d4306d 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 51ce02b | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20842/testReport/ | | Max. process+thread count | 303 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20842/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Upgrading to 3.1 kills running containers with error "Opportunistic container > queue is full" > --
[jira] [Commented] (YARN-8348) Incorrect and missing AfterClass in HBase-tests
[ https://issues.apache.org/jira/browse/YARN-8348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487886#comment-16487886 ] Íñigo Goiri commented on YARN-8348: --- Technically the null check in AfterClass shouldn't be needed, as a failure in BeforeClass should trigger the error everywhere else. In any case, it's good not to have an NPE if the BeforeClass fails. So in the output we went from a double NoClassDefFound+NPE to just NoClassDefFound. I think this is an improvement, but we need to figure out the reason for the NoClassDefFound (probably a separate JIRA). The real fix here would be the one in TestHBaseTimelineStorageDomain, which leaves the mini cluster open. [^YARN-8348.v1.patch] LGTM. Let's wait for Yetus. > Incorrect and missing AfterClass in HBase-tests > --- > > Key: YARN-8348 > URL: https://issues.apache.org/jira/browse/YARN-8348 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8348.v1.patch > > > HBase tests are failing in > [linux|https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86/789/testReport/] > for 2 reasons: > * incorrect afterClass; > * not defined KeyProviderTokenIssuer. > While in Windows they are failing for the previous 2 reasons plus * missing > afterClass. > This Jira tracks the effort to fix part of the HBase tests and reduce the failed > tests in Linux. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
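The null-guarded teardown being discussed can be sketched as follows. This is a minimal illustration with hypothetical names (plain Java rather than the real JUnit/HBase test classes): if setup fails before the shared resource is assigned, an unguarded teardown would bury the original failure under a second NullPointerException.

```java
public class GuardedTeardownSketch {
    /** Stand-in for the shared mini cluster a @BeforeClass method would start. */
    static final class MiniCluster {
        boolean running = true;
        void shutdown() { running = false; }
    }

    static MiniCluster cluster;  // stays null if setup fails before assignment

    static void setUp() {
        cluster = new MiniCluster();
    }

    static void tearDown() {
        // Guard: if setup failed, cluster is null, and calling shutdown() on it
        // would mask the original failure with a NullPointerException.
        if (cluster != null) {
            cluster.shutdown();
            cluster = null;
        }
    }

    public static void main(String[] args) {
        tearDown();              // safe even though setUp() never ran
        setUp();
        tearDown();              // normal path: shuts the cluster down
        System.out.println("teardown is NPE-safe");
    }
}
```

As the comment notes, the guard only cleans up the error output; the underlying NoClassDefFoundError still needs its own fix.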
[jira] [Commented] (YARN-8336) Fix potential connection leak in SchedConfCLI and YarnWebServiceUtils
[ https://issues.apache.org/jira/browse/YARN-8336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487895#comment-16487895 ] Hudson commented on YARN-8336: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14272 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14272/]) YARN-8336. Fix potential connection leak in SchedConfCLI and (inigoiri: rev e30938af1270e079587e7bc06b755f9e93e660a5) * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/main/java/org/apache/hadoop/yarn/client/cli/SchedConfCLI.java * (edit) hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/util/YarnWebServiceUtils.java > Fix potential connection leak in SchedConfCLI and YarnWebServiceUtils > - > > Key: YARN-8336 > URL: https://issues.apache.org/jira/browse/YARN-8336 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Fix For: 3.2.0 > > Attachments: YARN-8336.v1.patch, YARN-8336.v2.patch > > > Missing ClientResponse.close and Client.destroy can lead to a connection leak. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
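The close-in-finally pattern behind this fix can be sketched as below. The `Client` and `ClientResponse` classes here are minimal stand-ins for illustration, not the real Jersey types used in SchedConfCLI and YarnWebServiceUtils; the point is that cleanup in a finally block runs on both the success and the exception path, so no connection leaks.

```java
public class ConnectionCleanupSketch {
    /** Minimal stand-ins for the client types named in the issue. */
    static final class ClientResponse {
        boolean closed;
        void close() { closed = true; }
    }

    static final class Client {
        boolean destroyed;
        ClientResponse get() { return new ClientResponse(); }
        void destroy() { destroyed = true; }
    }

    static ClientResponse lastResponse;  // exposed so the cleanup can be observed

    static void fetch(Client client) {
        ClientResponse response = null;
        try {
            response = client.get();
            // ... real code would read the response entity here ...
        } finally {
            // Runs on both success and exception, so neither object leaks.
            if (response != null) {
                response.close();
            }
            client.destroy();
            lastResponse = response;
        }
    }

    public static void main(String[] args) {
        Client client = new Client();
        fetch(client);
        System.out.println("response closed: " + lastResponse.closed
            + ", client destroyed: " + client.destroyed);
    }
}
```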
[jira] [Commented] (YARN-8344) Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync
[ https://issues.apache.org/jira/browse/YARN-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487894#comment-16487894 ] genericqa commented on YARN-8344: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 35s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 16s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 20s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 56s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s{color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: The patch generated 0 new + 29 unchanged - 2 fixed = 29 total (was 31) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 45s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 18m 56s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 76m 25s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:abb62dd | | JIRA Issue | YARN-8344 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12924796/YARN-8344.v2.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux db23130e69a9 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 51ce02b | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_162 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/20841/testReport/ | | Max. process+thread count | 306 (vs. ulimit of 1) | | modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/20841/console | | Powered by | Apache Yetus
[jira] [Commented] (YARN-4781) Support intra-queue preemption for fairness ordering policy.
[ https://issues.apache.org/jira/browse/YARN-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487902#comment-16487902 ] Eric Payne commented on YARN-4781: -- bq. FairOrdering policy could be used with weights? [~sunilg], the fair ordering preemption will generally select the smaller-weighted users first even when those containers are older. It's a hierarchy of priority ordering, though, and it does still try to be "fair," so you could have a situation where the youngest containers are selected even though they are owned by a more heavily-weighted user. > Support intra-queue preemption for fairness ordering policy. > > > Key: YARN-4781 > URL: https://issues.apache.org/jira/browse/YARN-4781 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Wangda Tan >Assignee: Eric Payne >Priority: Major > Attachments: YARN-4781.001.patch, YARN-4781.002.patch, > YARN-4781.003.patch, YARN-4781.004.patch, YARN-4781.005.patch > > > We introduced the fairness queue policy in YARN-3319, which lets large > applications make progress and not starve small applications. However, if a > large application takes the queue's resources, and the containers of the large > app have long lifespans, small applications could still wait for resources for > a long time and SLAs cannot be guaranteed. > Instead of waiting for applications to release resources on their own, we need to > preempt resources of queues with the fairness policy enabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8334) [] Fix potential connection leak in GPGUtils
[ https://issues.apache.org/jira/browse/YARN-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated YARN-8334: --- Summary: [] Fix potential connection leak in GPGUtils (was: Fix potential connection leak in GPGUtils) > [] Fix potential connection leak in GPGUtils > > > Key: YARN-8334 > URL: https://issues.apache.org/jira/browse/YARN-8334 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Minor > Attachments: YARN-8334-YARN-7402.v1.patch, > YARN-8334-YARN-7402.v2.patch > > > Missing ClientResponse.close and Client.destroy can lead to a connection leak. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8334) [GPG] Fix potential connection leak in GPGUtils
[ https://issues.apache.org/jira/browse/YARN-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang updated YARN-8334: --- Summary: [GPG] Fix potential connection leak in GPGUtils (was: [] Fix potential connection leak in GPGUtils) > [GPG] Fix potential connection leak in GPGUtils > --- > > Key: YARN-8334 > URL: https://issues.apache.org/jira/browse/YARN-8334 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Minor > Attachments: YARN-8334-YARN-7402.v1.patch, > YARN-8334-YARN-7402.v2.patch > > > Missing ClientResponse.close and Client.destroy can lead to a connection leak. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Reopened] (YARN-7530) hadoop-yarn-services-api should be part of hadoop-yarn-services
[ https://issues.apache.org/jira/browse/YARN-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Badger reopened YARN-7530: --- This change breaks branch-3.1 compilation if the .m2 directory is cleaned. {noformat} [ERROR] [ERROR] Some problems were encountered while processing the POMs: [WARNING] 'parent.relativePath' of POM org.apache.hadoop:hadoop-yarn-services-api:[unknown-version] (/Users/ebadger/apachehadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-api/pom.xml) points at org.apache.hadoop:hadoop-yarn-services instead of org.apache.hadoop:hadoop-yarn-applications, please verify your project structure @ line 19, column 11 [FATAL] Non-resolvable parent POM for org.apache.hadoop:hadoop-yarn-services-api:[unknown-version]: Could not find artifact org.apache.hadoop:hadoop-yarn-applications:pom:3.1.1-SNAPSHOT and 'parent.relativePath' points at wrong local POM @ line 19, column 11 [WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-gpg-plugin is missing. @ line 133, column 15 @ [ERROR] The build could not read 1 project -> [Help 1] [ERROR] [ERROR] The project org.apache.hadoop:hadoop-yarn-services-api:[unknown-version] (/Users/ebadger/apachehadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-api/pom.xml) has 1 error [ERROR] Non-resolvable parent POM for org.apache.hadoop:hadoop-yarn-services-api:[unknown-version]: Could not find artifact org.apache.hadoop:hadoop-yarn-applications:pom:3.1.1-SNAPSHOT and 'parent.relativePath' points at wrong local POM @ line 19, column 11 -> [Help 2] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. 
[ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/ProjectBuildingException [ERROR] [Help 2] http://cwiki.apache.org/confluence/display/MAVEN/UnresolvableModelException {noformat} Here's the difference between branch-3.1 and trunk. The artifactId was updated correctly in trunk, but not branch-3.1 {noformat} diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-api/pom.xml b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-api/pom.xml index 45168a9fbc4..d45da093102 100644 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-api/pom.xml +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-services/hadoop-yarn-services-api/pom.xml @@ -18,8 +18,8 @@ 4.0.0 org.apache.hadoop -hadoop-yarn-services -3.2.0-SNAPSHOT +hadoop-yarn-applications +3.1.1-SNAPSHOT hadoop-yarn-services-api Apache Hadoop YARN Services API {noformat} > hadoop-yarn-services-api should be part of hadoop-yarn-services > --- > > Key: YARN-7530 > URL: https://issues.apache.org/jira/browse/YARN-7530 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Chandni Singh >Priority: Trivial > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-7530.001.patch, YARN-7530.002.patch > > > Hadoop-yarn-services-api is currently a parallel project to > hadoop-yarn-services project. It would be better if hadoop-yarn-services-api > is part of hadoop-yarn-services for correctness. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8334) [GPG] Fix potential connection leak in GPGUtils
[ https://issues.apache.org/jira/browse/YARN-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487945#comment-16487945 ] Botong Huang commented on YARN-8334: Committed to YARN-7402 as db183f2ea. Thanks [~giovanni.fumarola] for the patch and [~elgoiri] for the review! > [GPG] Fix potential connection leak in GPGUtils > --- > > Key: YARN-8334 > URL: https://issues.apache.org/jira/browse/YARN-8334 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Minor > Attachments: YARN-8334-YARN-7402.v1.patch, > YARN-8334-YARN-7402.v2.patch > > > Missing ClientResponse.close and Client.destroy can lead to a connection leak. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Resolved] (YARN-8334) [GPG] Fix potential connection leak in GPGUtils
[ https://issues.apache.org/jira/browse/YARN-8334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Botong Huang resolved YARN-8334. Resolution: Fixed > [GPG] Fix potential connection leak in GPGUtils > --- > > Key: YARN-8334 > URL: https://issues.apache.org/jira/browse/YARN-8334 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Minor > Attachments: YARN-8334-YARN-7402.v1.patch, > YARN-8334-YARN-7402.v2.patch > > > Missing ClientResponse.close and Client.destroy can lead to a connection leak. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Created] (YARN-8349) Remove YARN registry entries when a service is killed by the RM
Shane Kumpf created YARN-8349: - Summary: Remove YARN registry entries when a service is killed by the RM Key: YARN-8349 URL: https://issues.apache.org/jira/browse/YARN-8349 Project: Hadoop YARN Issue Type: Sub-task Components: yarn-native-services Affects Versions: 3.2.0, 3.1.1 Reporter: Shane Kumpf As the title states, when a service is killed by the RM (for exceeding its lifetime for example), the YARN registry entries should be cleaned up. Without cleanup, DNS can contain multiple hostnames for a single IP address in the case where IPs are reused. This impacts reverse lookups, which breaks services, such as kerberos, that depend on those lookups. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8349) Remove YARN registry entries when a service is killed by the RM
[ https://issues.apache.org/jira/browse/YARN-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shane Kumpf reassigned YARN-8349: - Assignee: Billie Rinaldi > Remove YARN registry entries when a service is killed by the RM > --- > > Key: YARN-8349 > URL: https://issues.apache.org/jira/browse/YARN-8349 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-native-services >Affects Versions: 3.2.0, 3.1.1 >Reporter: Shane Kumpf >Assignee: Billie Rinaldi >Priority: Major > > As the title states, when a service is killed by the RM (for exceeding its > lifetime for example), the YARN registry entries should be cleaned up. > Without cleanup, DNS can contain multiple hostnames for a single IP address > in the case where IPs are reused. This impacts reverse lookups, which breaks > services, such as kerberos, that depend on those lookups. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8348) Incorrect and missing AfterClass in HBase-tests
[ https://issues.apache.org/jira/browse/YARN-8348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487951#comment-16487951 ] genericqa commented on YARN-8348: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 29s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 7 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 7s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 0s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 12s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 11s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 9m 45s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Skipped patched modules with no Java source: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase-tests {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 27s{color} | {color:red} hadoop-yarn-server-timelineservice-hbase-tests in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 46m 1s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageEntities | | | hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageSchema | | | hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageApps | | | hadoop.yarn.server.timelineservice.reader.TestTimelineReaderWebServicesHBaseStorage | | | hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowActivity | | | hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineStorageDomain | | | hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowRunCompaction | | | hadoop.yarn.server.timelineservice.storage.flow.TestHBaseStorageFlowRun | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05
[jira] [Commented] (YARN-8342) Using docker image from a non-privileged registry, the launch_command is not honored
[ https://issues.apache.org/jira/browse/YARN-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487952#comment-16487952 ] Eric Yang commented on YARN-8342: - [~vinodkv] The launch command was dropped in YARN-7516 due to concerns that shell expansion could cause the commands to run as the root user via popen. With the YARN-7654 changes to use execvp, this concern has been nullified. It is safe to preserve the launch command even for untrusted images. [~shaneku...@gmail.com] [~ebadger] [~jlowe] Do you agree with this change? > Using docker image from a non-privileged registry, the launch_command is not > honored > > > Key: YARN-8342 > URL: https://issues.apache.org/jira/browse/YARN-8342 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Priority: Critical > Labels: Docker > > During testing of the Docker feature, I found that if a container comes from a > non-privileged docker registry, the specified launch command will be ignored. > The container will succeed without any log, which is very confusing to end users. > And this behavior is inconsistent with containers from privileged docker > registries. > cc: [~eyang], [~shaneku...@gmail.com], [~ebadger], [~jlowe] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Assigned] (YARN-8333) Load balance YARN services using RegistryDNS multiple A records
[ https://issues.apache.org/jira/browse/YARN-8333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Yang reassigned YARN-8333: --- Assignee: Eric Yang > Load balance YARN services using RegistryDNS multiple A records > --- > > Key: YARN-8333 > URL: https://issues.apache.org/jira/browse/YARN-8333 > Project: Hadoop YARN > Issue Type: Improvement > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Eric Yang >Priority: Major > > For scaling stateless containers, it would be great to support DNS round > robin for fault tolerance and load balancing. The current DNS record format > for RegistryDNS is > [container-instance].[application-name].[username].[domain]. For example: > {code} > appcatalog-0.appname.hbase.ycluster. IN A 123.123.123.120 > appcatalog-1.appname.hbase.ycluster. IN A 123.123.123.121 > appcatalog-2.appname.hbase.ycluster. IN A 123.123.123.122 > appcatalog-3.appname.hbase.ycluster. IN A 123.123.123.123 > {code} > It would be nice to add multi-A record that contains all IP addresses of the > same component in addition to the instance based records. For example: > {code} > appcatalog.appname.hbase.ycluster. IN A 123.123.123.120 > appcatalog.appname.hbase.ycluster. IN A 123.123.123.121 > appcatalog.appname.hbase.ycluster. IN A 123.123.123.122 > appcatalog.appname.hbase.ycluster. IN A 123.123.123.123 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
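The load balancing that the proposed multi-A record would enable can be sketched from the client side: once a resolver returns all four instance addresses under the shared component name, a caller can simply rotate through them. This is an illustrative sketch of client-side round robin (not RegistryDNS code); the addresses are the ones from the example records above.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class MultiARoundRobinSketch {
    private final List<String> addresses;          // the resolved A records
    private final AtomicInteger next = new AtomicInteger();

    MultiARoundRobinSketch(List<String> addresses) {
        this.addresses = addresses;
    }

    /** Rotate through the resolved A records, wrapping around at the end. */
    String pick() {
        return addresses.get(Math.floorMod(next.getAndIncrement(), addresses.size()));
    }

    public static void main(String[] args) {
        // The four instance addresses from the example records above.
        MultiARoundRobinSketch lb = new MultiARoundRobinSketch(List.of(
            "123.123.123.120", "123.123.123.121",
            "123.123.123.122", "123.123.123.123"));
        for (int i = 0; i < 5; i++) {
            System.out.println("appcatalog.appname.hbase.ycluster -> " + lb.pick());
        }
    }
}
```

In practice many resolvers and DNS servers already shuffle the order of A records in a response, so even clients that always take the first address get a rough balancing for free.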
[jira] [Commented] (YARN-8348) Incorrect and missing AfterClass in HBase-tests
[ https://issues.apache.org/jira/browse/YARN-8348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487984#comment-16487984 ] Íñigo Goiri commented on YARN-8348: --- The good news is that the NPE is gone. However, the original NoClassDefFoundError surfaces clearly now. I'm fine committing this as is, but I'd like to have a follow-up JIRA on why KeyProviderTokenIssuer is not found. > Incorrect and missing AfterClass in HBase-tests > --- > > Key: YARN-8348 > URL: https://issues.apache.org/jira/browse/YARN-8348 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8348.v1.patch > > > HBase tests are failing in > [linux|https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86/789/testReport/] > for 2 reasons: > * incorrect afterClass; > * not defined KeyProviderTokenIssuer. > While in Windows they are failing for the previous 2 reasons plus * missing > afterClass. > This Jira tracks the effort to fix part of the HBase tests and reduce the failed > tests in Linux. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8344) Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync
[ https://issues.apache.org/jira/browse/YARN-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487987#comment-16487987 ] Íñigo Goiri commented on YARN-8344: --- +1 on [^YARN-8344.v2.patch]. We still need to figure out the proper fix for the path length issue on Windows. [~giovanni.fumarola], please link this JIRA once opening the Windows fix. > Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync > - > > Key: YARN-8344 > URL: https://issues.apache.org/jira/browse/YARN-8344 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8344.v1.patch, YARN-8344.v2.patch > > > Missing nm.close() in TestNodeManagerResync to fix testKillContainersOnResync > on Windows. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-8342) Using docker image from a non-privileged registry, the launch_command is not honored
[ https://issues.apache.org/jira/browse/YARN-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487992#comment-16487992 ] Shane Kumpf commented on YARN-8342: --- This sounds like a reasonable proposal. In cases where the current behavior is desired, the user can set "launch_command" to an empty string, I guess? To be clear, there is no replacement with an "empty bash". The current "untrusted" mode leaves it up to the Docker image to specify the ENTRYPOINT/CMD. Nothing is overwritten by YARN in this "untrusted" mode. It is very common for images to use "bash" as the CMD. When an image does this and YARN runs in this "untrusted" mode, a non-interactive "bash" shell starts in the container and immediately exits with success. YARN reports that the container ran successfully, but this is confusing to the user because the code they expected to run did not run. The launch script depends on mounts, and "untrusted" mode strips all mounts, meaning we flat out can't use a launch_script in this mode as we would in "trusted" mode. Allowing the "launch_command" supplied by the user, without embedding that "launch_command" in the launch script, seems like a viable way to support both. Confused yet? :) > Using docker image from a non-privileged registry, the launch_command is not > honored > > > Key: YARN-8342 > URL: https://issues.apache.org/jira/browse/YARN-8342 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Priority: Critical > Labels: Docker > > During testing of the Docker feature, I found that if a container comes from a > non-privileged docker registry, the specified launch command will be ignored. > The container will succeed without any log, which is very confusing to end users. > And this behavior is inconsistent with containers from privileged docker > registries. 
> cc: [~eyang], [~shaneku...@gmail.com], [~ebadger], [~jlowe]
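The "bash as CMD exits immediately with success" behavior Shane describes can be reproduced outside Docker: a non-interactive bash with no command and no stdin reads EOF and exits 0, which is exactly why YARN reports such a container as successful even though nothing ran.

```shell
# A non-interactive bash with no command and a closed/empty stdin hits EOF
# immediately and exits with status 0 -- the same thing that happens inside
# an "untrusted" container whose image CMD is just "bash".
bash </dev/null
echo "bash exit status: $?"
```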
[jira] [Commented] (YARN-8348) Incorrect and missing AfterClass in HBase-tests
[ https://issues.apache.org/jira/browse/YARN-8348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16488002#comment-16488002 ] Giovanni Matteo Fumarola commented on YARN-8348: Thanks [~elgoiri] for the review. I will open a follow-up Jira for KeyProviderTokenIssuer. As I said before, this patch will bring the failed tests from 21 down to 16 on Linux. As with HDFS-13558, which you and [~huanbang1993] fixed by closing the cluster, the patch will fix failures on Windows for TestHBaseTimelineStorageDomain and TestHBaseTimelineStorageSchema. > Incorrect and missing AfterClass in HBase-tests > --- > > Key: YARN-8348 > URL: https://issues.apache.org/jira/browse/YARN-8348 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola >Priority: Major > Attachments: YARN-8348.v1.patch > > > HBase tests are failing on > [linux|https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86/789/testReport/] > for 2 reasons: > * incorrect afterClass; > * KeyProviderTokenIssuer not defined. > On Windows they are failing for the previous 2 reasons plus: > * missing afterClass. > This Jira tracks the effort to fix part of the HBase-tests and reduce the failed > tests on Linux.
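The class-level setup/teardown pattern the patch restores can be sketched as follows; this is a hypothetical stand-in (a plain-Java analogue of JUnit's @BeforeClass/@AfterClass, with MiniCluster standing in for the HBase testing utility), not the real test code:

```java
// Hypothetical sketch of the @BeforeClass/@AfterClass pattern: whatever the
// class-level setup starts must be shut down in the class-level teardown,
// otherwise later tests -- especially on Windows -- inherit a live cluster.
public class AfterClassSketch {
    static class MiniCluster {         // stand-in for the HBase test utility
        boolean up;
        void start()    { up = true;  }
        void shutdown() { up = false; }
    }

    static MiniCluster util;

    static void setUpBeforeClass() {   // would be @BeforeClass in JUnit
        util = new MiniCluster();
        util.start();
    }

    static void tearDownAfterClass() { // the missing/incorrect @AfterClass
        if (util != null) {
            util.shutdown();
            util = null;
        }
    }

    public static void main(String[] args) {
        setUpBeforeClass();
        // ... test methods would run against util here ...
        tearDownAfterClass();
        if (util != null) {
            throw new AssertionError("cluster not torn down");
        }
        System.out.println("torn down: true");
    }
}
```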
[jira] [Commented] (YARN-8326) Yarn 3.0 seems runs slower than Yarn 2.6
[ https://issues.apache.org/jira/browse/YARN-8326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16488013#comment-16488013 ] Hsin-Liang Huang commented on YARN-8326: Hi [~eyang], here is another update. With the suggested setting changed, the performance of the simple job I ran did improve. However, our unit test cases still ran 14 hours compared to 7 hours in the 2.6 environment. I also ran another sample job with the changed settings; it still took 15 seconds compared to 6 or 7 seconds in the 2.6 environment. So I think the monitoring setting does affect performance, but it only plays a small part; the major issue could still be that exiting a container in the 3.0 environment is much slower than in the 2.6 environment. Is there anyone looking into this area? Thanks! > Yarn 3.0 seems runs slower than Yarn 2.6 > > > Key: YARN-8326 > URL: https://issues.apache.org/jira/browse/YARN-8326 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 3.0.0 > Environment: This is the yarn-site.xml for 3.0.
> hadoop.registry.dns.bind-port = 5353
> hadoop.registry.dns.domain-name = hwx.site
> hadoop.registry.dns.enabled = true
> hadoop.registry.dns.zone-mask = 255.255.255.0
> hadoop.registry.dns.zone-subnet = 172.17.0.0
> manage.include.files = false
> yarn.acl.enable = false
> yarn.admin.acl = yarn
> yarn.client.nodemanager-connect.max-wait-ms = 6
> yarn.client.nodemanager-connect.retry-interval-ms = 1
> yarn.http.policy = HTTP_ONLY
> yarn.log-aggregation-enable = false
> yarn.log-aggregation.retain-seconds = 2592000
> yarn.log.server.url = [http://xx:19888/jobhistory/logs|http://whiny2.fyre.ibm.com:19888/jobhistory/logs]
> yarn.log.server.web-service.url = [http://xx:8188/ws/v1/applicationhistory|http://whiny2.fyre.ibm.com:8188/ws/v1/applicationhistory]
> yarn.node-labels.enabled = false
> yarn.node-labels.fs-store.retry-policy-spec = 2000, 500
> yarn.node-labels.fs-store.root-dir = /system/yarn/node-labels
> yarn.nodemanager.address = 0.0.0.0:45454
> yarn.nodemanager.admin-env = MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX
> yarn.nodemanager.aux-services = mapreduce_shuffle,spark2_shuffle,timeline_collector
> yarn.nodemanager.aux-services.mapreduce_shuffle.class = org.apache.hadoop.mapred.ShuffleHandler
> yarn.nodemanager.aux-services.spark2_shuffle.class = org.apache.spark.network.yarn.YarnShuffleService
> yarn.nodemanager.aux-services.spark2_shuffle.classpath = /usr/spark2/aux/*
> yarn.nodemanager.aux-services.spark_shuffle.class = org.apache.spark.network.yarn.YarnShuffleService
> yarn.nodemanager.aux-services.timeline_collector.class = org.apache.hadoop.yarn.server.timelineservice.collector.PerNodeTimelineCollectorsAuxService
> yarn.nodemanager.bind-host = 0.0.0.0
> yarn.nodemanager.container-executor.class = org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor
> yarn.nodemanager.container-metrics.unregister-delay-ms = 6
> yarn.nodemanager.container-monitor.interval-ms = 3000
> yarn.nodemanager.delete.debug-delay-sec = 0
> yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage = 90
> yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb = 1000
> yarn.nodemanager.disk-health-checker.min-healthy-disks = 0.25
> yarn.nodemanager.health-checker.interval-ms = 135000
> yarn.nodemanager.health-checker.script.timeout-ms = 6
> yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage = false
> yarn.nodemanager.linux-container-executor.group = hadoop
> yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users = false
> yarn.nodemanager.local-dirs = /hadoop/yarn/local
> yarn.nodemanager.log-aggregation.compression-type = gz
> yarn.nodemanager.log-aggregation.debug-enabled = false
> yarn.nodemanager.log-aggregation.num-log-files-per-app = 30
> yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds = 3600
> yarn.nodemanager.log-dirs = /hadoop/yarn/log
> yarn.nodemanager.log.retain-seconds = 604800
> yarn.nodemanager.pmem-check-enabled = false
> yarn.nodemanager.recovery.dir = /var/log/hadoop-yarn/nodemanager/recovery-state
> yarn.nodemanager.recovery.enabled = true
> yarn.nodemanager.recovery.supervised = true
> yarn.nodemanager.remote-app-log-dir = /app-logs
> yarn
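The environment description above lists name/value pairs whose surrounding XML tags were lost in the mail archive; in the actual yarn-site.xml each pair is a standard Hadoop property block (shown here with one entry from the list, e.g. the container-monitor interval relevant to this performance discussion):

```xml
<!-- Standard yarn-site.xml syntax; value taken from the list above. -->
<configuration>
  <property>
    <name>yarn.nodemanager.container-monitor.interval-ms</name>
    <value>3000</value>
  </property>
</configuration>
```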
[jira] [Updated] (YARN-7530) hadoop-yarn-services-api should be part of hadoop-yarn-services
[ https://issues.apache.org/jira/browse/YARN-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-7530: - Fix Version/s: (was: 3.2.0) > hadoop-yarn-services-api should be part of hadoop-yarn-services > --- > > Key: YARN-7530 > URL: https://issues.apache.org/jira/browse/YARN-7530 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn-native-services >Affects Versions: 3.1.0 >Reporter: Eric Yang >Assignee: Chandni Singh >Priority: Blocker > Fix For: 3.2.0, 3.1.1 > > Attachments: YARN-7530.001.patch, YARN-7530.002.patch > > > Hadoop-yarn-services-api is currently a parallel project to > hadoop-yarn-services project. It would be better if hadoop-yarn-services-api > is part of hadoop-yarn-services for correctness.
[jira] [Updated] (YARN-7530) hadoop-yarn-services-api should be part of hadoop-yarn-services
[ https://issues.apache.org/jira/browse/YARN-7530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-7530: - Priority: Blocker (was: Trivial)