[jira] [Commented] (YARN-4947) Test timeout is happening for TestRMWebServicesNodes
[ https://issues.apache.org/jira/browse/YARN-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253434#comment-15253434 ] Sunil G commented on YARN-4947: --- Thanks [~rohithsharma], I very much agree with the approach of fixing these test cases this way rather than adding complicated flags and overrides. I do not see much of a problem if we start the auxiliary services. > Test timeout is happening for TestRMWebServicesNodes > > > Key: YARN-4947 > URL: https://issues.apache.org/jira/browse/YARN-4947 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: 0001-YARN-4947.patch, 0002-YARN-4947.patch > > > Testcase timeout for TestRMWebServicesNodes is happening after YARN-4893 > [timeout|https://builds.apache.org/job/PreCommit-YARN-Build/11044/testReport/] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4846) Random failures for TestCapacitySchedulerPreemption#testPreemptionPolicyShouldRespectAlreadyMarkedKillableContainers
[ https://issues.apache.org/jira/browse/YARN-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253414#comment-15253414 ] Bibin A Chundatt commented on YARN-4846: Test failures are tracked as part of HADOOP-13049. > Random failures for > TestCapacitySchedulerPreemption#testPreemptionPolicyShouldRespectAlreadyMarkedKillableContainers > > > Key: YARN-4846 > URL: https://issues.apache.org/jira/browse/YARN-4846 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: 0001-YARN-4846.patch, 0002-YARN-4846.patch, > 0003-YARN-4846.patch, 0004-YARN-4846.patch, YARN-4846-update-PCPP.patch > > > {noformat} > java.lang.AssertionError: expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerPreemption.testPreemptionPolicyShouldRespectAlreadyMarkedKillableContainers(TestCapacitySchedulerPreemption.java:473) > {noformat} > https://builds.apache.org/job/PreCommit-YARN-Build/10826/testReport/org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity/TestCapacitySchedulerPreemption/testPreemptionPolicyShouldRespectAlreadyMarkedKillableContainers/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4846) Random failures for TestCapacitySchedulerPreemption#testPreemptionPolicyShouldRespectAlreadyMarkedKillableContainers
[ https://issues.apache.org/jira/browse/YARN-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253373#comment-15253373 ] Hadoop QA commented on YARN-4846: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 24s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 4s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 73m 7s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 74m 12s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 163m 9s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_77 Failed junit tests | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification | | | hadoop.yarn.server.resourcemanager.TestApplicationCleanup | | | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart | | | hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher | | | hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication | |
[jira] [Commented] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking
[ https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253368#comment-15253368 ] Hadoop QA commented on YARN-4676: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 8 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 45s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 54s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 54s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 11s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 9s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 55s {color} | {color:green} trunk passed {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s {color} | {color:blue} Skipped branch modules with no Java source: hadoop-project hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 47s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 21s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 6m 11s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 22s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 57s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 5m 57s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 57s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 50s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 6m 50s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 50s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 8s {color} | {color:green} root: patch 
generated 0 new + 520 unchanged - 6 fixed = 520 total (was 526) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 4m 9s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 47s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s {color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s {color} | {color:blue} Skipped patch modules with no Java source: hadoop-project hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 8m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 35s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 6m 34s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 8s {color} | {c
[jira] [Commented] (YARN-1297) Miscellaneous Fair Scheduler speedups
[ https://issues.apache.org/jira/browse/YARN-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253356#comment-15253356 ] Hadoop QA commented on YARN-1297: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 40s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 4s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s {color} | {color:green} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: patch generated 0 new + 142 unchanged - 2 fixed = 142 total (was 144) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 73m 30s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 74m 32s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 164m 3s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_77 Failed junit tests | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart | | | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestWorkPreservingRMRestartForNodeLab
[jira] [Commented] (YARN-4947) Test timeout is happening for TestRMWebServicesNodes
[ https://issues.apache.org/jira/browse/YARN-4947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253348#comment-15253348 ] Rohith Sharma K S commented on YARN-4947: - In practice, NodeManagers are not allowed to register with the RM unless the ResourceTrackerService has started. So I feel the test case should be changed to start the RM instead of only doing init. I agree that a web-service unit test does not require the services to be started, but since NodeManagers are registering, the service start should ideally happen. > Test timeout is happening for TestRMWebServicesNodes > > > Key: YARN-4947 > URL: https://issues.apache.org/jira/browse/YARN-4947 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: 0001-YARN-4947.patch, 0002-YARN-4947.patch > > > Testcase timeout for TestRMWebServicesNodes is happening after YARN-4893 > [timeout|https://builds.apache.org/job/PreCommit-YARN-Build/11044/testReport/] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
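A minimal sketch of the direction suggested above, assuming the MockRM/MockNM test utilities from the resourcemanager test module (exact signatures can differ between branches); it is not the actual TestRMWebServicesNodes change. Starting the RM, rather than only calling init(), brings up the ResourceTrackerService so node registration goes through a started service:

{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.server.resourcemanager.MockNM;
import org.apache.hadoop.yarn.server.resourcemanager.MockRM;

public class StartRMInsteadOfInitSketch {
  public static void main(String[] args) throws Exception {
    YarnConfiguration conf = new YarnConfiguration();
    MockRM rm = new MockRM(conf);
    // init() alone leaves ResourceTrackerService (and the other active services) stopped;
    // start() brings them up so NodeManager registration can actually be accepted.
    rm.start();
    MockNM nm = rm.registerNode("127.0.0.1:1234", 8 * 1024);  // register a mock NM with 8 GB
    nm.nodeHeartbeat(true);                                    // report the node as healthy
    // ... exercise the /ws/v1/cluster/nodes web service here ...
    rm.stop();
  }
}
{code}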
[jira] [Commented] (YARN-4390) Consider container request size during CS preemption
[ https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253307#comment-15253307 ] Hadoop QA commented on YARN-4390: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 12 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 38s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 23s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 34s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 6s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 30s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 20s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: patch generated 28 new + 506 unchanged - 15 fixed = 534 total (was 521) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 74m 34s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 74m 57s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 16s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 166m 2s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_77 Failed junit tests | hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart | | | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestWorkPreservingRMRestartForNodeLabel | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication | | | hadoop.yarn.server.re
[jira] [Commented] (YARN-3215) Respect labels in CapacityScheduler when computing headroom
[ https://issues.apache.org/jira/browse/YARN-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253277#comment-15253277 ] Hadoop QA commented on YARN-3215: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 8m 44s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 5 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 19s {color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} branch-2.8 passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} branch-2.8 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s {color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s {color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 21s {color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 25s {color} | {color:green} branch-2.8 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} branch-2.8 passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s {color} | {color:green} branch-2.8 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 34s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 30s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 20s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: patch generated 1 new + 217 unchanged - 6 fixed = 218 total (was 223) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 81m 40s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 81m 26s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 192m 0s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_77 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps | | | hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | JDK v1.8.0_77 Timed out junit tests | org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes | | JDK v1.7.0
[jira] [Updated] (YARN-4983) JVM and UGI metrics disappear after RM is once transitioned to standby mode
[ https://issues.apache.org/jira/browse/YARN-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-4983: Attachment: YARN-4983-trunk.001.patch The large number of UT failures is caused by a wrong JVM metrics API call. Call initSingleton instead of create to avoid registering twice with the same metrics system. > JVM and UGI metrics disappear after RM is once transitioned to standby mode > --- > > Key: YARN-4983 > URL: https://issues.apache.org/jira/browse/YARN-4983 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-4983-trunk.000.patch, YARN-4983-trunk.001.patch > > > When transitioned to standby, the RM shuts down the existing metrics > system and relaunches a new one. This causes the JVM metrics and UGI > metrics to be missing from the new metrics system. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
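A hedged sketch of the point above, not the attached patch: JvmMetrics.initSingleton() reuses an already-registered JVM metrics source, whereas registering via create() a second time against the same metrics system can conflict once the RM's active services are restarted. The class and method names come from org.apache.hadoop.metrics2; the surrounding wiring is illustrative.

{code}
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.source.JvmMetrics;

public class JvmMetricsInitSketch {
  // Called every time the RM (re)initializes its active services.
  static JvmMetrics registerJvmMetrics(String processName, String sessionId) {
    // initSingleton() is safe to call repeatedly: it returns the source that is
    // already registered instead of attempting a second, conflicting registration.
    return JvmMetrics.initSingleton(processName, sessionId);
    // By contrast, JvmMetrics.create(processName, sessionId, DefaultMetricsSystem.instance())
    // registers a new source on each call and can collide with the existing one.
  }
}
{code}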
[jira] [Updated] (YARN-3971) Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery
[ https://issues.apache.org/jira/browse/YARN-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3971: Hadoop Flags: (was: Reviewed) > Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel > recovery > -- > > Key: YARN-3971 > URL: https://issues.apache.org/jira/browse/YARN-3971 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3971.patch, 0002-YARN-3971.patch, > 0003-YARN-3971.patch, 0004-YARN-3971.patch, > 0005-YARN-3971.001.addendum.patch, 0005-YARN-3971.addendum.patch, > 0005-YARN-3971.patch > > > Steps to reproduce > # Create label x,y > # Delete label x,y > # Create label x,y add capacity scheduler xml for labels x and y too > # Restart RM > > Both RM will become Standby. > Since below exception is thrown on {{FileSystemNodeLabelsStore#recover}} > {code} > 2015-07-23 14:03:33,627 INFO org.apache.hadoop.service.AbstractService: > Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in > state STARTED; cause: java.io.IOException: Cannot remove label=x, because > queue=a1 is using this label. Please remove label on queue before remove the > label > java.io.IOException: Cannot remove label=x, because queue=a1 is using this > label. Please remove label on queue before remove the label > at > org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.checkRemoveFromClusterNodeLabelsOfQueue(RMNodeLabelsManager.java:104) > at > org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.removeFromClusterNodeLabels(RMNodeLabelsManager.java:118) > at > org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:221) > at > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:232) > at > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:245) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:587) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:312) > at > org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:832) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:422) > at > 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
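A hedged sketch of the direction implied by the issue title, not the attached patch: skip the queue-usage check while replaying the label store, since recovery only re-applies operations that already succeeded before the restart. The recovery flag and the internal helper below are hypothetical; the two checked method names appear in the stack trace above.

{code}
public void removeFromClusterNodeLabels(Collection<String> labelsToRemove,
    boolean isRecovery) throws IOException {
  if (!isRecovery) {
    // Only enforce the queue check for live admin requests; during
    // FileSystemNodeLabelsStore#recover the queues are not yet consistent.
    checkRemoveFromClusterNodeLabelsOfQueue(labelsToRemove);
  }
  internalRemoveFromClusterNodeLabels(labelsToRemove);  // hypothetical helper doing the actual removal
}
{code}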
[jira] [Updated] (YARN-3971) Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery
[ https://issues.apache.org/jira/browse/YARN-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3971: Target Version/s: 2.9.0 > Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel > recovery > -- > > Key: YARN-3971 > URL: https://issues.apache.org/jira/browse/YARN-3971 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3971.patch, 0002-YARN-3971.patch, > 0003-YARN-3971.patch, 0004-YARN-3971.patch, > 0005-YARN-3971.001.addendum.patch, 0005-YARN-3971.addendum.patch, > 0005-YARN-3971.patch > > > Steps to reproduce > # Create label x,y > # Delete label x,y > # Create label x,y add capacity scheduler xml for labels x and y too > # Restart RM > > Both RM will become Standby. > Since below exception is thrown on {{FileSystemNodeLabelsStore#recover}} > {code} > 2015-07-23 14:03:33,627 INFO org.apache.hadoop.service.AbstractService: > Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in > state STARTED; cause: java.io.IOException: Cannot remove label=x, because > queue=a1 is using this label. Please remove label on queue before remove the > label > java.io.IOException: Cannot remove label=x, because queue=a1 is using this > label. Please remove label on queue before remove the label > at > org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.checkRemoveFromClusterNodeLabelsOfQueue(RMNodeLabelsManager.java:104) > at > org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.removeFromClusterNodeLabels(RMNodeLabelsManager.java:118) > at > org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:221) > at > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:232) > at > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:245) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:587) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:312) > at > org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:832) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:422) > at > 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3971) Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery
[ https://issues.apache.org/jira/browse/YARN-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-3971: Fix Version/s: (was: 2.8.0) > Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel > recovery > -- > > Key: YARN-3971 > URL: https://issues.apache.org/jira/browse/YARN-3971 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Critical > Attachments: 0001-YARN-3971.patch, 0002-YARN-3971.patch, > 0003-YARN-3971.patch, 0004-YARN-3971.patch, > 0005-YARN-3971.001.addendum.patch, 0005-YARN-3971.addendum.patch, > 0005-YARN-3971.patch > > > Steps to reproduce > # Create label x,y > # Delete label x,y > # Create label x,y add capacity scheduler xml for labels x and y too > # Restart RM > > Both RM will become Standby. > Since below exception is thrown on {{FileSystemNodeLabelsStore#recover}} > {code} > 2015-07-23 14:03:33,627 INFO org.apache.hadoop.service.AbstractService: > Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in > state STARTED; cause: java.io.IOException: Cannot remove label=x, because > queue=a1 is using this label. Please remove label on queue before remove the > label > java.io.IOException: Cannot remove label=x, because queue=a1 is using this > label. Please remove label on queue before remove the label > at > org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.checkRemoveFromClusterNodeLabelsOfQueue(RMNodeLabelsManager.java:104) > at > org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.removeFromClusterNodeLabels(RMNodeLabelsManager.java:118) > at > org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:221) > at > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:232) > at > org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:245) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:587) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001) > at > org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:312) > at > org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126) > at > org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:832) > at > org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:422) > at > 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599) > at > org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4981) limit the max containers to assign to one application per heartbeat
[ https://issues.apache.org/jira/browse/YARN-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253193#comment-15253193 ] ChenFolin commented on YARN-4981: - I set mapreduce.map|reduce.cpu.vcores=1. As we know, vcores do not strictly limit CPU usage. So if I set maxAssign = 10 and the job is CPU-intensive, one node may end up running 10 YarnChild processes serving a single application and hit 100% CPU. If I limit what one application can receive per heartbeat, I can spread the load more evenly. For example: an application has 100 tasks, and 10 tasks are enough to drive a node to 100% CPU. Without a per-application limit, 10 nodes may run at 100% CPU; with a limit of 4 containers per application per heartbeat, the work spreads over 25 nodes and each runs at only about 40% CPU. I think this may be more efficient for the job. A sketch of the idea follows this message. > limit the max containers to assign to one application per heartbeat > > > Key: YARN-4981 > URL: https://issues.apache.org/jira/browse/YARN-4981 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.5.0, 2.6.4 >Reporter: ChenFolin > Attachments: YARN-4981.patch > > > When assignMultiple is used and, for example, maxAssign=10: > if a job has high CPU utilization or load, it may make a subset of nodes > very busy, and the job may take a long time. > So I want to limit the maximum number of containers assigned to one application per > heartbeat; it may help spread usage across nodes more uniformly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
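An illustrative sketch of the proposal above, not FairScheduler code; the names maxAssignPerApp, SchedulableApp, canAssign(), and assignContainer() are hypothetical. It caps how many containers a single application can receive in one node heartbeat, in addition to the existing global maxAssign cap, so a CPU-hungry job spreads over more nodes.

{code}
import java.util.List;

public class PerAppAssignSketch {
  interface SchedulableApp {
    boolean canAssign();       // app still has pending requests this node can satisfy
    void assignContainer();    // hand the app one container on this node
  }

  static int assignContainers(List<SchedulableApp> apps, int maxAssign, int maxAssignPerApp) {
    int assignedOnNode = 0;
    for (SchedulableApp app : apps) {
      int assignedToApp = 0;
      while (assignedOnNode < maxAssign
          && assignedToApp < maxAssignPerApp
          && app.canAssign()) {
        app.assignContainer();
        assignedOnNode++;
        assignedToApp++;       // per-application cap for this heartbeat
      }
      if (assignedOnNode >= maxAssign) {
        break;                 // global per-heartbeat cap reached
      }
    }
    return assignedOnNode;
  }
}
{code}

With 100 pending tasks, maxAssign=10, and maxAssignPerApp=4, one heartbeat gives the application at most 4 containers on that node, so the remaining work lands on other nodes' heartbeats instead of piling onto one machine.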
[jira] [Commented] (YARN-4983) JVM and UGI metrics disappear after RM is once transitioned to standby mode
[ https://issues.apache.org/jira/browse/YARN-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253188#comment-15253188 ] Hadoop QA commented on YARN-4983: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 15s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 37s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 1s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 49s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 6s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 32s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 27s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 40s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 15s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 32s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 7s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 7s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 43s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 43s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 4s {color} | {color:red} root: patch generated 2 new + 172 unchanged - 1 fixed = 174 total (was 173) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 30s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} | | 
{color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 5s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 22s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_77. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 32s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 43s {color} | {color:red} hadoop-common in the patch failed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 27s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_77. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 34s {color} | {color:green} hadoop-common in the patch passed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 69m 22s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s {color}
[jira] [Assigned] (YARN-1134) Add support for zipping/unzipping logs while in transit for the NM logs web-service
[ https://issues.apache.org/jira/browse/YARN-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-1134: --- Assignee: Xuan Gong > Add support for zipping/unzipping logs while in transit for the NM logs > web-service > --- > > Key: YARN-1134 > URL: https://issues.apache.org/jira/browse/YARN-1134 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Vinod Kumar Vavilapalli >Assignee: Xuan Gong > > As [~zjshen] pointed out at > [YARN-649|https://issues.apache.org/jira/browse/YARN-649?focusedCommentId=13698415&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13698415], > {quote} > For the long running applications, they may have a big log file, such that it > will take a long time to download the log file via the RESTful API. > Consequently, the HTTP connection may time out before downloading a complete > log file. Maybe it is good to zip the log file before > sending it, and unzip it after receiving it. > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
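A minimal, hedged illustration of the quoted idea rather than the NM web-service itself (the file path and stream parameters are assumptions): gzip the log bytes as they are written to the HTTP response so large files move faster and are less likely to hit a connection timeout, and the client gunzips on receipt.

{code}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPOutputStream;

public class GzipLogTransferSketch {
  // Copy one container log file into the (already opened) HTTP response stream, gzipped.
  static void writeCompressed(Path logFile, OutputStream httpResponseOut) throws IOException {
    try (InputStream in = Files.newInputStream(logFile);
         GZIPOutputStream gzOut = new GZIPOutputStream(httpResponseOut)) {
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) {
        gzOut.write(buf, 0, n);   // compress on the fly; no temporary archive on disk
      }
    }
  }
}
{code}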
[jira] [Updated] (YARN-4846) Random failures for TestCapacitySchedulerPreemption#testPreemptionPolicyShouldRespectAlreadyMarkedKillableContainers
[ https://issues.apache.org/jira/browse/YARN-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-4846: --- Attachment: 0004-YARN-4846.patch Uploading patch after fixing checkstyle issue > Random failures for > TestCapacitySchedulerPreemption#testPreemptionPolicyShouldRespectAlreadyMarkedKillableContainers > > > Key: YARN-4846 > URL: https://issues.apache.org/jira/browse/YARN-4846 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: 0001-YARN-4846.patch, 0002-YARN-4846.patch, > 0003-YARN-4846.patch, 0004-YARN-4846.patch, YARN-4846-update-PCPP.patch > > > {noformat} > java.lang.AssertionError: expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerPreemption.testPreemptionPolicyShouldRespectAlreadyMarkedKillableContainers(TestCapacitySchedulerPreemption.java:473) > {noformat} > https://builds.apache.org/job/PreCommit-YARN-Build/10826/testReport/org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity/TestCapacitySchedulerPreemption/testPreemptionPolicyShouldRespectAlreadyMarkedKillableContainers/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4982) Test timeout :TestAMRMProxy testcase timeout always
[ https://issues.apache.org/jira/browse/YARN-4982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-4982: --- Issue Type: Sub-task (was: Bug) Parent: YARN-4478 > Test timeout :TestAMRMProxy testcase timeout always > --- > > Key: YARN-4982 > URL: https://issues.apache.org/jira/browse/YARN-4982 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt > > https://builds.apache.org/job/PreCommit-YARN-Build/11088/testReport/junit/org.apache.hadoop.yarn.client.api.impl/TestAMRMProxy/testAMRMProxyE2E/ > In the hadoop-yarn-client package, the {{TestAMRMProxy}} test case always times out > {noformat} > java.lang.Exception: test timed out after 6 milliseconds > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:160) > at com.sun.proxy.$Proxy85.getNewApplication(Unknown Source) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNewApplication(YarnClientImpl.java:227) > at > org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createApplication(YarnClientImpl.java:235) > at > org.apache.hadoop.yarn.client.api.impl.TestAMRMProxy.createApp(TestAMRMProxy.java:367) > at > org.apache.hadoop.yarn.client.api.impl.TestAMRMProxy.testAMRMProxyE2E(TestAMRMProxy.java:110) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-4919) Yarn logs should support an option to output logs as a compressed archive
[ https://issues.apache.org/jira/browse/YARN-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved YARN-4919. --- Resolution: Duplicate This is a duplicate of YARN-1134, closing it as such. > Yarn logs should support an option to output logs as a compressed archive > -- > > Key: YARN-4919 > URL: https://issues.apache.org/jira/browse/YARN-4919 > Project: Hadoop YARN > Issue Type: Sub-task > Components: yarn >Reporter: Xuan Gong > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4577) Enable aux services to have their own custom classpath/jar file
[ https://issues.apache.org/jira/browse/YARN-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253112#comment-15253112 ] Sangjin Lee commented on YARN-4577: --- Yes, I think the POC patch is pretty close to what I had in mind too. A couple more minor suggestions: - I probably wouldn't make {{AuxServiceWithCustomClassLoader}} public. It should really be visible only to {{AuxServices}}. Package scope should be fine. - I understand {{callWithCustomClassLoader()}} is a bit complicated because it has to support methods with different signatures. I would simply inline the code (as you are doing with the {{service*()}} methods). Then you don't have to do any reflection business to do this. Don't forget to test it with a real-life use case! Thanks. > Enable aux services to have their own custom classpath/jar file > --- > > Key: YARN-4577 > URL: https://issues.apache.org/jira/browse/YARN-4577 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.8.0 >Reporter: Xuan Gong >Assignee: Xuan Gong > Attachments: YARN-4577.1.patch, YARN-4577.2.patch, > YARN-4577.20160119.1.patch, YARN-4577.20160204.patch, YARN-4577.3.patch, > YARN-4577.3.rebase.patch, YARN-4577.4.patch, YARN-4577.poc.patch > > > Right now, users have to add their jars to the NM classpath directly, thus > putting them on the system classloader. But if multiple versions of the plugin > are present on the classpath, there is no control over which version actually > gets loaded. Or if there are any conflicts between the dependencies > introduced by the auxiliary service and the NM itself, they can break the NM, > the auxiliary service, or both. > The solution could be to instantiate aux services using a classloader that > is different from the system classloader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
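A hedged sketch of the general technique under discussion, not the POC patch: load the auxiliary service class through its own URLClassLoader and swap the thread context classloader around calls into it, so the service's dependencies never have to sit on the NM's system classpath. The jar path and class name below are placeholders.

{code}
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

public class AuxServiceClassLoaderSketch {
  static Object loadAndInvoke(String jarPath, String serviceClassName) throws Exception {
    URL[] jars = { new File(jarPath).toURI().toURL() };
    // Parent is the NM's classloader, so YARN/Hadoop classes are still shared.
    ClassLoader custom =
        new URLClassLoader(jars, AuxServiceClassLoaderSketch.class.getClassLoader());
    Class<?> clazz = Class.forName(serviceClassName, true, custom);
    Object service = clazz.getDeclaredConstructor().newInstance();

    ClassLoader original = Thread.currentThread().getContextClassLoader();
    Thread.currentThread().setContextClassLoader(custom);   // inlined around each service* call
    try {
      // invoke serviceInit/serviceStart/serviceStop on "service" here
      return service;
    } finally {
      Thread.currentThread().setContextClassLoader(original);  // always restore
    }
  }
}
{code}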
[jira] [Updated] (YARN-4987) EntityGroupFS timeline store needs to handle null storage gracefully
[ https://issues.apache.org/jira/browse/YARN-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-4987: Priority: Minor (was: Major) > EntityGroupFS timeline store needs to handle null storage gracefully > > > Key: YARN-4987 > URL: https://issues.apache.org/jira/browse/YARN-4987 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Li Lu >Assignee: Li Lu >Priority: Minor > > To handle concurrency issues, key value based timeline storage may return > null on reads that are concurrent to service stop. EntityGroupFS timeline > store needs to handle this case gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
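A minimal sketch of the behavior described above, with hypothetical storage and return types (not the EntityGroupFS code): treat a null result from the underlying key-value storage as "no data" when a read races with service stop, instead of letting it surface as a NullPointerException.

{code}
import java.util.Collections;
import java.util.List;

public class NullSafeReadSketch {
  interface KeyValueTimelineStore {
    List<String> getEntities(String entityType);   // may return null during shutdown
  }

  static List<String> readEntities(KeyValueTimelineStore store, String entityType) {
    List<String> entities = store.getEntities(entityType);
    if (entities == null) {
      // Storage is stopping concurrently with this read; return an empty
      // result instead of propagating an NPE to the web-service layer.
      return Collections.emptyList();
    }
    return entities;
  }
}
{code}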
[jira] [Commented] (YARN-4766) NM should not aggregate logs older than the retention policy
[ https://issues.apache.org/jira/browse/YARN-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253086#comment-15253086 ] Robert Kanter commented on YARN-4766: - Looks good overall. Here are a few things: # {{AggregatedLogFormat#getPendingLogFilesToUploadForThisContainer}} should be marked {{@VisibleForTesting}} # It's typically easier to maintain multiple constructors when you have the one with the most arguments do the "real" work and the others just call it with default values for the remaining arguments (see the sketch after this message). Can you make {{ApplicationImpl}} do that? # In {{ApplicationImpl#buildAppProto}}, it's catching an {{IOException}} (which shouldn't occur) and simply logging it. If this does somehow occur, it's going to continue executing, which is probably bad. Given that the only caller of this method is already calling it from a try-catch block, I think we'd be better off throwing the {{IOException}}. Also, the caller should log the {{Exception}} so that we get a stack trace. # There's an extra newline above {{ApplicationImpl#ApplicationImpl}} # {{AppLogAggregatorImpl#uploadLogsForContainers}} creates a new array named {{paths}} that's never used. # In {{TestAppLogAggregatorImpl}}, {{testAggregatorWithRetentionPolicyDisabled_shouldUploadAllFiles}} and {{testAggregatorWhenNoFileOlderThanRetentionPolicy_ShouldUploadAll}} are identical other than a config property or two. Can we make a helper that has most of the code and pass it the config properties so we can combine the code here? # There's an extra newline in the {{DeletionServiceDeleteTaskProto}} proto message > NM should not aggregate logs older than the retention policy > > > Key: YARN-4766 > URL: https://issues.apache.org/jira/browse/YARN-4766 > Project: Hadoop YARN > Issue Type: Improvement > Components: log-aggregation, nodemanager >Reporter: Haibo Chen >Assignee: Haibo Chen > Attachments: yarn4766.001.patch, yarn4766.002.patch, > yarn4766.003.patch > > > When log aggregation fails on the NM, the information for the attempt is > kept in the recovery DB. Log aggregation can fail for multiple reasons, which > are often related to HDFS space or permissions. > On restart, the recovery DB is read and if an application attempt needs its > logs aggregated, the files are scheduled for aggregation without any checks. > The log files could be older than the retention limit, in which case we should > not aggregate them but immediately mark them for deletion from the local file > system. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
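A hedged sketch of review point 2 with hypothetical class and field names (not the actual ApplicationImpl): keep the real work in the constructor with the most parameters and have the shorter overloads delegate to it with defaults, so the behavior lives in one place.

{code}
public class AppInfoSketch {
  private static final long DEFAULT_RETENTION_MS = 7L * 24 * 60 * 60 * 1000;

  private final String user;
  private final long logRetentionMs;
  private final boolean recovered;

  AppInfoSketch(String user) {
    this(user, DEFAULT_RETENTION_MS, false);          // delegate with defaults
  }

  AppInfoSketch(String user, long logRetentionMs) {
    this(user, logRetentionMs, false);                // delegate again
  }

  AppInfoSketch(String user, long logRetentionMs, boolean recovered) {
    // The only constructor that assigns fields; every overload funnels through it.
    this.user = user;
    this.logRetentionMs = logRetentionMs;
    this.recovered = recovered;
  }
}
{code}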
[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253084#comment-15253084 ] Hitesh Shah commented on YARN-4844: --- bq. It is not a very hard thing to drop it; we'd better do it close to the first branch-3 release. I believe a recent comment on the mailing list was trying to target a 3.0 release within the next few weeks, so I guess that means we should make this change now? > Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64 > -- > > Key: YARN-4844 > URL: https://issues.apache.org/jira/browse/YARN-4844 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Attachments: YARN-4844.1.patch, YARN-4844.2.patch > > > We use int32 for memory now, if a cluster has 10k nodes, each node has 210G > memory, we will get a negative total cluster memory. > And another case that easier overflows int32 is: we added all pending > resources of running apps to cluster's total pending resources. If a > problematic app requires too much resources (let's say 1M+ containers, each > of them has 3G containers), int32 will be not enough. > Even if we can cap each app's pending request, we cannot handle the case that > there're many running apps, each of them has capped but still significant > numbers of pending resources. > So we may possibly need to upgrade int32 memory field (could include v-cores > as well) to int64 to avoid integer overflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (YARN-4090) Make Collections.sort() more efficient in FSParentQueue.java
[ https://issues.apache.org/jira/browse/YARN-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu reopened YARN-4090: > Make Collections.sort() more efficient in FSParentQueue.java > > > Key: YARN-4090 > URL: https://issues.apache.org/jira/browse/YARN-4090 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Xianyin Xin >Assignee: Xianyin Xin > Attachments: YARN-4090-TestResult.pdf, YARN-4090-preview.patch, > sampling1.jpg, sampling2.jpg > > > Collections.sort() consumes too much time in a scheduling round. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1297) Miscellaneous Fair Scheduler speedups
[ https://issues.apache.org/jira/browse/YARN-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yufei Gu updated YARN-1297: --- Attachment: YARN-1297.006.patch > Miscellaneous Fair Scheduler speedups > - > > Key: YARN-1297 > URL: https://issues.apache.org/jira/browse/YARN-1297 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Sandy Ryza >Assignee: Yufei Gu > Attachments: YARN-1297-1.patch, YARN-1297-2.patch, > YARN-1297.005.patch, YARN-1297.006.patch, YARN-1297.3.patch, > YARN-1297.4.patch, YARN-1297.4.patch, YARN-1297.patch, YARN-1297.patch > > > I ran the Fair Scheduler's core scheduling loop through a profiler tool and > identified a bunch of minimally invasive changes that can shave off a few > milliseconds. > The main one is demoting a couple INFO log messages to DEBUG, which brought > my benchmark down from 16000 ms to 6000. > A few others (which had way less of an impact) were > * Most of the time in comparisons was being spent in Math.signum. I switched > this to direct ifs and elses and it halved the percent of time spent in > comparisons. > * I removed some unnecessary instantiations of Resource objects > * I made it so that queues' usage wasn't calculated from the applications up > each time getResourceUsage was called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4846) Random failures for TestCapacitySchedulerPreemption#testPreemptionPolicyShouldRespectAlreadyMarkedKillableContainers
[ https://issues.apache.org/jira/browse/YARN-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253058#comment-15253058 ] Wangda Tan commented on YARN-4846: -- Looks good to me, thanks! Will commit the patch tomorrow. > Random failures for > TestCapacitySchedulerPreemption#testPreemptionPolicyShouldRespectAlreadyMarkedKillableContainers > > > Key: YARN-4846 > URL: https://issues.apache.org/jira/browse/YARN-4846 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: 0001-YARN-4846.patch, 0002-YARN-4846.patch, > 0003-YARN-4846.patch, YARN-4846-update-PCPP.patch > > > {noformat} > java.lang.AssertionError: expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerPreemption.testPreemptionPolicyShouldRespectAlreadyMarkedKillableContainers(TestCapacitySchedulerPreemption.java:473) > {noformat} > https://builds.apache.org/job/PreCommit-YARN-Build/10826/testReport/org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity/TestCapacitySchedulerPreemption/testPreemptionPolicyShouldRespectAlreadyMarkedKillableContainers/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4844: - Attachment: YARN-4844.2.patch Rebased to latest trunk. (ver. 2) > Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64 > -- > > Key: YARN-4844 > URL: https://issues.apache.org/jira/browse/YARN-4844 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Attachments: YARN-4844.1.patch, YARN-4844.2.patch > > > We use int32 for memory now, if a cluster has 10k nodes, each node has 210G > memory, we will get a negative total cluster memory. > And another case that easier overflows int32 is: we added all pending > resources of running apps to cluster's total pending resources. If a > problematic app requires too much resources (let's say 1M+ containers, each > of them has 3G containers), int32 will be not enough. > Even if we can cap each app's pending request, we cannot handle the case that > there're many running apps, each of them has capped but still significant > numbers of pending resources. > So we may possibly need to upgrade int32 memory field (could include v-cores > as well) to int64 to avoid integer overflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253048#comment-15253048 ] Wangda Tan commented on YARN-4844: -- [~hitesh], Agreed, it is very messy, but it seems there's no other way to do it. :( I also tried adding a new Resource object (like YarnServerResource) which would be used by YARN internally only and keep the user-facing API clean. But there are too many interactions between applications and services that use api.Resource; we would need to handle all these cases separately, so it doesn't look like a doable plan to me. For the branch-3 release, I prefer to keep {{long getMemory/etc}} only. However, since getMemory is used 1000+ times inside the resource manager project, if we drop {{Resource#int getMemory}} from trunk now, we would need to write two versions of patches for almost all RM fixes, which would be a HUGE headache for YARN RM contributors. It is not a very hard thing to drop it; we'd better do it close to the first branch-3 release. > Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64 > -- > > Key: YARN-4844 > URL: https://issues.apache.org/jira/browse/YARN-4844 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Attachments: YARN-4844.1.patch > > > We use int32 for memory now, if a cluster has 10k nodes, each node has 210G > memory, we will get a negative total cluster memory. > And another case that easier overflows int32 is: we added all pending > resources of running apps to cluster's total pending resources. If a > problematic app requires too much resources (let's say 1M+ containers, each > of them has 3G containers), int32 will be not enough. > Even if we can cap each app's pending request, we cannot handle the case that > there're many running apps, each of them has capped but still significant > numbers of pending resources. > So we may possibly need to upgrade int32 memory field (could include v-cores > as well) to int64 to avoid integer overflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1297) Miscellaneous Fair Scheduler speedups
[ https://issues.apache.org/jira/browse/YARN-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253042#comment-15253042 ] Yufei Gu commented on YARN-1297: I agree. Let's me upload a new patch soon and reopen the YARN-4090. > Miscellaneous Fair Scheduler speedups > - > > Key: YARN-1297 > URL: https://issues.apache.org/jira/browse/YARN-1297 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Sandy Ryza >Assignee: Yufei Gu > Attachments: YARN-1297-1.patch, YARN-1297-2.patch, > YARN-1297.005.patch, YARN-1297.3.patch, YARN-1297.4.patch, YARN-1297.4.patch, > YARN-1297.patch, YARN-1297.patch > > > I ran the Fair Scheduler's core scheduling loop through a profiler tool and > identified a bunch of minimally invasive changes that can shave off a few > milliseconds. > The main one is demoting a couple INFO log messages to DEBUG, which brought > my benchmark down from 16000 ms to 6000. > A few others (which had way less of an impact) were > * Most of the time in comparisons was being spent in Math.signum. I switched > this to direct ifs and elses and it halved the percent of time spent in > comparisons. > * I removed some unnecessary instantiations of Resource objects > * I made it so that queues' usage wasn't calculated from the applications up > each time getResourceUsage was called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4987) EntityGroupFS timeline store needs to handle null storage gracefully
[ https://issues.apache.org/jira/browse/YARN-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-4987: Issue Type: Sub-task (was: Bug) Parent: YARN-4233 > EntityGroupFS timeline store needs to handle null storage gracefully > > > Key: YARN-4987 > URL: https://issues.apache.org/jira/browse/YARN-4987 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Li Lu >Assignee: Li Lu > > To handle concurrency issues, key value based timeline storage may return > null on reads that are concurrent to service stop. EntityGroupFS timeline > store needs to handle this case gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4987) EntityGroupFS timeline store needs to handle null storage gracefully
Li Lu created YARN-4987: --- Summary: EntityGroupFS timeline store needs to handle null storage gracefully Key: YARN-4987 URL: https://issues.apache.org/jira/browse/YARN-4987 Project: Hadoop YARN Issue Type: Bug Reporter: Li Lu Assignee: Li Lu To handle concurrency issues, key value based timeline storage may return null on reads that are concurrent to service stop. EntityGroupFS timeline store needs to handle this case gracefully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4390) Consider container request size during CS preemption
[ https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4390: - Attachment: YARN-4390.4.patch > Consider container request size during CS preemption > > > Key: YARN-4390 > URL: https://issues.apache.org/jira/browse/YARN-4390 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.0.0, 2.8.0, 2.7.3 >Reporter: Eric Payne >Assignee: Wangda Tan > Attachments: YARN-4390-design.1.pdf, YARN-4390-test-results.pdf, > YARN-4390.1.patch, YARN-4390.2.patch, YARN-4390.3.branch-2.patch, > YARN-4390.3.patch, YARN-4390.4.patch > > > There are multiple reasons why preemption could unnecessarily preempt > containers. One is that an app could be requesting a large container (say > 8-GB), and the preemption monitor could conceivably preempt multiple > containers (say 8, 1-GB containers) in order to fill the large container > request. These smaller containers would then be rejected by the requesting AM > and potentially given right back to the preempted app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4390) Consider container request size during CS preemption
[ https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253032#comment-15253032 ] Wangda Tan commented on YARN-4390: -- [~eepayne], Thanks for review! bq. I think this JIRA gets us closer to that goal, but there may be a possibility for the killed container to go someplace else. Is that right? Yes, it is true, and the reservation should happen before we can correctly preempt resources for large containers. For example, if YARN-4280 occurs, we cannot reserve containers and preempt containers correctly. I've addressed most of your comments, except: bq. Even though killableContainers is an unmodifiableMap, I think it can still change, can't it? Yes, it can change. And actually, all existing preemption logic assumes changes could happen: - In the micro view: we clone queue metrics at the beginning of editSchedule, but queue metrics can be changed during the preemption logic. - In the macro view: selected candidates could flip between invalid and valid before max-kill-wait is reached (since a queue's resource usage could be updated within the max-kill-wait period). So back to your question: if killableContainers is modified during editSchedule, we can fix it up in the next (and next-next ..) editSchedule. bq. I am a little concerned about calling preemptionContext.getScheduler().getAllNodes()) to get the list of all of the nodes on every iteration of the preemption monitor... This is a valid concern. However, as far as I know, the Fair Scheduler is using the method when doing async scheduling, and async scheduling is widely used by Fair Scheduler users. See logic:
{code}
void continuousSchedulingAttempt() throws InterruptedException {
  long start = getClock().getTime();
  List<FSSchedulerNode> nodeIdList =
      nodeTracker.sortedNodeList(nodeAvailableResourceComparator);
  // iterate all nodes
  for (FSSchedulerNode node : nodeIdList) {
    try {
      if (Resources.fitsIn(minimumAllocation,
          node.getUnallocatedResource())) {
        attemptScheduling(node);
      }
    } // catch clause and remaining logic elided in this excerpt
  }
}
{code}
I didn't see any JIRA complaining about the performance impact of this approach. And since it uses a R/W lock, the write lock will be acquired only on node add / move or node resource update. So in most cases, nobody acquires the write lock. I agree to cache the node list inside PCPP if we do see performance issues. Attaching ver.4 patch, please kindly review. > Consider container request size during CS preemption > > > Key: YARN-4390 > URL: https://issues.apache.org/jira/browse/YARN-4390 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.0.0, 2.8.0, 2.7.3 >Reporter: Eric Payne >Assignee: Wangda Tan > Attachments: YARN-4390-design.1.pdf, YARN-4390-test-results.pdf, > YARN-4390.1.patch, YARN-4390.2.patch, YARN-4390.3.branch-2.patch, > YARN-4390.3.patch > > > There are multiple reasons why preemption could unnecessarily preempt > containers. One is that an app could be requesting a large container (say > 8-GB), and the preemption monitor could conceivably preempt multiple > containers (say 8, 1-GB containers) in order to fill the large container > request. These smaller containers would then be rejected by the requesting AM > and potentially given right back to the preempted app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4962) support filling up containers on node one by one
[ https://issues.apache.org/jira/browse/YARN-4962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253030#comment-15253030 ] sandflee commented on YARN-4962: Thanks [~templedf], node labels don't seem to solve our problem, because: 1, our nodes mostly have the same resources, so it's hard to label them A/B; 2, type B jobs couldn't run on type A nodes, which is a waste of resources when type A nodes are free. > support filling up containers on node one by one > - > > Key: YARN-4962 > URL: https://issues.apache.org/jira/browse/YARN-4962 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: sandflee > > We have a gpu cluster; jobs with bigger resource requests couldn't be satisfied > because nodes are running jobs with smaller resource requests. We didn't enable the > reservation system because gpu jobs may run for days or weeks. We expect the scheduler > to allocate containers so that a node is filled up first; then there will be resources > left to run jobs with big resource requests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253028#comment-15253028 ] Hitesh Shah commented on YARN-4844: --- getMemoryLong(), etc just seems messy. I can understand why this is needed on branch-2 if we need to support long but for trunk, it seems better to change getMemory() to return a long. > Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64 > -- > > Key: YARN-4844 > URL: https://issues.apache.org/jira/browse/YARN-4844 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Attachments: YARN-4844.1.patch > > > We use int32 for memory now, if a cluster has 10k nodes, each node has 210G > memory, we will get a negative total cluster memory. > And another case that easier overflows int32 is: we added all pending > resources of running apps to cluster's total pending resources. If a > problematic app requires too much resources (let's say 1M+ containers, each > of them has 3G containers), int32 will be not enough. > Even if we can cap each app's pending request, we cannot handle the case that > there're many running apps, each of them has capped but still significant > numbers of pending resources. > So we may possibly need to upgrade int32 memory field (could include v-cores > as well) to int64 to avoid integer overflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4862) Handle duplicate completed containers in RMNodeImpl
[ https://issues.apache.org/jira/browse/YARN-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253020#comment-15253020 ] Jian He commented on YARN-4862: --- I think the RMNodeImpl#completedContainers could be leaked. e.g. If the container is completed at RM side (preempted etc.), 1. container may be first added into RMNodeImpl#containersToBeRemovedFromNM. 2. At this point completedContainers does not have the container, so {{completedContainers.removeAll(this.containersToBeRemovedFromNM);}} does nothing, 3. later on, container finished at NM and gets added into the completedContainers, and this container will remain there forever. > Handle duplicate completed containers in RMNodeImpl > --- > > Key: YARN-4862 > URL: https://issues.apache.org/jira/browse/YARN-4862 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Rohith Sharma K S >Assignee: Rohith Sharma K S > Attachments: 0001-YARN-4862.patch, 0002-YARN-4862.patch > > > As per > [comment|https://issues.apache.org/jira/browse/YARN-4852?focusedCommentId=15209689&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15209689] > from [~sharadag], there should be safe guard for duplicated container status > in RMNodeImpl before creating UpdatedContainerInfo. > Or else in heavily loaded cluster where event processing is gradually slow, > if any duplicated container are sent to RM(may be bug in NM also), there is > significant impact that RMNodImpl always create UpdatedContainerInfo for > duplicated containers. This result in increase in the heap memory and causes > problem like YARN-4852. > This is an optimization for issue kind YARN-4852 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
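A hedged sketch of the kind of guard the scenario above suggests: remember which containers have already been recorded as completed so a duplicate or late report cannot sit in {{completedContainers}} forever. Class, field, and method names are illustrative, not the actual RMNodeImpl code or the attached patches:
{code}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ContainerState;
import org.apache.hadoop.yarn.api.records.ContainerStatus;

// Illustrative guard: drop a completed-container report if that container's
// completion was already recorded, so the list cannot grow without bound.
class CompletedContainerGuard {
  private final Set<ContainerId> alreadyCompleted = new HashSet<>();
  private final List<ContainerStatus> completedContainers = new ArrayList<>();

  void onContainerStatus(ContainerStatus status) {
    if (status.getState() != ContainerState.COMPLETE) {
      return;
    }
    if (!alreadyCompleted.add(status.getContainerId())) {
      // duplicate report, e.g. the NM reports a container the RM already
      // finished on its side (preemption etc.): ignore it
      return;
    }
    completedContainers.add(status);
  }
}
{code}
Whatever the real fix looks like, the set of already-seen container ids would itself need pruning once the AM has been notified, otherwise the guard only moves the leak.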
[jira] [Commented] (YARN-4556) TestFifoScheduler.testResourceOverCommit fails
[ https://issues.apache.org/jira/browse/YARN-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253011#comment-15253011 ] Hadoop QA commented on YARN-4556: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 40s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 35s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 23s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 30s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 74m 8s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 74m 43s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 165m 22s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_77 Failed junit tests | hadoop.yarn.server.resourcemanager.webapp.TestRMWebappAuthentication | | | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority | | | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA | | | hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokenAuthentication | | | hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart | | | hadoop.yarn.server.resourcemanager.Tes
[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253005#comment-15253005 ] Hadoop QA commented on YARN-4844: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s {color} | {color:red} YARN-4844 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12800125/YARN-4844.1.patch | | JIRA Issue | YARN-4844 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/11170/console | | Powered by | Apache Yetus 0.2.0 http://yetus.apache.org | This message was automatically generated. > Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64 > -- > > Key: YARN-4844 > URL: https://issues.apache.org/jira/browse/YARN-4844 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Attachments: YARN-4844.1.patch > > > We use int32 for memory now, if a cluster has 10k nodes, each node has 210G > memory, we will get a negative total cluster memory. > And another case that easier overflows int32 is: we added all pending > resources of running apps to cluster's total pending resources. If a > problematic app requires too much resources (let's say 1M+ containers, each > of them has 3G containers), int32 will be not enough. > Even if we can cap each app's pending request, we cannot handle the case that > there're many running apps, each of them has capped but still significant > numbers of pending resources. > So we may possibly need to upgrade int32 memory field (could include v-cores > as well) to int64 to avoid integer overflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253004#comment-15253004 ] Wangda Tan commented on YARN-4844: -- An additional note about why it is a 2.8 blocker: currently the Capacity Scheduler relies on the total pending resource: when trying to assign containers for each node heartbeat, starting from the root queue, it skips queues which have <= 0 pending resources. So if the memory pendingResource overflows, no more containers can be allocated. branch-2.7 will not be affected since the new logic is only in branch-2.8. > Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64 > -- > > Key: YARN-4844 > URL: https://issues.apache.org/jira/browse/YARN-4844 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Attachments: YARN-4844.1.patch > > > We use int32 for memory now, if a cluster has 10k nodes, each node has 210G > memory, we will get a negative total cluster memory. > And another case that easier overflows int32 is: we added all pending > resources of running apps to cluster's total pending resources. If a > problematic app requires too much resources (let's say 1M+ containers, each > of them has 3G containers), int32 will be not enough. > Even if we can cap each app's pending request, we cannot handle the case that > there're many running apps, each of them has capped but still significant > numbers of pending resources. > So we may possibly need to upgrade int32 memory field (could include v-cores > as well) to int64 to avoid integer overflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
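For reference, the overflow arithmetic from the YARN-4844 description is easy to verify in a few lines; the 10k-node and 210 GB figures below are just the example numbers from the description:
{code}
// Quick check: 10k nodes x 210 GB, expressed in MB, already exceeds
// Integer.MAX_VALUE, so an int32 cluster total wraps around to a negative.
public class OverflowCheck {
  public static void main(String[] args) {
    int nodes = 10_000;
    int memoryPerNodeMb = 210 * 1024;        // 210 GB per node, in MB
    long totalMb = (long) nodes * memoryPerNodeMb;
    System.out.println(totalMb);              // 2150400000
    System.out.println(Integer.MAX_VALUE);    // 2147483647
    System.out.println((int) totalMb);        // negative after truncation
  }
}
{code}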
[jira] [Updated] (YARN-4971) RM fails to re-bind to wildcard IP after failover in multi homed clusters
[ https://issues.apache.org/jira/browse/YARN-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-4971: --- Affects Version/s: (was: 3.0.0) 2.7.2 > RM fails to re-bind to wildcard IP after failover in multi homed clusters > - > > Key: YARN-4971 > URL: https://issues.apache.org/jira/browse/YARN-4971 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.2 >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg > Attachments: YARN-4971.1.patch > > > If the RM has the {{yarn.resourcemanager.bind-host}} set to 0.0.0.0 the first > time the service becomes active binding to the wildcard works as expected. If > the service has transitioned from active to standby and then becomes active > again after failovers the service only binds to one of the ip addresses. > There is a difference between the services inside the RM: it only seem to > happen for the services listening on ports: 8030 and 8032 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4971) RM fails to re-bind to wildcard IP after failover in multi homed clusters
[ https://issues.apache.org/jira/browse/YARN-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-4971: --- Target Version/s: 2.9.0 > RM fails to re-bind to wildcard IP after failover in multi homed clusters > - > > Key: YARN-4971 > URL: https://issues.apache.org/jira/browse/YARN-4971 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.2 >Reporter: Wilfred Spiegelenburg >Assignee: Wilfred Spiegelenburg > Attachments: YARN-4971.1.patch > > > If the RM has the {{yarn.resourcemanager.bind-host}} set to 0.0.0.0 the first > time the service becomes active binding to the wildcard works as expected. If > the service has transitioned from active to standby and then becomes active > again after failovers the service only binds to one of the ip addresses. > There is a difference between the services inside the RM: it only seem to > happen for the services listening on ports: 8030 and 8032 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1297) Miscellaneous Fair Scheduler speedups
[ https://issues.apache.org/jira/browse/YARN-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252984#comment-15252984 ] Karthik Kambatla commented on YARN-1297: Looks like the test failures are because of preemption and runnability logic and test's reliance on the timing of the queue-usage update. I remember Sandy measured the logging changes themselves led to good improvement. What do you think of doing only the logging changes here and drive the resources change as part of YARN-4090? > Miscellaneous Fair Scheduler speedups > - > > Key: YARN-1297 > URL: https://issues.apache.org/jira/browse/YARN-1297 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Reporter: Sandy Ryza >Assignee: Yufei Gu > Attachments: YARN-1297-1.patch, YARN-1297-2.patch, > YARN-1297.005.patch, YARN-1297.3.patch, YARN-1297.4.patch, YARN-1297.4.patch, > YARN-1297.patch, YARN-1297.patch > > > I ran the Fair Scheduler's core scheduling loop through a profiler tool and > identified a bunch of minimally invasive changes that can shave off a few > milliseconds. > The main one is demoting a couple INFO log messages to DEBUG, which brought > my benchmark down from 16000 ms to 6000. > A few others (which had way less of an impact) were > * Most of the time in comparisons was being spent in Math.signum. I switched > this to direct ifs and elses and it halved the percent of time spent in > comparisons. > * I removed some unnecessary instantiations of Resource objects > * I made it so that queues' usage wasn't calculated from the applications up > each time getResourceUsage was called. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252976#comment-15252976 ] Wangda Tan commented on YARN-4844: -- I discussed this issue with [~vinodkv], [~jianhe], and [~hitesh]. The good news is that Google PB has backward/forward compatibility for all int_ fields, see: https://developers.google.com/protocol-buffers/docs/proto#updating: bq. int32, uint32, int64, uint64, and bool are all compatible – this means you can change a field from one of these types to another without breaking forwards- or backwards-compatibility. If a number is parsed from the wire which doesn't fit in the corresponding type, you will get the same effect as if you had cast the number to that type in C++ (e.g. if a 64-bit number is read as an int32, it will be truncated to 32 bits). So we have no problem changing ResourceProto from int32 to int64. In addition to the .proto change, the following changes are required for the API record Resource: - Update {{set_(int ...)}} to {{set_(long ...)}}; there is no compatibility issue for setters - Add {{getMemoryLong}} and {{getVirtualCoresLong}} methods We also need to update Metrics objects related to Resources, such as QueueMetrics, etc. AFAIK, there's no compatibility issue there. The last part is scheduler and test fixes. Attached ver.1 patch for review. > Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64 > -- > > Key: YARN-4844 > URL: https://issues.apache.org/jira/browse/YARN-4844 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Attachments: YARN-4844.1.patch > > > We use int32 for memory now, if a cluster has 10k nodes, each node has 210G > memory, we will get a negative total cluster memory. > And another case that easier overflows int32 is: we added all pending > resources of running apps to cluster's total pending resources. If a > problematic app requires too much resources (let's say 1M+ containers, each > of them has 3G containers), int32 will be not enough. > Even if we can cap each app's pending request, we cannot handle the case that > there're many running apps, each of them has capped but still significant > numbers of pending resources. > So we may possibly need to upgrade int32 memory field (could include v-cores > as well) to int64 to avoid integer overflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
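To make the accessor shape above concrete, a rough sketch (not the attached patch): keep the existing int getters for source compatibility and have them truncate, mirroring protobuf's behavior when an int64 value is read as int32. The class name and the choice to truncate are assumptions for illustration:
{code}
// Sketch of the described API shape; the real o.a.h.y.api.records.Resource
// is a PB-backed abstract class, so treat this as an outline only.
public abstract class ResourceSketch {
  // new primary accessors backed by the (now int64) proto fields
  public abstract long getMemoryLong();
  public abstract long getVirtualCoresLong();

  // existing int callers keep compiling; oversized values truncate here
  public int getMemory() {
    return (int) getMemoryLong();
  }

  public int getVirtualCores() {
    return (int) getVirtualCoresLong();
  }

  // setters widened to long; passing an int still works without changes
  public abstract void setMemory(long memory);
  public abstract void setVirtualCores(long vCores);
}
{code}
Whether the int getters should truncate, saturate, or be dropped from trunk entirely is exactly the open question in the comments above.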
[jira] [Updated] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4844: - Attachment: YARN-4844.1.patch > Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64 > -- > > Key: YARN-4844 > URL: https://issues.apache.org/jira/browse/YARN-4844 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > Attachments: YARN-4844.1.patch > > > We use int32 for memory now, if a cluster has 10k nodes, each node has 210G > memory, we will get a negative total cluster memory. > And another case that easier overflows int32 is: we added all pending > resources of running apps to cluster's total pending resources. If a > problematic app requires too much resources (let's say 1M+ containers, each > of them has 3G containers), int32 will be not enough. > Even if we can cap each app's pending request, we cannot handle the case that > there're many running apps, each of them has capped but still significant > numbers of pending resources. > So we may possibly need to upgrade int32 memory field (could include v-cores > as well) to int64 to avoid integer overflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4844: - Target Version/s: 2.8.0 > Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64 > -- > > Key: YARN-4844 > URL: https://issues.apache.org/jira/browse/YARN-4844 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Critical > > We use int32 for memory now, if a cluster has 10k nodes, each node has 210G > memory, we will get a negative total cluster memory. > And another case that easier overflows int32 is: we added all pending > resources of running apps to cluster's total pending resources. If a > problematic app requires too much resources (let's say 1M+ containers, each > of them has 3G containers), int32 will be not enough. > Even if we can cap each app's pending request, we cannot handle the case that > there're many running apps, each of them has capped but still significant > numbers of pending resources. > So we may possibly need to upgrade int32 memory field (could include v-cores > as well) to int64 to avoid integer overflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4844: - Priority: Blocker (was: Critical) > Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64 > -- > > Key: YARN-4844 > URL: https://issues.apache.org/jira/browse/YARN-4844 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Blocker > > We use int32 for memory now, if a cluster has 10k nodes, each node has 210G > memory, we will get a negative total cluster memory. > And another case that easier overflows int32 is: we added all pending > resources of running apps to cluster's total pending resources. If a > problematic app requires too much resources (let's say 1M+ containers, each > of them has 3G containers), int32 will be not enough. > Even if we can cap each app's pending request, we cannot handle the case that > there're many running apps, each of them has capped but still significant > numbers of pending resources. > So we may possibly need to upgrade int32 memory field (could include v-cores > as well) to int64 to avoid integer overflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
[ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan reassigned YARN-4844: Assignee: Wangda Tan > Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64 > -- > > Key: YARN-4844 > URL: https://issues.apache.org/jira/browse/YARN-4844 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api >Reporter: Wangda Tan >Assignee: Wangda Tan >Priority: Critical > > We use int32 for memory now, if a cluster has 10k nodes, each node has 210G > memory, we will get a negative total cluster memory. > And another case that easier overflows int32 is: we added all pending > resources of running apps to cluster's total pending resources. If a > problematic app requires too much resources (let's say 1M+ containers, each > of them has 3G containers), int32 will be not enough. > Even if we can cap each app's pending request, we cannot handle the case that > there're many running apps, each of them has capped but still significant > numbers of pending resources. > So we may possibly need to upgrade int32 memory field (could include v-cores > as well) to int64 to avoid integer overflow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3931) default-node-label-expression doesn’t apply when an application is submitted by RM rest api
[ https://issues.apache.org/jira/browse/YARN-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252916#comment-15252916 ] Hadoop QA commented on YARN-3931: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s {color} | {color:red} YARN-3931 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12745729/YARN-3931.001.patch | | JIRA Issue | YARN-3931 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/11167/console | | Powered by | Apache Yetus 0.2.0 http://yetus.apache.org | This message was automatically generated. > default-node-label-expression doesn’t apply when an application is submitted > by RM rest api > --- > > Key: YARN-3931 > URL: https://issues.apache.org/jira/browse/YARN-3931 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 > Environment: hadoop-2.6.0 >Reporter: kyungwan nam >Assignee: kyungwan nam > Labels: patch > Attachments: YARN-3931.001.patch > > > * > yarn.scheduler.capacity..default-node-label-expression=large_disk > * submit an application using rest api without "app-node-label-expression”, > "am-container-node-label-expression” > * RM doesn’t allocate containers to the hosts associated with large_disk node > label -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252880#comment-15252880 ] Sangjin Lee commented on YARN-3816: --- Understood. I commented because I thought that entities that should skip this aggregation should be pretty generic and couldn't think of why it wouldn't be generic. I'm +1 on the latest patch. I'll wait for a little while so others can also look at it and chime in on the patch. Thanks! > [Aggregation] App-level aggregation and accumulation for YARN system metrics > > > Key: YARN-3816 > URL: https://issues.apache.org/jira/browse/YARN-3816 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Li Lu > Labels: yarn-2928-1st-milestone > Attachments: Application Level Aggregation of Timeline Data.pdf, > YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, > YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, > YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, > YARN-3816-YARN-2928-v3.patch, YARN-3816-YARN-2928-v4.patch, > YARN-3816-YARN-2928-v5.patch, YARN-3816-YARN-2928-v6.patch, > YARN-3816-YARN-2928-v7.patch, YARN-3816-YARN-2928-v8.patch, > YARN-3816-YARN-2928-v9.patch, YARN-3816-feature-YARN-2928.v4.1.patch, > YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch > > > We need application level aggregation of Timeline data: > - To present end user aggregated states for each application, include: > resource (CPU, Memory) consumption across all containers, number of > containers launched/completed/failed, etc. We need this for apps while they > are running as well as when they are done. > - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be > aggregated to show details of states in framework level. > - Other level (Flow/User/Queue) aggregation can be more efficient to be based > on Application-level aggregations rather than raw entity-level data as much > less raws need to scan (with filter out non-aggregated entities, like: > events, configurations, etc.). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4556) TestFifoScheduler.testResourceOverCommit fails
[ https://issues.apache.org/jira/browse/YARN-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252845#comment-15252845 ] Hudson commented on YARN-4556: -- FAILURE: Integrated in Hadoop-trunk-Commit #9649 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9649/]) YARN-4556. TestFifoScheduler.testResourceOverCommit fails. Contributed (epayne: rev 3dce486d88895dcdf443f4d0064d1fb6e9116045) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java > TestFifoScheduler.testResourceOverCommit fails > --- > > Key: YARN-4556 > URL: https://issues.apache.org/jira/browse/YARN-4556 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Akihiro Suda > Attachments: YARN-4556-1.patch > > > From YARN-4548 Jenkins log: > https://builds.apache.org/job/PreCommit-YARN-Build/10181/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_66.txt > {code} > Running > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler > Tests run: 16, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 31.004 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler > testResourceOverCommit(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler) > Time elapsed: 4.746 sec <<< FAILURE! > java.lang.AssertionError: expected:<-2048> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler.testResourceOverCommit(TestFifoScheduler.java:1142) > {code} > https://github.com/apache/hadoop/blob/8676a118a12165ae5a8b80a2a4596c133471ebc1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java#L1142 > It seems that Jenkins has been hitting this intermittently since April 2015 > https://www.google.com/search?q=TestFifoScheduler.testResourceOverCommit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4676) Automatic and Asynchronous Decommissioning Nodes Status Tracking
[ https://issues.apache.org/jira/browse/YARN-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Zhi updated YARN-4676: - Attachment: YARN-4676.013.patch Review comments update: 1. Added GracefulDecommision link in hadoop-project/src/site/site.xml; 2. Updated GracefulDecommision.md to be more end-user oriented; 3. Added a newline to TestDecommissioningNodesWatcher.java as suggested. (This however led to "warning: 1 line adds whitespace errors" during "git apply --verbose YARN-4676.013.patch"). > Automatic and Asynchronous Decommissioning Nodes Status Tracking > > > Key: YARN-4676 > URL: https://issues.apache.org/jira/browse/YARN-4676 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Daniel Zhi >Assignee: Daniel Zhi > Labels: features > Attachments: GracefulDecommissionYarnNode.pdf, YARN-4676.004.patch, > YARN-4676.005.patch, YARN-4676.006.patch, YARN-4676.007.patch, > YARN-4676.008.patch, YARN-4676.009.patch, YARN-4676.010.patch, > YARN-4676.011.patch, YARN-4676.012.patch, YARN-4676.013.patch > > > DecommissioningNodeWatcher inside ResourceTrackingService tracks > DECOMMISSIONING nodes' status automatically and asynchronously after the > client/admin makes the graceful decommission request. It tracks the > DECOMMISSIONING nodes' status to decide when, after all running containers on > the node have completed, the node will be transitioned into the DECOMMISSIONED state. > NodesListManager detects and handles include and exclude list changes to kick > off decommission or recommission as necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4390) Consider container request size during CS preemption
[ https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252803#comment-15252803 ] Eric Payne commented on YARN-4390: -- Thanks, [~leftnoteasy], for the detailed work on this issue. Overall, I think the approach looks good. One thing I still wonder about is even if the preemption policy kills the perfectly sized container, will the scheduler know that it needs to assign those freed resources to the same app that requested them? I think this JIRA gets us closer to that goal, but there may be a possibility for the killed container to go someplace else. Is that right? Even if that's the case, I still like this approach. Here are a few comments about the patch. * I would rename "select_candidates_for_reserved_containers" to "select_based_on_reserved_containers" * {{CapacitySchedulerPreemptionUtils#deductPreemptableResourcesBasedSelectedCandidates}} ** {{res}} is never used. Was it intended to be passed to {{tq.deductActuallyToBePreempted}}, or is it needed only to test if {{c.getReservedResource()}} and {{c.getAllocatedResource}} are not null? * Here's just a very minor thing: {{FifoCandidatesSelector#selectCandidates}}: * {{... could already select containers ...}} could be changed to {{... could have already selected containers ...}}, for clarity. * {{ReservedContainerCandidateSelector#getPreemptionCandidatesOnNode}}: {code} Map killableContainers = node.getKillableContainers(); {code} ** Even though killableContainers is an {{unmodifiableMap}}, I think it can still change, can't it? It looks like {{killableContainers}} is only used once in this method. Would it make more sense to wait until {{killableContainers}} is ready to use before calling {{node.getKillableContainers()}}? * {{ReservedContainerCandidateSelector#getNodesForPreemption}}: ** I am a little concerned about calling {{preemptionContext.getScheduler().getAllNodes())}} to get the list of all of the nodes on every iteration of the preemption monitor. I can't think of a better way to handle this, but I think this could be expensive on the RM since {{getAllNodes()}} will lock the {{ClusterNodeTracker}} while it creates the list of nodes, and anything trying to use {{ClusterNodeTrackers}} resources will have to wait. It may not be a problem, but I know that we sometimes see the RM getting bogged down, and I am concerned about adding another long wait every 15 seconds (default) (or however long the preemption monitor interval is configured for). The nodes list doesn't change all that often. I wonder if it would make sense to cache it and only update it periodically (every {{n-th}} iteration?). > Consider container request size during CS preemption > > > Key: YARN-4390 > URL: https://issues.apache.org/jira/browse/YARN-4390 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler >Affects Versions: 3.0.0, 2.8.0, 2.7.3 >Reporter: Eric Payne >Assignee: Wangda Tan > Attachments: YARN-4390-design.1.pdf, YARN-4390-test-results.pdf, > YARN-4390.1.patch, YARN-4390.2.patch, YARN-4390.3.branch-2.patch, > YARN-4390.3.patch > > > There are multiple reasons why preemption could unnecessarily preempt > containers. One is that an app could be requesting a large container (say > 8-GB), and the preemption monitor could conceivably preempt multiple > containers (say 8, 1-GB containers) in order to fill the large container > request. 
These smaller containers would then be rejected by the requesting AM > and potentially given right back to the preempted app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
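One rough way to act on the caching idea above; the supplier interface, the refresh period, and the class name are all invented for the illustration and are not part of any attached patch:
{code}
import java.util.Collections;
import java.util.List;

// Illustrative cache: refresh the (expensive) node list only every n-th
// preemption-monitor pass instead of on every editSchedule() call.
class CachedNodeList<N> {
  interface NodeSupplier<N> {
    List<N> fetchAllNodes();   // e.g. a call that takes the tracker's lock
  }

  private final NodeSupplier<N> supplier;
  private final int refreshEvery;          // refresh period, in get() calls
  private List<N> cached = Collections.emptyList();
  private int callsSinceRefresh = Integer.MAX_VALUE;  // force first refresh

  CachedNodeList(NodeSupplier<N> supplier, int refreshEvery) {
    this.supplier = supplier;
    this.refreshEvery = refreshEvery;
  }

  List<N> get() {
    if (callsSinceRefresh >= refreshEvery) {
      cached = supplier.fetchAllNodes();   // only now touch the scheduler
      callsSinceRefresh = 0;
    }
    callsSinceRefresh++;
    return cached;
  }
}
{code}
The trade-off is the one raised above: a list that is stale for a few passes is cheap, but preemption then reacts a little later to nodes being added or removed.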
[jira] [Commented] (YARN-4981) limit the max containers to assign to one application per heartbeat
[ https://issues.apache.org/jira/browse/YARN-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252787#comment-15252787 ] Karthik Kambatla commented on YARN-4981: Does limiting to one application per heartbeat solve this problem? What happens if the scheduler assigns multiple CPU-intensive tasks (from different applications) on the same node? If the job is CPU-intensive, maybe it should ask for more vcores so the CPU is not oversubscribed? Also, [~ywskycn] and I were looking into improving this in the past. We were considering determining maxAssign dynamically based on the delta between the node's allocation and the overall average allocation per node in the cluster. What do you think of that? > limit the max containers to assign to one application per heartbeat > > > Key: YARN-4981 > URL: https://issues.apache.org/jira/browse/YARN-4981 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.5.0, 2.6.4 >Reporter: ChenFolin > Attachments: YARN-4981.patch > > > When using assignMultiple with, for example, maxAssign=10: > if a job has high CPU utilization or high CPU load, it may leave a subset of nodes > very busy, and the job may take a long time. > So I want to limit the max containers assigned to one application per > heartbeat; it may help nodes to be used more uniformly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
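A very rough sketch of the dynamic-maxAssign idea mentioned above; the formula, the method name, and the choice to express headroom in units of a "typical" container size are all assumptions for illustration, not an agreed design:
{code}
// Illustrative only: a node far below the cluster-average allocation may
// accept more containers per heartbeat; a node at or above average gets 1.
final class DynamicMaxAssign {
  private DynamicMaxAssign() {}

  static int maxAssign(long nodeAllocatedMb, long clusterAllocatedMb,
      int numNodes, int configuredMaxAssign, long typicalContainerMb) {
    if (numNodes <= 0 || typicalContainerMb <= 0) {
      return configuredMaxAssign;
    }
    long avgPerNodeMb = clusterAllocatedMb / numNodes;
    // headroom relative to the average, in "typical container" units
    long deltaMb = avgPerNodeMb - nodeAllocatedMb;
    int extra = (int) Math.max(0, deltaMb / typicalContainerMb);
    // never exceed the configured cap, never go below one per heartbeat
    return Math.max(1, Math.min(configuredMaxAssign, 1 + extra));
  }
}
{code}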
[jira] [Updated] (YARN-4983) JVM and UGI metrics disappear after RM is once transitioned to standby mode
[ https://issues.apache.org/jira/browse/YARN-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-4983: Attachment: YARN-4983-trunk.000.patch This is a patch to quickly fix the problem. I'm reattaching the JvmMetrics after the RM is transitioned to the standby state. UgiMetrics is a different story since it fails when it gets reattached; as a quick fix I recreate it. Right now this patch is just a demo of a quick fix. I think the root contradiction here is that our metrics system does not really consider the case where a metrics system gets restarted (and metrics get reattached), while the RM HA code relaunches the metrics system. If there are more elegant ways to fix this please do let me know. Thanks! > JVM and UGI metrics disappear after RM is once transitioned to standby mode > --- > > Key: YARN-4983 > URL: https://issues.apache.org/jira/browse/YARN-4983 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Li Lu >Assignee: Li Lu > Attachments: YARN-4983-trunk.000.patch > > > When transitioned to standby, the RM will shut down the existing metrics > system and relaunch a new one. This causes the jvm metrics and ugi > metrics to be missing in the new metrics system. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
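A minimal sketch of the re-registration step described above, assuming the stock metrics2 API ({{DefaultMetricsSystem}} and {{JvmMetrics.create}}); it is not the attached patch, and the process/session names are placeholders:
{code}
import org.apache.hadoop.metrics2.MetricsSystem;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.source.JvmMetrics;

// After the RM shuts down and relaunches its metrics system during an
// active/standby transition, register a JVM metrics source with the new
// instance so the JVM gauges show up again.
final class ReattachJvmMetrics {
  private ReattachJvmMetrics() {}

  static void reattach(String processName, String sessionId) {
    MetricsSystem ms = DefaultMetricsSystem.instance();
    JvmMetrics.create(processName, sessionId, ms);
  }
}
{code}
UgiMetrics would need separate handling, as noted above, since simply re-registering the old source fails and the quick fix recreates it instead.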
[jira] [Commented] (YARN-3931) default-node-label-expression doesn’t apply when an application is submitted by RM rest api
[ https://issues.apache.org/jira/browse/YARN-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252745#comment-15252745 ] Raul Gutierrez Segales commented on YARN-3931: -- [~wangda], [~kyungwan nam] - ping, can we apply this? > default-node-label-expression doesn’t apply when an application is submitted > by RM rest api > --- > > Key: YARN-3931 > URL: https://issues.apache.org/jira/browse/YARN-3931 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 > Environment: hadoop-2.6.0 >Reporter: kyungwan nam >Assignee: kyungwan nam > Labels: patch > Attachments: YARN-3931.001.patch > > > * > yarn.scheduler.capacity..default-node-label-expression=large_disk > * submit an application using rest api without "app-node-label-expression”, > "am-container-node-label-expression” > * RM doesn’t allocate containers to the hosts associated with large_disk node > label -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4556) TestFifoScheduler.testResourceOverCommit fails
[ https://issues.apache.org/jira/browse/YARN-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252732#comment-15252732 ] Eric Payne commented on YARN-4556: -- +1 > TestFifoScheduler.testResourceOverCommit fails > --- > > Key: YARN-4556 > URL: https://issues.apache.org/jira/browse/YARN-4556 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Akihiro Suda > Attachments: YARN-4556-1.patch > > > From YARN-4548 Jenkins log: > https://builds.apache.org/job/PreCommit-YARN-Build/10181/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_66.txt > {code} > Running > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler > Tests run: 16, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 31.004 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler > testResourceOverCommit(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler) > Time elapsed: 4.746 sec <<< FAILURE! > java.lang.AssertionError: expected:<-2048> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler.testResourceOverCommit(TestFifoScheduler.java:1142) > {code} > https://github.com/apache/hadoop/blob/8676a118a12165ae5a8b80a2a4596c133471ebc1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java#L1142 > It seems that Jenkins has been hitting this intermittently since April 2015 > https://www.google.com/search?q=TestFifoScheduler.testResourceOverCommit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252716#comment-15252716 ] Li Lu commented on YARN-3816: - Thanks [~sjlee0]. A quick note on the design: the goal here is to let each concrete type of timeline collector define its own skip types. The challenge is that the updateAggregateStatus method is static, so that we can provide the aggregateEntities static method for offline aggregations. This limits the ability to customize the "skip set" for each TimelineCollector subclass (no method override for static methods). One solution is to use the strategy pattern and let each timeline collector decide its own set of skipped types, but I do not want each instance of a timeline collector to hold its own skip set, since it should be the same for the same class. Therefore, I'm making the getEntityTypesSkipAggregation method an instance method, but both TimelineCollector and AppLevelTimelineCollector can simply return the class-level skip set. The two static sets (entityTypesSkipAggregation) just happen to have the same name in the two classes, but they're not interfering with each other. Not sure if this is clear enough, but any suggestions would be helpful. Thanks! > [Aggregation] App-level aggregation and accumulation for YARN system metrics > > > Key: YARN-3816 > URL: https://issues.apache.org/jira/browse/YARN-3816 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Li Lu > Labels: yarn-2928-1st-milestone > Attachments: Application Level Aggregation of Timeline Data.pdf, > YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, > YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, > YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, > YARN-3816-YARN-2928-v3.patch, YARN-3816-YARN-2928-v4.patch, > YARN-3816-YARN-2928-v5.patch, YARN-3816-YARN-2928-v6.patch, > YARN-3816-YARN-2928-v7.patch, YARN-3816-YARN-2928-v8.patch, > YARN-3816-YARN-2928-v9.patch, YARN-3816-feature-YARN-2928.v4.1.patch, > YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch > > > We need application level aggregation of Timeline data: > - To present end user aggregated states for each application, include: > resource (CPU, Memory) consumption across all containers, number of > containers launched/completed/failed, etc. We need this for apps while they > are running as well as when they are done. > - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be > aggregated to show details of states in framework level. > - Other level (Flow/User/Queue) aggregation can be more efficient to be based > on Application-level aggregations rather than raw entity-level data as much > less raws need to scan (with filter out non-aggregated entities, like: > events, configurations, etc.). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
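A simplified, self-contained illustration of the pattern described above (class-level skip sets exposed through an overridable instance method). The class names and entity types below are stand-ins, not the actual TimelineCollector code:
{code}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Stand-in for TimelineCollector: the skip set is static (one per class),
// but it is exposed through an instance method so subclasses can override it.
class CollectorBase {
  private static final Set<String> SKIP = Collections.emptySet();

  protected Set<String> getEntityTypesSkipAggregation() {
    return SKIP;
  }
}

// Stand-in for the app-level collector with its own class-level skip set.
class AppLevelCollector extends CollectorBase {
  private static final Set<String> SKIP = new HashSet<>();
  static {
    SKIP.add("YARN_APPLICATION");   // example skip types; the real set differs
    SKIP.add("YARN_FLOW_ACTIVITY");
  }

  @Override
  protected Set<String> getEntityTypesSkipAggregation() {
    return SKIP;
  }
}
{code}
A static aggregation helper can then take a collector instance (or just the returned set) as an argument, which sidesteps the "no override for static methods" limitation mentioned in the comment.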
[jira] [Commented] (YARN-4968) A couple of AM retry unit tests need to wait SchedulerApplicationAttempt stopped.
[ https://issues.apache.org/jira/browse/YARN-4968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252713#comment-15252713 ] Hudson commented on YARN-4968: -- FAILURE: Integrated in Hadoop-trunk-Commit #9648 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9648/]) YARN-4968. A couple of AM retry unit tests need to wait (gtcarrera9: rev 7c6339f66ac301406504be28841bc3f3bfebc8ae) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/applicationsmanager/TestAMRestart.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/TestSchedulerUtils.java > A couple of AM retry unit tests need to wait SchedulerApplicationAttempt > stopped. > - > > Key: YARN-4968 > URL: https://issues.apache.org/jira/browse/YARN-4968 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Fix For: 2.9.0 > > Attachments: YARN-4968.1.patch > > > Noticed some unit tests, for example: > TestRMRestart#testRMRestartAfterPreemption > TestAMRestart#testShouldNotCountFailureToMaxAttemptRetry > Sometimes failure because retrying app attempt registers before the previous > scheduler-application-attempt completely completed in scheduler. > We need to wait scheduler-application-attempt stop before retrying following > attempts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4957) Add getNewReservation in ApplicationClientProtocol
[ https://issues.apache.org/jira/browse/YARN-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Po updated YARN-4957: -- Attachment: YARN-4957.v5.patch Thanks [~subru] for your comments. V5 of the patch addresses each of them. > Add getNewReservation in ApplicationClientProtocol > -- > > Key: YARN-4957 > URL: https://issues.apache.org/jira/browse/YARN-4957 > Project: Hadoop YARN > Issue Type: Sub-task > Components: applications, client, resourcemanager >Affects Versions: 2.8.0 >Reporter: Subru Krishnan >Assignee: Sean Po > Labels: api-breaking > Attachments: YARN-4957.v0.patch, YARN-4957.v1.patch, > YARN-4957.v2.patch, YARN-4957.v3.patch, YARN-4957.v4.patch, YARN-4957.v5.patch > > > Currently submitReservation returns a ReservationId if successful. This JIRA > proposes adding a getNewReservation in ApplicationClientProtocol for the > following reasons: > * Prevent zombie reservations in the face of client and/or network failures > post submitReservation > * Align reservation submission with application submission -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4846) Random failures for TestCapacitySchedulerPreemption#testPreemptionPolicyShouldRespectAlreadyMarkedKillableContainers
[ https://issues.apache.org/jira/browse/YARN-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252689#comment-15252689 ] Hadoop QA commented on YARN-4846: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 42s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 37s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 5s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 17s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: patch generated 1 new + 21 unchanged - 0 fixed = 22 total (was 21) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 33s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 18s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 75m 36s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 10s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 169m 25s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_77 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | JDK v1.8.0_77 Timed out junit tests | org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes | | JDK v1.7.0_95 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | JDK v1.7.0_95 Timed out junit tests | org.apache
[jira] [Updated] (YARN-4968) A couple of AM retry unit tests need to wait SchedulerApplicationAttempt stopped.
[ https://issues.apache.org/jira/browse/YARN-4968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Lu updated YARN-4968: Fix Version/s: (was: 3.0.0) > A couple of AM retry unit tests need to wait SchedulerApplicationAttempt > stopped. > - > > Key: YARN-4968 > URL: https://issues.apache.org/jira/browse/YARN-4968 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Fix For: 2.9.0 > > Attachments: YARN-4968.1.patch > > > Noticed some unit tests, for example: > TestRMRestart#testRMRestartAfterPreemption > TestAMRestart#testShouldNotCountFailureToMaxAttemptRetry > Sometimes failure because retrying app attempt registers before the previous > scheduler-application-attempt completely completed in scheduler. > We need to wait scheduler-application-attempt stop before retrying following > attempts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4968) A couple of AM retry unit tests need to wait SchedulerApplicationAttempt stopped.
[ https://issues.apache.org/jira/browse/YARN-4968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252640#comment-15252640 ] Li Lu commented on YARN-4968: - Oh BTW, thanks [~wangda] for the work! > A couple of AM retry unit tests need to wait SchedulerApplicationAttempt > stopped. > - > > Key: YARN-4968 > URL: https://issues.apache.org/jira/browse/YARN-4968 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Fix For: 3.0.0, 2.9.0 > > Attachments: YARN-4968.1.patch > > > Noticed some unit tests, for example: > TestRMRestart#testRMRestartAfterPreemption > TestAMRestart#testShouldNotCountFailureToMaxAttemptRetry > Sometimes failure because retrying app attempt registers before the previous > scheduler-application-attempt completely completed in scheduler. > We need to wait scheduler-application-attempt stop before retrying following > attempts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4968) A couple of AM retry unit tests need to wait SchedulerApplicationAttempt stopped.
[ https://issues.apache.org/jira/browse/YARN-4968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252594#comment-15252594 ] Li Lu commented on YARN-4968: - No further concerns raised. I'll commit this patch shortly. > A couple of AM retry unit tests need to wait SchedulerApplicationAttempt > stopped. > - > > Key: YARN-4968 > URL: https://issues.apache.org/jira/browse/YARN-4968 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4968.1.patch > > > Noticed some unit tests, for example: > TestRMRestart#testRMRestartAfterPreemption > TestAMRestart#testShouldNotCountFailureToMaxAttemptRetry > Sometimes failure because retrying app attempt registers before the previous > scheduler-application-attempt completely completed in scheduler. > We need to wait scheduler-application-attempt stop before retrying following > attempts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4556) TestFifoScheduler.testResourceOverCommit fails
[ https://issues.apache.org/jira/browse/YARN-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252593#comment-15252593 ] Nathan Roberts commented on YARN-4556: -- Patch seems like a reasonable test improvement. +1 non-binding > TestFifoScheduler.testResourceOverCommit fails > --- > > Key: YARN-4556 > URL: https://issues.apache.org/jira/browse/YARN-4556 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Akihiro Suda > Attachments: YARN-4556-1.patch > > > From YARN-4548 Jenkins log: > https://builds.apache.org/job/PreCommit-YARN-Build/10181/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdk1.8.0_66.txt > {code} > Running > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler > Tests run: 16, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 31.004 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler > testResourceOverCommit(org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler) > Time elapsed: 4.746 sec <<< FAILURE! > java.lang.AssertionError: expected:<-2048> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler.testResourceOverCommit(TestFifoScheduler.java:1142) > {code} > https://github.com/apache/hadoop/blob/8676a118a12165ae5a8b80a2a4596c133471ebc1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/TestFifoScheduler.java#L1142 > It seems that Jenkins has been hitting this intermittently since April 2015 > https://www.google.com/search?q=TestFifoScheduler.testResourceOverCommit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4957) Add getNewReservation in ApplicationClientProtocol
[ https://issues.apache.org/jira/browse/YARN-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252540#comment-15252540 ] Subru Krishnan commented on YARN-4957: -- Thanks [~seanpo03] for addressing my comments. The latest patch LGTM, +1 from my side. Just a couple of minor comments on the documentation: * Move the part about getting reservationId to either a pre-step or step 0 as now text is aligned with the diagram. * The link to the get new reservationId is broken. [~curino], can you take a look? > Add getNewReservation in ApplicationClientProtocol > -- > > Key: YARN-4957 > URL: https://issues.apache.org/jira/browse/YARN-4957 > Project: Hadoop YARN > Issue Type: Sub-task > Components: applications, client, resourcemanager >Affects Versions: 2.8.0 >Reporter: Subru Krishnan >Assignee: Sean Po > Labels: api-breaking > Attachments: YARN-4957.v0.patch, YARN-4957.v1.patch, > YARN-4957.v2.patch, YARN-4957.v3.patch, YARN-4957.v4.patch > > > Currently submitReservation returns a ReservationId if successful. This JIRA > proposes adding a getNewReservation in ApplicationClientProtocol for the > following reasons: > * Prevent zombie reservations in the face of client and/or network failures > post submitReservation > * Align reservation submission with application submission -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4954) TestYarnClient.testReservationAPIs fails on machines with less than 4 GB available memory
[ https://issues.apache.org/jira/browse/YARN-4954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252523#comment-15252523 ] Junping Du commented on YARN-4954: -- Hi [~vinodkv], I saw you move this one into subtask of YARN-4478. Here we noticed timeout issue for 4 test classes: {noformat} org.apache.hadoop.yarn.client.cli.TestYarnCLI org.apache.hadoop.yarn.client.api.impl.TestYarnClient org.apache.hadoop.yarn.client.api.impl.TestAMRMClient org.apache.hadoop.yarn.client.api.impl.TestNMClien {noformat} Shall we file JIRA under YARN-4478 to track them? If so, as a whole or filed separately? > TestYarnClient.testReservationAPIs fails on machines with less than 4 GB > available memory > - > > Key: YARN-4954 > URL: https://issues.apache.org/jira/browse/YARN-4954 > Project: Hadoop YARN > Issue Type: Sub-task > Components: test >Affects Versions: 3.0.0 >Reporter: Gergely Novák >Assignee: Gergely Novák >Priority: Minor > Attachments: YARN-4954.001.patch > > > TestYarnClient.testReservationAPIs sometimes fails with this error: > {noformat} > java.lang.AssertionError: > org.apache.hadoop.yarn.server.resourcemanager.reservation.exceptions.PlanningException: > The request cannot be satisfied > at > org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitReservation(ClientRMService.java:1254) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitReservation(ApplicationClientProtocolPBServiceImpl.java:457) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:515) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2422) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2418) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2416) > Caused by: > org.apache.hadoop.yarn.server.resourcemanager.reservation.exceptions.PlanningException: > The request cannot be satisfied > at > org.apache.hadoop.yarn.server.resourcemanager.reservation.planning.IterativePlanner.computeJobAllocation(IterativePlanner.java:151) > at > org.apache.hadoop.yarn.server.resourcemanager.reservation.planning.PlanningAlgorithm.allocateUser(PlanningAlgorithm.java:64) > at > org.apache.hadoop.yarn.server.resourcemanager.reservation.planning.PlanningAlgorithm.createReservation(PlanningAlgorithm.java:140) > at > org.apache.hadoop.yarn.server.resourcemanager.reservation.planning.TryManyReservationAgents.createReservation(TryManyReservationAgents.java:55) > at > org.apache.hadoop.yarn.server.resourcemanager.reservation.planning.AlignedPlannerWithGreedy.createReservation(AlignedPlannerWithGreedy.java:84) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitReservation(ClientRMService.java:1237) > ... 10 more > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.hadoop.yarn.client.api.impl.TestYarnClient.testReservationAPIs(TestYarnClient.java:1227) > {noformat} > This is caused by really not having enough available memory to complete the > reservation (4 * 1024 MB). 
In my opinion lowering the required memory (either > by lowering the number of containers to 2, or the memory to 512 MB) would > make the test more stable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4697) NM aggregation thread pool is not bound by limits
[ https://issues.apache.org/jira/browse/YARN-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252503#comment-15252503 ] Junping Du commented on YARN-4697: -- After investigating our cluster, which also hit the same issue recently, I think there are two root causes here: 1. Due to YARN-4325, too many stale applications don't get purged from the NM state store, so the NM recovers too many stale applications and inits log aggregation for each of them. 2. Because these applications are stale, some operations - like createAppDir() - will fail with token issues, but we swallow the exception there and continue to create an invalid aggregator - I just filed YARN-4984 to fix this issue. > NM aggregation thread pool is not bound by limits > - > > Key: YARN-4697 > URL: https://issues.apache.org/jira/browse/YARN-4697 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Critical > Fix For: 2.9.0 > > Attachments: yarn4697.001.patch, yarn4697.002.patch, > yarn4697.003.patch, yarn4697.004.patch > > > In the LogAggregationService.java we create a threadpool to upload logs from > the nodemanager to HDFS if log aggregation is turned on. This is a cached > threadpool which, based on the javadoc, is an unlimited pool of threads. > In the case that we have had a problem with log aggregation this could cause > a problem on restart. The number of threads created at that point could be > huge and will put a large load on the NameNode and in the worst case could even > bring it down due to file descriptor issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
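For context on the threadpool part of the description, here is the difference between the unbounded pool and a bounded one in plain java.util.concurrent terms (a sketch only; the cap would of course come from configuration, and none of this is the actual LogAggregationService code):
{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class AggregationPoolSketch {
  // Unbounded: a cached pool grows to one thread per queued app-log upload,
  // which is the restart problem described in the JIRA.
  static ExecutorService unbounded() {
    return Executors.newCachedThreadPool();
  }

  // Bounded: cap concurrent uploader threads; excess uploads wait in the queue.
  static ExecutorService bounded(int maxThreads) {
    return new ThreadPoolExecutor(maxThreads, maxThreads,
        60L, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
  }
}
{code}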
[jira] [Created] (YARN-4986) Add a check in the coprocessor for the table to be operated on
Vrushali C created YARN-4986: Summary: Add a check in the coprocessor for the table to be operated on Key: YARN-4986 URL: https://issues.apache.org/jira/browse/YARN-4986 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vrushali C Assignee: Vrushali C As a precautionary measure, it will be a good idea to have the coprocessor code check which table it needs to be working on and return/proceed accordingly. This is more of a safety check so that we are sure we are not inadvertently executing the coprocessor code on some other table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4985) Refactor the coprocessor code & other definition classes into independent packages
Vrushali C created YARN-4985: Summary: Refactor the coprocessor code & other definition classes into independent packages Key: YARN-4985 URL: https://issues.apache.org/jira/browse/YARN-4985 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vrushali C Assignee: Vrushali C As part of the coprocessor deployment, we have realized that it will be much cleaner to have the coprocessor code sit in a package which does not depend on hadoop-yarn-server classes. It only needs hbase and other util classes. These util classes and tag definition related classes can be refactored into their own independent "definition" class package, so that making changes to the coprocessor code, upgrading hbase, deploying hbase on a cluster with a different hadoop version, etc. all become operationally much easier and less error prone with respect to managing different library jars. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4786) Enhance hbase coprocessor aggregation operations:GLOBAL_MIN, LATEST_MIN etc and FINAL attributes
[ https://issues.apache.org/jira/browse/YARN-4786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vrushali C reassigned YARN-4786: Assignee: Vrushali C > Enhance hbase coprocessor aggregation operations:GLOBAL_MIN, LATEST_MIN etc > and FINAL attributes > > > Key: YARN-4786 > URL: https://issues.apache.org/jira/browse/YARN-4786 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Vrushali C >Assignee: Vrushali C > > As part of YARN-4062, Joep and I had been discussing min, max > operations and the final attributes. > YARN-4062 has GLOBAL_MIN, GLOBAL_MAX and SUM operations. It presently > indicates SUM_FINAL for a cell that contains a metric that is the final value > for the metric. > We should enhance this such that the set of aggregation dimensions SUM, MIN, > MAX, etc. are really set at a per-column level and shouldn't be passed from > the client, but be instrumented by the ColumnHelper infrastructure instead. > We should probably use a different tag value for that. > Both aggregation dimension and this "FINAL_VALUE" or whatever abbreviation we > use are needed to determine the right thing to do for compaction. Only one > value needs to have this final value bit / tag set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252492#comment-15252492 ] Sangjin Lee commented on YARN-3816: --- We're almost there... It appears that {{entityTypesSkipAggregation}} is in two places: {{TimelineCollector}} and {{AppLevelTimelineCollector}}. And in {{TimelineCollector}} it is not being populated, whereas it is populated in {{AppLevelTimelineCollector}}. This is rather confusing. What I would suggest is to keep it only in {{TimelineCollector}} (I don't think this is dependent on the app-level timeline collector?). Then we could remove the {{getEntityTypesSkipAggregation()}} method and directly reference it at the places where we need it. > [Aggregation] App-level aggregation and accumulation for YARN system metrics > > > Key: YARN-3816 > URL: https://issues.apache.org/jira/browse/YARN-3816 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Li Lu > Labels: yarn-2928-1st-milestone > Attachments: Application Level Aggregation of Timeline Data.pdf, > YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, > YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, > YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, > YARN-3816-YARN-2928-v3.patch, YARN-3816-YARN-2928-v4.patch, > YARN-3816-YARN-2928-v5.patch, YARN-3816-YARN-2928-v6.patch, > YARN-3816-YARN-2928-v7.patch, YARN-3816-YARN-2928-v8.patch, > YARN-3816-YARN-2928-v9.patch, YARN-3816-feature-YARN-2928.v4.1.patch, > YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch > > > We need application level aggregation of Timeline data: > - To present end user aggregated states for each application, include: > resource (CPU, Memory) consumption across all containers, number of > containers launched/completed/failed, etc. We need this for apps while they > are running as well as when they are done. > - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be > aggregated to show details of states in framework level. > - Other level (Flow/User/Queue) aggregation can be more efficient to be based > on Application-level aggregations rather than raw entity-level data as much > less raws need to scan (with filter out non-aggregated entities, like: > events, configurations, etc.). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4984) LogAggregationService shouldn't swallow exception in handling createAppDir() which cause thread leak.
[ https://issues.apache.org/jira/browse/YARN-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252490#comment-15252490 ] Junping Du commented on YARN-4984: -- The exception swallowing happens at LogAggregationService.initAppAggregator() {noformat} // wait until check for existing aggregator to create dirs YarnRuntimeException appDirException = null; try { // Create the app dir createAppDir(user, appId, userUgi); } catch (Exception e) { appLogAggregator.disableLogAggregation(); if (!(e instanceof YarnRuntimeException)) { appDirException = new YarnRuntimeException(e); } else { appDirException = (YarnRuntimeException)e; } } ... // creating aggregator thread {noformat} We should throw out exception in case createAppDir() is created with failure. > LogAggregationService shouldn't swallow exception in handling createAppDir() > which cause thread leak. > - > > Key: YARN-4984 > URL: https://issues.apache.org/jira/browse/YARN-4984 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation >Affects Versions: 2.7.2 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > > Due to YARN-4325, many stale applications still exists in NM state store and > get recovered after NM restart. The app initiation will get failed due to > token invalid, but exception is swallowed and aggregator thread is still > created for invalid app. > Exception is: > {noformat} > 158 2016-04-19 23:38:33,039 ERROR logaggregation.LogAggregationService > (LogAggregationService.java:run(300)) - Failed to setup application log > directory for application_1448060878692_11842 > 159 > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > token (HDFS_DELEGATION_TOKEN token 1380589 for hdfswrite) can't be fo > und in cache > 160 at org.apache.hadoop.ipc.Client.call(Client.java:1427) > 161 at org.apache.hadoop.ipc.Client.call(Client.java:1358) > 162 at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) > 163 at com.sun.proxy.$Proxy13.getFileInfo(Unknown Source) > 164 at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771) > 165 at sun.reflect.GeneratedMethodAccessor76.invoke(Unknown > Source) > 166 at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 167 at java.lang.reflect.Method.invoke(Method.java:606) > 168 at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252) > 169 at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104) > 170 at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source) > 171 at > org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2116) > 172 at > org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1315) > 173 at > org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1311) > 174 at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > 175 at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1311) > 176 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.checkExists(LogAggregationService.java:248) > 177 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.access$100(LogAggregationService.java:67) > 178 at > 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:276) > 179 at java.security.AccessController.doPrivileged(Native Method) > 180 at javax.security.auth.Subject.doAs(Subject.java:415) > 181 at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > 182 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:261) > 183 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:367) > 184 at > org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:320) > 185
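A minimal sketch of the rethrow suggested in the comment above, based only on the snippet quoted there (surrounding code elided; the actual fix in the eventual patch may differ):
{code}
YarnRuntimeException appDirException = null;
try {
  // Create the app dir
  createAppDir(user, appId, userUgi);
} catch (Exception e) {
  appLogAggregator.disableLogAggregation();
  appDirException = (e instanceof YarnRuntimeException)
      ? (YarnRuntimeException) e : new YarnRuntimeException(e);
  // Fail the init here instead of falling through to create the aggregator
  // thread, so no thread is leaked for an app whose dir could not be created.
  throw appDirException;
}
// Only reached when the app dir was created successfully:
// create and start the aggregator thread.
{code}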
[jira] [Created] (YARN-4984) LogAggregationService shouldn't swallow exception in handling createAppDir() which cause thread leak.
Junping Du created YARN-4984: Summary: LogAggregationService shouldn't swallow exception in handling createAppDir() which cause thread leak. Key: YARN-4984 URL: https://issues.apache.org/jira/browse/YARN-4984 Project: Hadoop YARN Issue Type: Bug Components: log-aggregation Affects Versions: 2.7.2 Reporter: Junping Du Assignee: Junping Du Priority: Critical Due to YARN-4325, many stale applications still exists in NM state store and get recovered after NM restart. The app initiation will get failed due to token invalid, but exception is swallowed and aggregator thread is still created for invalid app. Exception is: {noformat} 158 2016-04-19 23:38:33,039 ERROR logaggregation.LogAggregationService (LogAggregationService.java:run(300)) - Failed to setup application log directory for application_1448060878692_11842 159 org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 1380589 for hdfswrite) can't be fo und in cache 160 at org.apache.hadoop.ipc.Client.call(Client.java:1427) 161 at org.apache.hadoop.ipc.Client.call(Client.java:1358) 162 at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) 163 at com.sun.proxy.$Proxy13.getFileInfo(Unknown Source) 164 at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771) 165 at sun.reflect.GeneratedMethodAccessor76.invoke(Unknown Source) 166 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 167 at java.lang.reflect.Method.invoke(Method.java:606) 168 at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252) 169 at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104) 170 at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source) 171 at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2116) 172 at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1315) 173 at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1311) 174 at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) 175 at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1311) 176 at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.checkExists(LogAggregationService.java:248) 177 at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.access$100(LogAggregationService.java:67) 178 at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:276) 179 at java.security.AccessController.doPrivileged(Native Method) 180 at javax.security.auth.Subject.doAs(Subject.java:415) 181 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) 182 at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:261) 183 at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:367) 184 at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:320) 185 at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:447) 186 at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4954) TestYarnClient.testReservationAPIs fails on machines with less than 4 GB available memory
[ https://issues.apache.org/jira/browse/YARN-4954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-4954: -- Issue Type: Sub-task (was: Test) Parent: YARN-4478 > TestYarnClient.testReservationAPIs fails on machines with less than 4 GB > available memory > - > > Key: YARN-4954 > URL: https://issues.apache.org/jira/browse/YARN-4954 > Project: Hadoop YARN > Issue Type: Sub-task > Components: test >Affects Versions: 3.0.0 >Reporter: Gergely Novák >Assignee: Gergely Novák >Priority: Minor > Attachments: YARN-4954.001.patch > > > TestYarnClient.testReservationAPIs sometimes fails with this error: > {noformat} > java.lang.AssertionError: > org.apache.hadoop.yarn.server.resourcemanager.reservation.exceptions.PlanningException: > The request cannot be satisfied > at > org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitReservation(ClientRMService.java:1254) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.submitReservation(ApplicationClientProtocolPBServiceImpl.java:457) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:515) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2422) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2418) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1742) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2416) > Caused by: > org.apache.hadoop.yarn.server.resourcemanager.reservation.exceptions.PlanningException: > The request cannot be satisfied > at > org.apache.hadoop.yarn.server.resourcemanager.reservation.planning.IterativePlanner.computeJobAllocation(IterativePlanner.java:151) > at > org.apache.hadoop.yarn.server.resourcemanager.reservation.planning.PlanningAlgorithm.allocateUser(PlanningAlgorithm.java:64) > at > org.apache.hadoop.yarn.server.resourcemanager.reservation.planning.PlanningAlgorithm.createReservation(PlanningAlgorithm.java:140) > at > org.apache.hadoop.yarn.server.resourcemanager.reservation.planning.TryManyReservationAgents.createReservation(TryManyReservationAgents.java:55) > at > org.apache.hadoop.yarn.server.resourcemanager.reservation.planning.AlignedPlannerWithGreedy.createReservation(AlignedPlannerWithGreedy.java:84) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.submitReservation(ClientRMService.java:1237) > ... 10 more > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.hadoop.yarn.client.api.impl.TestYarnClient.testReservationAPIs(TestYarnClient.java:1227) > {noformat} > This is caused by really not having enough available memory to complete the > reservation (4 * 1024 MB). In my opinion lowering the required memory (either > by lowering the number of containers to 2, or the memory to 512 MB) would > make the test more stable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4311) Removing nodes from include and exclude lists will not remove them from decommissioned nodes list
[ https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252464#comment-15252464 ] Hadoop QA commented on YARN-4311: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 17s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 27s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 52s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 7m 50s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 10s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 57s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 55s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 53s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 36s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 12s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 37s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 38s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 38s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 6m 32s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 6m 32s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 8s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 52s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 54s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 0s {color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 41s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 40s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 11s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_77. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 8s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 7s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_77. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 50s {color} | {color:green} hadoop-sls in the patch passed with JDK v1.8.0_77. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 22s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_
[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252458#comment-15252458 ] Hadoop QA commented on YARN-3816: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 5 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 2m 59s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 21s {color} | {color:green} YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 4s {color} | {color:green} YARN-2928 passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 20s {color} | {color:green} YARN-2928 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 42s {color} | {color:green} YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 59s {color} | {color:green} YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 4s {color} | {color:green} YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 21s {color} | {color:green} YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 43s {color} | {color:green} YARN-2928 passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 9s {color} | {color:green} YARN-2928 passed with JDK v1.7.0_95 {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 37s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 2s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 2s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 18s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 18s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 51s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 48s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 50s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 30s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 0s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_77. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 1s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 42s {color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed with JDK v1.8.0_77. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 4m 36s {color} | {color:green} hadoop-yarn-server-timelineservice in the patch passed with JDK v1.8.0_77. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 23s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 15s {col
[jira] [Created] (YARN-4983) JVM and UGI metrics disappear after RM is once transitioned to standby mode
Li Lu created YARN-4983: --- Summary: JVM and UGI metrics disappear after RM is once transitioned to standby mode Key: YARN-4983 URL: https://issues.apache.org/jira/browse/YARN-4983 Project: Hadoop YARN Issue Type: Bug Reporter: Li Lu Assignee: Li Lu When transitioned to standby, the RM will shut down the existing metric system and relaunch a new one. This will cause the jvm metrics and ugi metrics to go missing in the new metric system. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4976) Missing NullPointer check in ContainerLaunchContextPBImpl causes RM to die
[ https://issues.apache.org/jira/browse/YARN-4976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252394#comment-15252394 ] Giovanni Matteo Fumarola commented on YARN-4976: Thanks [~chris.douglas] and [~templedf] for reviewing and committing it. Thanks [~ellenfkh] for finding the issue. > Missing NullPointer check in ContainerLaunchContextPBImpl causes RM to die > -- > > Key: YARN-4976 > URL: https://issues.apache.org/jira/browse/YARN-4976 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola > Fix For: 2.9.0 > > Attachments: YARN-4976.v0.patch, YARN-4976.v1.patch, > YARN-4976.v2.patch, YARN-4976.v3.patch > > > The client can set a null value for any env variable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4150) Failure in TestNMClient because nodereports were not available
[ https://issues.apache.org/jira/browse/YARN-4150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252380#comment-15252380 ] Ray Chiang commented on YARN-4150: -- I can't replicate the issue. Is this still a problem? > Failure in TestNMClient because nodereports were not available > -- > > Key: YARN-4150 > URL: https://issues.apache.org/jira/browse/YARN-4150 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Anubhav Dhoot >Assignee: Anubhav Dhoot > Attachments: YARN-4150.001.patch > > > Saw a failure in a test run > https://builds.apache.org/job/PreCommit-YARN-Build/9010/testReport/ > java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 > at java.util.ArrayList.rangeCheck(ArrayList.java:635) > at java.util.ArrayList.get(ArrayList.java:411) > at > org.apache.hadoop.yarn.client.api.impl.TestNMClient.allocateContainers(TestNMClient.java:244) > at > org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClientNoCleanupOnStop(TestNMClient.java:210) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4976) Missing NullPointer check in ContainerLaunchContextPBImpl causes RM to die
[ https://issues.apache.org/jira/browse/YARN-4976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252354#comment-15252354 ] Hudson commented on YARN-4976: -- FAILURE: Integrated in Hadoop-trunk-Commit #9645 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9645/]) YARN-4976. Missing NullPointer check in ContainerLaunchContextPBImpl. (cdouglas: rev 95a50466075c28110fa7c297e9c5246892076ca8) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ContainerLaunchContextPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/api/records/impl/pb/TestApplicationClientProtocolRecords.java > Missing NullPointer check in ContainerLaunchContextPBImpl causes RM to die > -- > > Key: YARN-4976 > URL: https://issues.apache.org/jira/browse/YARN-4976 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola > Fix For: 2.9.0 > > Attachments: YARN-4976.v0.patch, YARN-4976.v1.patch, > YARN-4976.v2.patch, YARN-4976.v3.patch > > > The client can set a null value for any env variable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4846) Random failures for TestCapacitySchedulerPreemption#testPreemptionPolicyShouldRespectAlreadyMarkedKillableContainers
[ https://issues.apache.org/jira/browse/YARN-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-4846: --- Attachment: 0003-YARN-4846.patch > Random failures for > TestCapacitySchedulerPreemption#testPreemptionPolicyShouldRespectAlreadyMarkedKillableContainers > > > Key: YARN-4846 > URL: https://issues.apache.org/jira/browse/YARN-4846 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: 0001-YARN-4846.patch, 0002-YARN-4846.patch, > 0003-YARN-4846.patch, YARN-4846-update-PCPP.patch > > > {noformat} > java.lang.AssertionError: expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerPreemption.testPreemptionPolicyShouldRespectAlreadyMarkedKillableContainers(TestCapacitySchedulerPreemption.java:473) > {noformat} > https://builds.apache.org/job/PreCommit-YARN-Build/10826/testReport/org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity/TestCapacitySchedulerPreemption/testPreemptionPolicyShouldRespectAlreadyMarkedKillableContainers/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4846) Random failures for TestCapacitySchedulerPreemption#testPreemptionPolicyShouldRespectAlreadyMarkedKillableContainers
[ https://issues.apache.org/jira/browse/YARN-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated YARN-4846: --- Attachment: (was: 0003-YARN-4846.patch) > Random failures for > TestCapacitySchedulerPreemption#testPreemptionPolicyShouldRespectAlreadyMarkedKillableContainers > > > Key: YARN-4846 > URL: https://issues.apache.org/jira/browse/YARN-4846 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt > Attachments: 0001-YARN-4846.patch, 0002-YARN-4846.patch, > 0003-YARN-4846.patch, YARN-4846-update-PCPP.patch > > > {noformat} > java.lang.AssertionError: expected:<1> but was:<0> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerPreemption.testPreemptionPolicyShouldRespectAlreadyMarkedKillableContainers(TestCapacitySchedulerPreemption.java:473) > {noformat} > https://builds.apache.org/job/PreCommit-YARN-Build/10826/testReport/org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity/TestCapacitySchedulerPreemption/testPreemptionPolicyShouldRespectAlreadyMarkedKillableContainers/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4957) Add getNewReservation in ApplicationClientProtocol
[ https://issues.apache.org/jira/browse/YARN-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252304#comment-15252304 ] Sean Po commented on YARN-4957: --- All other test failures are accounted for above. > Add getNewReservation in ApplicationClientProtocol > -- > > Key: YARN-4957 > URL: https://issues.apache.org/jira/browse/YARN-4957 > Project: Hadoop YARN > Issue Type: Sub-task > Components: applications, client, resourcemanager >Affects Versions: 2.8.0 >Reporter: Subru Krishnan >Assignee: Sean Po > Labels: api-breaking > Attachments: YARN-4957.v0.patch, YARN-4957.v1.patch, > YARN-4957.v2.patch, YARN-4957.v3.patch, YARN-4957.v4.patch > > > Currently submitReservation returns a ReservationId if sucessful. This JIRA > propose adding a getNewReservation in ApplicationClientProtocol for the > following reasons: > * Prevent zombie reservations in the face of client and/or network failures > post submitReservation > * Align reservation submission with application submission -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4957) Add getNewReservation in ApplicationClientProtocol
[ https://issues.apache.org/jira/browse/YARN-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252303#comment-15252303 ] Sean Po commented on YARN-4957: --- In YARN-4957.v4.patch I fixed the checkstyle and Javadoc issues. +*Test Failures:*+ Test: [hadoop.mapred.TestMRCJCFileOutputCommitter] Existing JIRA: https://issues.apache.org/jira/browse/MAPREDUCE-6682?jql=text%20~%20%22TestMRCJCFileOutputCommitter%22 This test passes locally. +*Test Failures:*+ Test: [hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore] There are no existing JIRAs for TestZKRMStateStore. This test passes locally. > Add getNewReservation in ApplicationClientProtocol > -- > > Key: YARN-4957 > URL: https://issues.apache.org/jira/browse/YARN-4957 > Project: Hadoop YARN > Issue Type: Sub-task > Components: applications, client, resourcemanager >Affects Versions: 2.8.0 >Reporter: Subru Krishnan >Assignee: Sean Po > Labels: api-breaking > Attachments: YARN-4957.v0.patch, YARN-4957.v1.patch, > YARN-4957.v2.patch, YARN-4957.v3.patch, YARN-4957.v4.patch > > > Currently submitReservation returns a ReservationId if sucessful. This JIRA > propose adding a getNewReservation in ApplicationClientProtocol for the > following reasons: > * Prevent zombie reservations in the face of client and/or network failures > post submitReservation > * Align reservation submission with application submission -- This message was sent by Atlassian JIRA (v6.3.4#6332)
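To make the motivation concrete, a self-contained toy sketch of the two-step flow this JIRA proposes (all names here are hypothetical stand-ins, not the ApplicationClientProtocol additions themselves): the id is allocated first, so a retried submission after a lost response reuses the same id instead of creating a zombie reservation.
{code}
// Toy model of the proposed flow, analogous to getNewApplication/submitApplication.
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

public class ReservationFlowSketch {
  private static final AtomicLong COUNTER = new AtomicLong();
  private static final Set<Long> SUBMITTED = new HashSet<Long>();

  // Step 1: allocate a reservation id up front.
  static long getNewReservation() {
    return COUNTER.incrementAndGet();
  }

  // Step 2: submission keyed by the pre-allocated id; retrying with the same
  // id after a lost response cannot register a second ("zombie") reservation.
  static void submitReservation(long reservationId, String queue) {
    if (SUBMITTED.add(reservationId)) {
      System.out.println("reservation " + reservationId + " accepted on " + queue);
    } else {
      System.out.println("reservation " + reservationId + " already exists, retry is a no-op");
    }
  }

  public static void main(String[] args) {
    long id = getNewReservation();
    submitReservation(id, "default");
    submitReservation(id, "default"); // safe retry
  }
}
{code}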
[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252263#comment-15252263 ] Li Lu commented on YARN-3816: - Thanks [~sjlee0]! Took a look at it. The test failure happened when we read something out from the entity table. The write related to this failure was not performed through timeline collectors IIUC. I'm kicking another Jenkins run. > [Aggregation] App-level aggregation and accumulation for YARN system metrics > > > Key: YARN-3816 > URL: https://issues.apache.org/jira/browse/YARN-3816 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Li Lu > Labels: yarn-2928-1st-milestone > Attachments: Application Level Aggregation of Timeline Data.pdf, > YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, > YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, > YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, > YARN-3816-YARN-2928-v3.patch, YARN-3816-YARN-2928-v4.patch, > YARN-3816-YARN-2928-v5.patch, YARN-3816-YARN-2928-v6.patch, > YARN-3816-YARN-2928-v7.patch, YARN-3816-YARN-2928-v8.patch, > YARN-3816-YARN-2928-v9.patch, YARN-3816-feature-YARN-2928.v4.1.patch, > YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch > > > We need application level aggregation of Timeline data: > - To present end user aggregated states for each application, include: > resource (CPU, Memory) consumption across all containers, number of > containers launched/completed/failed, etc. We need this for apps while they > are running as well as when they are done. > - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be > aggregated to show details of states in framework level. > - Other level (Flow/User/Queue) aggregation can be more efficient to be based > on Application-level aggregations rather than raw entity-level data as much > less raws need to scan (with filter out non-aggregated entities, like: > events, configurations, etc.). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252202#comment-15252202 ] Sangjin Lee commented on YARN-3816: --- This is somewhat unrelated to the TestHBaseTimelineStorage failure observed above, but did you have a chance to go over our unit tests to see if this patch may change the behavior? I'm thinking about unit tests that write the YARN container entities. Now it may or may not (depending on the timing) write the application row too. I just want to make sure it does not introduce any timing-dependent unit test failures. Have you made a pass on the unit tests to see if we have such a situation? > [Aggregation] App-level aggregation and accumulation for YARN system metrics > > > Key: YARN-3816 > URL: https://issues.apache.org/jira/browse/YARN-3816 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Junping Du >Assignee: Li Lu > Labels: yarn-2928-1st-milestone > Attachments: Application Level Aggregation of Timeline Data.pdf, > YARN-3816-YARN-2928-v1.patch, YARN-3816-YARN-2928-v2.1.patch, > YARN-3816-YARN-2928-v2.2.patch, YARN-3816-YARN-2928-v2.3.patch, > YARN-3816-YARN-2928-v2.patch, YARN-3816-YARN-2928-v3.1.patch, > YARN-3816-YARN-2928-v3.patch, YARN-3816-YARN-2928-v4.patch, > YARN-3816-YARN-2928-v5.patch, YARN-3816-YARN-2928-v6.patch, > YARN-3816-YARN-2928-v7.patch, YARN-3816-YARN-2928-v8.patch, > YARN-3816-YARN-2928-v9.patch, YARN-3816-feature-YARN-2928.v4.1.patch, > YARN-3816-poc-v1.patch, YARN-3816-poc-v2.patch > > > We need application level aggregation of Timeline data: > - To present end user aggregated states for each application, include: > resource (CPU, Memory) consumption across all containers, number of > containers launched/completed/failed, etc. We need this for apps while they > are running as well as when they are done. > - Also, framework specific metrics, e.g. HDFS_BYTES_READ, should be > aggregated to show details of states in framework level. > - Other level (Flow/User/Queue) aggregation can be more efficient to be based > on Application-level aggregations rather than raw entity-level data as much > less raws need to scan (with filter out non-aggregated entities, like: > events, configurations, etc.). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
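As a toy illustration of the rollup being discussed (not the timeline service code, and the metric name is an assumption): per-container metrics are summed into one application-level value, which is the kind of aggregate the patch writes into the application row.
{code}
// Toy aggregation sketch: sum per-container metrics into application-level totals.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AppAggregationSketch {
  static Map<String, Long> aggregate(List<Map<String, Long>> containerMetrics) {
    Map<String, Long> appMetrics = new HashMap<String, Long>();
    for (Map<String, Long> metrics : containerMetrics) {
      for (Map.Entry<String, Long> e : metrics.entrySet()) {
        Long prev = appMetrics.get(e.getKey());
        appMetrics.put(e.getKey(), prev == null ? e.getValue() : prev + e.getValue());
      }
    }
    return appMetrics;
  }

  public static void main(String[] args) {
    List<Map<String, Long>> containers = new ArrayList<Map<String, Long>>();
    Map<String, Long> c1 = new HashMap<String, Long>();
    c1.put("MEMORY_MB", 1024L);
    Map<String, Long> c2 = new HashMap<String, Long>();
    c2.put("MEMORY_MB", 2048L);
    containers.add(c1);
    containers.add(c2);
    System.out.println(aggregate(containers)); // prints {MEMORY_MB=3072}
  }
}
{code}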
[jira] [Commented] (YARN-4697) NM aggregation thread pool is not bound by limits
[ https://issues.apache.org/jira/browse/YARN-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252164#comment-15252164 ] Junping Du commented on YARN-4697: -- bq. My concern is that if don't fix the root-cause, though we've protected ourselves from crashes, we'd just be queueing a lot of aggregation processes and causing long waiting times. Agree. We do see the NM log aggregation service launch many active threads, which keep a large number of TCP connections to DNs and use up the system's file limit. We can cap the shared thread pool here, but the TCP connection problem may not be solved by this patch. bq. Upon NM restart, NM will try to recover all applications and submit a log aggregation task to the thread pool for each application recovered. Therefore, a large number of recovered applications plus concurrent applications can cause the thread pool to increase without a bound. Are all these applications active, or are they already finished? I suspect we are leaking finished applications in the NM state store during the recovery process. I noticed this issue when filing YARN-4325 but lost my progress, as the previous long-running cluster is gone. [~haibochen], could you check if your case is the same here? In general, I think the fix on this JIRA is OK. But I agree with Vinod that we should dig more into the root cause, or there could be other holes (like the TCP connection leak mentioned above). > NM aggregation thread pool is not bound by limits > - > > Key: YARN-4697 > URL: https://issues.apache.org/jira/browse/YARN-4697 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Critical > Fix For: 2.9.0 > > Attachments: yarn4697.001.patch, yarn4697.002.patch, > yarn4697.003.patch, yarn4697.004.patch > > > In the LogAggregationService.java we create a threadpool to upload logs from > the nodemanager to HDFS if log aggregation is turned on. This is a cached > threadpool which based on the javadoc is an unlimited pool of threads. > In the case that we have had a problem with log aggregation this could cause > a problem on restart. The number of threads created at that point could be > huge and will put a large load on the NameNode and in the worst case could even > bring it down due to file descriptor issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
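For illustration, a short sketch of the difference being discussed; the bound of 100 threads is an assumed example value, not the limit chosen by the patch.
{code}
// Illustrative only, not LogAggregationService code.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class AggregationPoolSketch {
  // Before: a cached pool grows without bound, so a large backlog of recovered
  // applications can each hold a thread and an open connection to HDFS.
  static ExecutorService unbounded() {
    return Executors.newCachedThreadPool();
  }

  // After (sketch): a fixed cap on concurrent uploads; additional applications
  // wait in the queue instead of spawning more threads.
  static ExecutorService bounded(int maxThreads) {
    return new ThreadPoolExecutor(maxThreads, maxThreads,
        60L, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
  }

  public static void main(String[] args) {
    ExecutorService pool = bounded(100); // 100 is an assumed example value
    pool.shutdown();
  }
}
{code}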
[jira] [Commented] (YARN-4976) Missing NullPointer check in ContainerLaunchContextPBImpl causes RM to die
[ https://issues.apache.org/jira/browse/YARN-4976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252142#comment-15252142 ] Hadoop QA commented on YARN-4976: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 44s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 22s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 8s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 55s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_77. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 11s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 20m 36s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:fbe3e86 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/1270/YARN-4976.v3.patch | | JIRA Issue | YARN-4976 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux ad198258ccb0 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 7da5847 | | Default Java | 1.7.0_95 | | Multi-JDK versions | /usr/lib/jvm/java-
[jira] [Commented] (YARN-4935) TestYarnClient#testSubmitIncorrectQueue fails with FairScheduler
[ https://issues.apache.org/jira/browse/YARN-4935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252141#comment-15252141 ] Yufei Gu commented on YARN-4935: Thanks for the review and commit, [~kasha]! > TestYarnClient#testSubmitIncorrectQueue fails with FairScheduler > > > Key: YARN-4935 > URL: https://issues.apache.org/jira/browse/YARN-4935 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0 >Reporter: Yufei Gu >Assignee: Yufei Gu > Fix For: 2.9.0 > > Attachments: YARN-4935.001.patch, YARN-4935.002.patch > > > This test case introduced by YARN-3131 works well on CapacityScheduler but > not on FairScheduler, since CS doesn't allow dynamically create a queue, > while FS supports it. So if you give a random queue name, CS will reject it, > but FS will create a new queue for it by default. > One simple solution is to specific CS in this test case. /cc [~lichangleo]. I > was thinking about creating another test case for FS. But for the code > introduced by YARN-3131, it may be not necessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4784) Fairscheduler: defaultQueueSchedulingPolicy should not accept FIFO
[ https://issues.apache.org/jira/browse/YARN-4784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252136#comment-15252136 ] Yufei Gu commented on YARN-4784: [~kasha], Thanks a lot for review and commit! > Fairscheduler: defaultQueueSchedulingPolicy should not accept FIFO > -- > > Key: YARN-4784 > URL: https://issues.apache.org/jira/browse/YARN-4784 > Project: Hadoop YARN > Issue Type: Improvement > Components: fairscheduler >Affects Versions: 2.7.0 >Reporter: Yufei Gu >Assignee: Yufei Gu > Fix For: 2.9.0 > > Attachments: YARN-4784.001.patch, YARN-4784.002.patch > > > The configure item defaultQueueSchedulingPolicy should not accept fifo as a > value since it is an invalid value for non-leaf queues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4954) TestYarnClient.testReservationAPIs fails on machines with less than 4 GB available memory
[ https://issues.apache.org/jira/browse/YARN-4954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252115#comment-15252115 ] Hadoop QA commented on YARN-4954: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 25s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 24s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 35s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s {color} | {color:green} trunk passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 14s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 22s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 45s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 13s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 14s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 11s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 26s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 147m 7s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_77 Failed junit tests | hadoop.yarn.client.api.impl.TestAMRMProxy | | | hadoop.yarn.client.TestGetGroups | | JDK v1.8.0_77 Timed out junit tests | org.apache.hadoop.yarn.client.cli.TestYarnCLI | | | org.apache.hadoop.yarn.client.api.impl.TestYarnClient | | | org.apache.hadoop.yarn.client.api.impl.TestAMRMClient | | | org.apache.hadoop.yarn.client.api.impl.TestNMClient | | JDK v1.7.0_95 Failed junit tests | hadoop.yarn.client.api.impl.TestAMRMProxy | | | hadoop.yarn.client.TestGetGroups | | JDK v1.7.0_95 Timed out junit tests | org.apache.hadoop.yarn.client.cli.TestYarnCLI | | | org.apache.hadoop.yarn.client.api.i
[jira] [Updated] (YARN-4976) Missing NullPointer check in ContainerLaunchContextPBImpl causes RM to die
[ https://issues.apache.org/jira/browse/YARN-4976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giovanni Matteo Fumarola updated YARN-4976: --- Attachment: YARN-4976.v3.patch > Missing NullPointer check in ContainerLaunchContextPBImpl causes RM to die > -- > > Key: YARN-4976 > URL: https://issues.apache.org/jira/browse/YARN-4976 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola > Attachments: YARN-4976.v0.patch, YARN-4976.v1.patch, > YARN-4976.v2.patch, YARN-4976.v3.patch > > > The client can set a null value for any env variable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4976) Missing NullPointer check in ContainerLaunchContextPBImpl causes RM to die
[ https://issues.apache.org/jira/browse/YARN-4976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252091#comment-15252091 ] Giovanni Matteo Fumarola commented on YARN-4976: Uploading a new patch without a whitespace in the comment. > Missing NullPointer check in ContainerLaunchContextPBImpl causes RM to die > -- > > Key: YARN-4976 > URL: https://issues.apache.org/jira/browse/YARN-4976 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Reporter: Giovanni Matteo Fumarola >Assignee: Giovanni Matteo Fumarola > Attachments: YARN-4976.v0.patch, YARN-4976.v1.patch, > YARN-4976.v2.patch > > > The client can set a null value for any env variable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1458) FairScheduler: Zero weight can lead to livelock
[ https://issues.apache.org/jira/browse/YARN-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252087#comment-15252087 ] zhihai xu commented on YARN-1458: - Ok, no problem, you can try it at your convenience. thanks for finding this issue! > FairScheduler: Zero weight can lead to livelock > --- > > Key: YARN-1458 > URL: https://issues.apache.org/jira/browse/YARN-1458 > Project: Hadoop YARN > Issue Type: Bug > Components: scheduler >Affects Versions: 2.2.0 > Environment: Centos 2.6.18-238.19.1.el5 X86_64 > hadoop2.2.0 >Reporter: qingwu.fu >Assignee: zhihai xu > Fix For: 2.6.0 > > Attachments: YARN-1458.001.patch, YARN-1458.002.patch, > YARN-1458.003.patch, YARN-1458.004.patch, YARN-1458.006.patch, > YARN-1458.addendum.patch, YARN-1458.alternative0.patch, > YARN-1458.alternative1.patch, YARN-1458.alternative2.patch, YARN-1458.patch, > yarn-1458-5.patch, yarn-1458-7.patch, yarn-1458-8.patch > > Original Estimate: 408h > Remaining Estimate: 408h > > The ResourceManager$SchedulerEventDispatcher$EventProcessor blocked when > clients submit lots jobs, it is not easy to reapear. We run the test cluster > for days to reapear it. The output of jstack command on resourcemanager pid: > {code} > "ResourceManager Event Processor" prio=10 tid=0x2aaab0c5f000 nid=0x5dd3 > waiting for monitor entry [0x43aa9000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.removeApplication(FairScheduler.java:671) > - waiting to lock <0x00070026b6e0> (a > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:1023) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:112) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440) > at java.lang.Thread.run(Thread.java:744) > …… > "FairSchedulerUpdateThread" daemon prio=10 tid=0x2aaab0a2c800 nid=0x5dc8 > runnable [0x433a2000] >java.lang.Thread.State: RUNNABLE > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.getAppWeight(FairScheduler.java:545) > - locked <0x00070026b6e0> (a > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.getWeights(AppSchedulable.java:129) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShare(ComputeFairShares.java:143) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.resourceUsedWithWeightToResourceRatio(ComputeFairShares.java:131) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.ComputeFairShares.computeShares(ComputeFairShares.java:102) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.policies.FairSharePolicy.computeShares(FairSharePolicy.java:119) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.recomputeShares(FSLeafQueue.java:100) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.recomputeShares(FSParentQueue.java:62) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.update(FairScheduler.java:282) > - locked <0x00070026b6e0> (a > 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$UpdateThread.run(FairScheduler.java:255) > at java.lang.Thread.run(Thread.java:744) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
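To make the livelock concrete, a simplified, self-contained sketch (the names and loop shape are approximations of ComputeFairShares, not the actual code): with every weight at zero the computed usage never grows, so the search for a large enough weight-to-resource ratio cannot terminate while the scheduler lock is held.
{code}
// Simplified sketch of why zero weights can livelock the fair share computation.
public class ZeroWeightLivelockSketch {
  static int computeShare(double weight, double ratio) {
    return (int) (weight * ratio); // zero weight => zero share for any ratio
  }

  static int usedWithRatio(double[] weights, double ratio) {
    int total = 0;
    for (double w : weights) {
      total += computeShare(w, ratio);
    }
    return total;
  }

  public static void main(String[] args) {
    double[] weights = {0.0, 0.0}; // schedulables with zero weight
    int totalResource = 1024;
    double ratio = 1.0;
    int iterations = 0;
    // The real loop has no iteration cap; it is capped here only so the
    // example terminates instead of spinning forever.
    while (usedWithRatio(weights, ratio) < totalResource && iterations < 64) {
      ratio *= 2;
      iterations++;
    }
    System.out.println(iterations == 64
        ? "no ratio can satisfy the demand: with zero weights this spins forever"
        : "converged at ratio " + ratio);
  }
}
{code}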
[jira] [Updated] (YARN-4311) Removing nodes from include and exclude lists will not remove them from decommissioned nodes list
[ https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuhu Shukla updated YARN-4311: -- Attachment: YARN-4311-v16.patch > Removing nodes from include and exclude lists will not remove them from > decommissioned nodes list > - > > Key: YARN-4311 > URL: https://issues.apache.org/jira/browse/YARN-4311 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.1 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: YARN-4311-branch-2.7.001.patch, > YARN-4311-branch-2.7.002.patch, YARN-4311-branch-2.7.003.patch, > YARN-4311-branch-2.7.004.patch, YARN-4311-v1.patch, YARN-4311-v10.patch, > YARN-4311-v11.patch, YARN-4311-v11.patch, YARN-4311-v12.patch, > YARN-4311-v13.patch, YARN-4311-v13.patch, YARN-4311-v14.patch, > YARN-4311-v15.patch, YARN-4311-v16.patch, YARN-4311-v2.patch, > YARN-4311-v3.patch, YARN-4311-v4.patch, YARN-4311-v5.patch, > YARN-4311-v6.patch, YARN-4311-v7.patch, YARN-4311-v8.patch, YARN-4311-v9.patch > > > In order to fully forget about a node, removing the node from include and > exclude list is not sufficient. The RM lists it under Decomm-ed nodes. The > tricky part that [~jlowe] pointed out was the case when include lists are not > used, in that case we don't want the nodes to fall off if they are not active. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4311) Removing nodes from include and exclude lists will not remove them from decommissioned nodes list
[ https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuhu Shukla updated YARN-4311: -- Attachment: (was: YARN-4311-v16.patch) > Removing nodes from include and exclude lists will not remove them from > decommissioned nodes list > - > > Key: YARN-4311 > URL: https://issues.apache.org/jira/browse/YARN-4311 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.1 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: YARN-4311-branch-2.7.001.patch, > YARN-4311-branch-2.7.002.patch, YARN-4311-branch-2.7.003.patch, > YARN-4311-branch-2.7.004.patch, YARN-4311-v1.patch, YARN-4311-v10.patch, > YARN-4311-v11.patch, YARN-4311-v11.patch, YARN-4311-v12.patch, > YARN-4311-v13.patch, YARN-4311-v13.patch, YARN-4311-v14.patch, > YARN-4311-v15.patch, YARN-4311-v16.patch, YARN-4311-v2.patch, > YARN-4311-v3.patch, YARN-4311-v4.patch, YARN-4311-v5.patch, > YARN-4311-v6.patch, YARN-4311-v7.patch, YARN-4311-v8.patch, YARN-4311-v9.patch > > > In order to fully forget about a node, removing the node from include and > exclude list is not sufficient. The RM lists it under Decomm-ed nodes. The > tricky part that [~jlowe] pointed out was the case when include lists are not > used, in that case we don't want the nodes to fall off if they are not active. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4311) Removing nodes from include and exclude lists will not remove them from decommissioned nodes list
[ https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuhu Shukla updated YARN-4311: -- Attachment: YARN-4311-v16.patch Fixing checkstyle and findbugs issues. Test failures are locally passing and unrelated. > Removing nodes from include and exclude lists will not remove them from > decommissioned nodes list > - > > Key: YARN-4311 > URL: https://issues.apache.org/jira/browse/YARN-4311 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.1 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: YARN-4311-branch-2.7.001.patch, > YARN-4311-branch-2.7.002.patch, YARN-4311-branch-2.7.003.patch, > YARN-4311-branch-2.7.004.patch, YARN-4311-v1.patch, YARN-4311-v10.patch, > YARN-4311-v11.patch, YARN-4311-v11.patch, YARN-4311-v12.patch, > YARN-4311-v13.patch, YARN-4311-v13.patch, YARN-4311-v14.patch, > YARN-4311-v15.patch, YARN-4311-v16.patch, YARN-4311-v2.patch, > YARN-4311-v3.patch, YARN-4311-v4.patch, YARN-4311-v5.patch, > YARN-4311-v6.patch, YARN-4311-v7.patch, YARN-4311-v8.patch, YARN-4311-v9.patch > > > In order to fully forget about a node, removing the node from include and > exclude list is not sufficient. The RM lists it under Decomm-ed nodes. The > tricky part that [~jlowe] pointed out was the case when include lists are not > used, in that case we don't want the nodes to fall off if they are not active. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4890) Unit test intermittent failure: TestNodeLabelContainerAllocation#testQueueUsedCapacitiesUpdate
[ https://issues.apache.org/jira/browse/YARN-4890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15251951#comment-15251951 ] Sunil G commented on YARN-4890: --- Thanks [~leftnoteasy] for the review and commit, and thanks [~bibinchundatt] for the review. > Unit test intermittent failure: > TestNodeLabelContainerAllocation#testQueueUsedCapacitiesUpdate > -- > > Key: YARN-4890 > URL: https://issues.apache.org/jira/browse/YARN-4890 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Sunil G > Fix For: 2.9.0 > > Attachments: 0001-YARN-4890.patch > > > Message: > {code} > Tests run: 16, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 314.062 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation > testQueueUsedCapacitiesUpdate(org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation) > Time elapsed: 12.426 sec <<< FAILURE! > java.lang.AssertionError: expected:<0.3> but was:<0.6> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:519) > at org.junit.Assert.assertEquals(Assert.java:609) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation.checkQueueUsedCapacity(TestNodeLabelContainerAllocation.java:1163) > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation.testQueueUsedCapacitiesUpdate(TestNodeLabelContainerAllocation.java:1382) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4958) The file localization process should allow for wildcards to reduce the application footprint in the state store
[ https://issues.apache.org/jira/browse/YARN-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15251934#comment-15251934 ] Daniel Templeton commented on YARN-4958: On closer examination, HADOOP-12747 may be solving a slightly different problem. I believe it's solving the problem of reducing the end user pain in specifying a large number of JARs in -libjar. This JIRA is solving the problem of reducing the state store impact of specifying a large number of JARs in -libjar. Based on looking at the implementations, the two JIRAs are related but orthogonal. > The file localization process should allow for wildcards to reduce the > application footprint in the state store > --- > > Key: YARN-4958 > URL: https://issues.apache.org/jira/browse/YARN-4958 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.8.0 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Critical > Attachments: YARN-4958.001.patch > > > When using the -libjars option to add classes to the classpath, every library > so added is explicitly listed in the {{ContainerLaunchContext}}'s local > resources even though they're all uploaded to the same directory in HDFS. > When using tools like Crunch without an uber JAR or when trying to take > advantage of the shared cache, the number of libraries can be quite large. > We've seen many cases where we had to turn down the max number of > applications to prevent ZK from running out of heap because of the size of > the state store entries. > Rather than listing all files independently, this JIRA proposes to have the > NM allow wildcards in the resource localization paths. Specifically, we > propose to allow a path to have a final component (name) set to "*", which is > interpreted by the NM as "download the full directory and link to every file > in it from the job's working directory." This behavior is the same as the > current behavior when using -libjars, but avoids explicitly listing every > file. > This JIRA does not attempt to provide more general purpose wildcards, such as > "\*.jar" or "file\*", as having multiple entries for a single directory > presents numerous logistical issues. > This JIRA also does not attempt to integrate with the shared cache. That > work will be left to a future JIRA. Specifically, this JIRA only applies > when a full directory is uploaded. Currently the shared cache does not > handle directory uploads. > This JIRA proposes to allow for wildcards both in the internal processing of > the -libjars switch and in paths added through the {{Job}} and > {{DistributedCache}} classes. > The proposed approach is to treat a path, "dir/\*", as "dir" for purposes of > all file verification and localization. In the final step, the NM will query > the localized directory to get a list of the files in "dir" such that each > can be linked from the job's working directory. Since $PWD/\* is always > included on the classpath, all JAR files in "dir" will be in the classpath. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
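As a rough sketch of the proposed semantics (plain java.nio rather than the NM localizer, and the method name is made up): "dir/*" is treated as "dir" for verification and download, and the wildcard only changes the final linking step, where every file in the localized directory gets its own symlink in the container working directory.
{code}
// Rough sketch of the proposed wildcard handling; not NM code.
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class WildcardLocalizationSketch {
  static void linkIntoWorkDir(String resourcePath, Path workDir) throws IOException {
    boolean wildcard = resourcePath.endsWith("/*");
    // For verification and download, "dir/*" is handled exactly like "dir".
    Path localizedDir = Paths.get(wildcard
        ? resourcePath.substring(0, resourcePath.length() - 2)
        : resourcePath);
    if (!wildcard) {
      // Non-wildcard resources keep the existing behaviour: one link per entry.
      Files.createSymbolicLink(workDir.resolve(localizedDir.getFileName()), localizedDir);
      return;
    }
    // Wildcard: link every file of the localized directory, so each JAR lands
    // on the $PWD/* classpath without being listed in the launch context.
    try (DirectoryStream<Path> files = Files.newDirectoryStream(localizedDir)) {
      for (Path f : files) {
        Files.createSymbolicLink(workDir.resolve(f.getFileName()), f);
      }
    }
  }
}
{code}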
[jira] [Updated] (YARN-4958) The file localization process should allow for wildcards to reduce the application footprint in the state store
[ https://issues.apache.org/jira/browse/YARN-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated YARN-4958: --- Description: When using the -libjars option to add classes to the classpath, every library so added is explicitly listed in the {{ContainerLaunchContext}}'s local resources even though they're all uploaded to the same directory in HDFS. When using tools like Crunch without an uber JAR or when trying to take advantage of the shared cache, the number of libraries can be quite large. We've seen many cases where we had to turn down the max number of applications to prevent ZK from running out of heap because of the size of the state store entries. Rather than listing all files independently, this JIRA proposes to have the NM allow wildcards in the resource localization paths. Specifically, we propose to allow a path to have a final component (name) set to "*", which is interpreted by the NM as "download the full directory and link to every file in it from the job's working directory." This behavior is the same as the current behavior when using -libjars, but avoids explicitly listing every file. This JIRA does not attempt to provide more general purpose wildcards, such as "\*.jar" or "file\*", as having multiple entries for a single directory presents numerous logistical issues. This JIRA also does not attempt to integrate with the shared cache. That work will be left to a future JIRA. Specifically, this JIRA only applies when a full directory is uploaded. Currently the shared cache does not handle directory uploads. This JIRA proposes to allow for wildcards both in the internal processing of the -libjars switch and in paths added through the {{Job}} and {{DistributedCache}} classes. The proposed approach is to treat a path, "dir/\*", as "dir" for purposes of all file verification and localization. In the final step, the NM will query the localized directory to get a list of the files in "dir" such that each can be linked from the job's working directory. Since $PWD/\* is always included on the classpath, all JAR files in "dir" will be in the classpath. was: When using the -libjars option to add classes to the classpath, every library so added is explicitly listed in the {{ContainerLaunchContext}}'s local resources even though they're all uploaded to the same directory in HDFS. When using tools like Crunch without an uber JAR or when trying to take advantage of the shared cache, the number of libraries can be quite large. We've seen many cases where we had to turn down the max number of applications to prevent ZK from running out of heap because of the size of the state store entries. Rather than listing all files independently, this JIRA proposes to have the NM allow wildcards in the resource localization paths. Specifically, we propose to allow a path to have a final component (name) set to "*", which is interpreted by the NM as "download the full directory and link to every file in it from the job's working directory." This behavior is the same as the current behavior when using -libjars, but avoids explicitly listing every file. This JIRA does not attempt to provide more general purpose wildcards, such as "*.jar" or "file*", as having multiple entries for a single directory presents numerous logistical issues. This JIRA also does not attempt to integrate with the shared cache. That work will be left to a future JIRA. Specifically, this JIRA only applies when a full directory is uploaded. 
Currently the shared cache does not handle directory uploads. This JIRA proposes to allow for wildcards both in the internal processing of the -libjars switch and in paths added through the {{Job}} and {{DistributedCache}} classes. The proposed approach is to treat a path, "dir/*", as "dir" for purposes of all file verification and localization. In the final step, the NM will query the localized directory to get a list of the files in "dir" such that each can be linked from the job's working directory. Since $PWD/* is always included on the classpath, all JAR files in "dir" will be in the classpath. > The file localization process should allow for wildcards to reduce the > application footprint in the state store > --- > > Key: YARN-4958 > URL: https://issues.apache.org/jira/browse/YARN-4958 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.8.0 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Critical > Attachments: YARN-4958.001.patch > > > When using the -libjars option to add classes to the classpath, every library > so added is explicitly listed in the {{ContainerL
[jira] [Commented] (YARN-4958) The file localization process should allow for wildcards to reduce the application footprint in the state store
[ https://issues.apache.org/jira/browse/YARN-4958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15251925#comment-15251925 ] Daniel Templeton commented on YARN-4958: [~jira.shegalov], I just noticed your comments on HADOOP-12747. I'd be curious to have your opinion on this one. > The file localization process should allow for wildcards to reduce the > application footprint in the state store > --- > > Key: YARN-4958 > URL: https://issues.apache.org/jira/browse/YARN-4958 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.8.0 >Reporter: Daniel Templeton >Assignee: Daniel Templeton >Priority: Critical > Attachments: YARN-4958.001.patch > > > When using the -libjars option to add classes to the classpath, every library > so added is explicitly listed in the {{ContainerLaunchContext}}'s local > resources even though they're all uploaded to the same directory in HDFS. > When using tools like Crunch without an uber JAR or when trying to take > advantage of the shared cache, the number of libraries can be quite large. > We've seen many cases where we had to turn down the max number of > applications to prevent ZK from running out of heap because of the size of > the state store entries. > Rather than listing all files independently, this JIRA proposes to have the > NM allow wildcards in the resource localization paths. Specifically, we > propose to allow a path to have a final component (name) set to "*", which is > interpreted by the NM as "download the full directory and link to every file > in it from the job's working directory." This behavior is the same as the > current behavior when using -libjars, but avoids explicitly listing every > file. > This JIRA does not attempt to provide more general purpose wildcards, such as > "\*.jar" or "file\*", as having multiple entries for a single directory > presents numerous logistical issues. > This JIRA also does not attempt to integrate with the shared cache. That > work will be left to a future JIRA. Specifically, this JIRA only applies > when a full directory is uploaded. Currently the shared cache does not > handle directory uploads. > This JIRA proposes to allow for wildcards both in the internal processing of > the -libjars switch and in paths added through the {{Job}} and > {{DistributedCache}} classes. > The proposed approach is to treat a path, "dir/\*", as "dir" for purposes of > all file verification and localization. In the final step, the NM will query > the localized directory to get a list of the files in "dir" such that each > can be linked from the job's working directory. Since $PWD/\* is always > included on the classpath, all JAR files in "dir" will be in the classpath. -- This message was sent by Atlassian JIRA (v6.3.4#6332)