[jira] [Commented] (YARN-4389) "yarn.am.blacklisting.enabled" and "yarn.am.blacklisting.disable-failure-threshold" should be app specific rather than a setting for whole YARN cluster
[ https://issues.apache.org/jira/browse/YARN-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026917#comment-15026917 ] Hadoop QA commented on YARN-4389: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 44s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 18s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 27s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 48s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 43s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 21s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 40s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 5s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 41s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 18s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 18s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 18s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 28s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 28s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 28s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 31s {color} | {color:red} Patch generated 7 new checkstyle issues in hadoop-yarn-project/hadoop-yarn (total was 160, now 167). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 52s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 48s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 0s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 48s {color} | {color:red} hadoop-yarn-api in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 8m 47s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api-jdk1.7.0_85 with JDK v1.7.0_85 generated 1 new issues (was 0, now 1). {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 25s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 37s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 2m 29s {color} | {color:red} hadoop-yarn-common in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 50s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 28s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_85. {color} | | {color:red}-1{color} | {color:red} unit
[jira] [Commented] (YARN-4393) TestResourceLocalizationService#testFailedDirsResourceRelease fails intermittently
[ https://issues.apache.org/jira/browse/YARN-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027096#comment-15027096 ] Tsuyoshi Ozawa commented on YARN-4393: -- +1, checking this in. > TestResourceLocalizationService#testFailedDirsResourceRelease fails > intermittently > -- > > Key: YARN-4393 > URL: https://issues.apache.org/jira/browse/YARN-4393 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Varun Saxena >Assignee: Varun Saxena > Fix For: 2.7.3 > > Attachments: YARN-4393.01.patch > > > [~ozawa] pointed out this failure on YARN-4380. > Check > https://issues.apache.org/jira/browse/YARN-4380?focusedCommentId=15023773=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15023773 > {noformat} > sts run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.518 sec <<< > FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService > testFailedDirsResourceRelease(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) > Time elapsed: 0.093 sec <<< FAILURE! > org.mockito.exceptions.verification.junit.ArgumentsAreDifferent: > Argument(s) are different! Wanted: > eventHandler.handle( > > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632) > Actual invocation has different arguments: > eventHandler.handle( > EventType: APPLICATION_INITED > ); > -> at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4386) refreshNodesGracefully() looks at active RMNode list for recommissioning decommissioned nodes
[ https://issues.apache.org/jira/browse/YARN-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026916#comment-15026916 ] Kuhu Shukla commented on YARN-4386: --- Sure, that would help. I will update with a revised patch soon. Thank you! > refreshNodesGracefully() looks at active RMNode list for recommissioning > decommissioned nodes > - > > Key: YARN-4386 > URL: https://issues.apache.org/jira/browse/YARN-4386 > Project: Hadoop YARN > Issue Type: Bug > Components: graceful >Affects Versions: 3.0.0 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla >Priority: Minor > Attachments: YARN-4386-v1.patch > > > In refreshNodesGracefully(), during recommissioning, the entryset from > getRMNodes() which has only active nodes (RUNNING, DECOMMISSIONING etc.) is > used for checking 'decommissioned' nodes which are present in > getInactiveRMNodes() map alone. > {code} > for (Entry<NodeId, RMNode> entry : rmContext.getRMNodes().entrySet()) { > . > // Recommissioning the nodes > if (entry.getValue().getState() == NodeState.DECOMMISSIONING > || entry.getValue().getState() == NodeState.DECOMMISSIONED) { > this.rmContext.getDispatcher().getEventHandler() > .handle(new RMNodeEvent(nodeId, RMNodeEventType.RECOMMISSION)); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4386) refreshNodesGracefully() looks at active RMNode list for recommissioning decommissioned nodes
[ https://issues.apache.org/jira/browse/YARN-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026926#comment-15026926 ] Kuhu Shukla commented on YARN-4386: --- [~djp], without the patch, the decommissioned node is looked up in the list returned by getRMNodes(), which will never contain a node with state DECOMMISSIONED; this means that currently a decommissioned node is not even considered for recommissioning, since it is part of the inactiveNodes list and not the getRMNodes() list. I will continue to think of a test case for this. I appreciate your comments and inputs. > refreshNodesGracefully() looks at active RMNode list for recommissioning > decommissioned nodes > - > > Key: YARN-4386 > URL: https://issues.apache.org/jira/browse/YARN-4386 > Project: Hadoop YARN > Issue Type: Bug > Components: graceful >Affects Versions: 3.0.0 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla >Priority: Minor > Attachments: YARN-4386-v1.patch > > > In refreshNodesGracefully(), during recommissioning, the entryset from > getRMNodes() which has only active nodes (RUNNING, DECOMMISSIONING etc.) is > used for checking 'decommissioned' nodes which are present in > getInactiveRMNodes() map alone. > {code} > for (Entry<NodeId, RMNode> entry : rmContext.getRMNodes().entrySet()) { > . > // Recommissioning the nodes > if (entry.getValue().getState() == NodeState.DECOMMISSIONING > || entry.getValue().getState() == NodeState.DECOMMISSIONED) { > this.rmContext.getDispatcher().getEventHandler() > .handle(new RMNodeEvent(nodeId, RMNodeEventType.RECOMMISSION)); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4318) Test failure: TestAMAuthorization
[ https://issues.apache.org/jira/browse/YARN-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026936#comment-15026936 ] Tsuyoshi Ozawa commented on YARN-4318: -- [~kshukla] please go ahead :-) > Test failure: TestAMAuthorization > - > > Key: YARN-4318 > URL: https://issues.apache.org/jira/browse/YARN-4318 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Tsuyoshi Ozawa >Assignee: Kuhu Shukla > > {quote} > Tests run: 4, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 14.891 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization > testUnauthorizedAccess[0](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization) > Time elapsed: 3.208 sec <<< ERROR! > java.net.UnknownHostException: Invalid host name: local host is: (unknown); > destination host is: "b5a5dd9ec835":8030; java.net.UnknownHostException; For > more details see: http://wiki.apache.org/hadoop/UnknownHost > at org.apache.hadoop.ipc.Client$Connection.(Client.java:403) > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1512) > at org.apache.hadoop.ipc.Client.call(Client.java:1439) > at org.apache.hadoop.ipc.Client.call(Client.java:1400) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy15.registerApplicationMaster(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106) > at > org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:273) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (YARN-4393) TestResourceLocalizationService#testFailedDirsResourceRelease fails intermittently
[ https://issues.apache.org/jira/browse/YARN-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa reopened YARN-4393: -- Oops, commented on the wrong JIRA. Reopening. > TestResourceLocalizationService#testFailedDirsResourceRelease fails > intermittently > -- > > Key: YARN-4393 > URL: https://issues.apache.org/jira/browse/YARN-4393 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Varun Saxena >Assignee: Varun Saxena > Fix For: 2.7.3 > > Attachments: YARN-4393.01.patch > > > [~ozawa] pointed out this failure on YARN-4380. > Check > https://issues.apache.org/jira/browse/YARN-4380?focusedCommentId=15023773=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15023773 > {noformat} > Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.518 sec <<< > FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService > testFailedDirsResourceRelease(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) > Time elapsed: 0.093 sec <<< FAILURE! > org.mockito.exceptions.verification.junit.ArgumentsAreDifferent: > Argument(s) are different! Wanted: > eventHandler.handle( > > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632) > Actual invocation has different arguments: > eventHandler.handle( > EventType: APPLICATION_INITED > ); > -> at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently
[ https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027045#comment-15027045 ] Hudson commented on YARN-4380: -- FAILURE: Integrated in Hadoop-trunk-Commit #8887 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8887/]) YARN-4380. (ozawa: rev 0656d2dc83af6a48a8d8d0e37cdf1f813124f366) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java > TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails > intermittently > > > Key: YARN-4380 > URL: https://issues.apache.org/jira/browse/YARN-4380 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0, 2.7.1 >Reporter: Tsuyoshi Ozawa >Assignee: Varun Saxena > Attachments: YARN-4380.01.patch, > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.2.txt, > > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService-output.txt > > > {quote} > Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService > testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) > Time elapsed: 0.109 sec <<< FAILURE! > org.mockito.exceptions.verification.junit.ArgumentsAreDifferent: > Argument(s) are different! Wanted: > deletionService.delete( > "user0", > null, > > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > Actual invocation has different arguments: > deletionService.delete( > "user0", > > /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42 > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3226) UI changes for decommissioning node
[ https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027088#comment-15027088 ] Hadoop QA commented on YARN-3226: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 42s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 22s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 38s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 33s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 14s {color} | {color:red} Patch generated 5 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 104, now 107). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 40s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 30s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 65m 48s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 55s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_85. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 154m 1s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | JDK v1.7.0_85 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12774347/0003-YARN-3226.patch | | JIRA
[jira] [Updated] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently
[ https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4380: - Hadoop Flags: Reviewed > TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails > intermittently > > > Key: YARN-4380 > URL: https://issues.apache.org/jira/browse/YARN-4380 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0, 2.7.1 >Reporter: Tsuyoshi Ozawa >Assignee: Varun Saxena > Fix For: 2.7.3 > > Attachments: YARN-4380.01.patch, > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.2.txt, > > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService-output.txt > > > {quote} > Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService > testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) > Time elapsed: 0.109 sec <<< FAILURE! > org.mockito.exceptions.verification.junit.ArgumentsAreDifferent: > Argument(s) are different! Wanted: > deletionService.delete( > "user0", > null, > > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > Actual invocation has different arguments: > deletionService.delete( > "user0", > > /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42 > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4393) TestResourceLocalizationService#testFailedDirsResourceRelease fails intermittently
[ https://issues.apache.org/jira/browse/YARN-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027113#comment-15027113 ] Tsuyoshi Ozawa commented on YARN-4393: -- [~varun_saxena], before committing, I found that there are some missing dispatcher.await(): testResourceRelease: {code} //Send Cleanup Event spyService.handle(new ContainerLocalizationCleanupEvent(c, req)); // <-- here! verify(mockLocallilzerTracker) .cleanupPrivLocalizers("container_314159265358979_0003_01_42"); req2.remove(LocalResourceVisibility.PRIVATE); spyService.handle(new ContainerLocalizationCleanupEvent(c, req2)); dispatcher.await(); {code} testFailedDirsResourceRelease: {code} // Send Cleanup Event spyService.handle(new ContainerLocalizationCleanupEvent(c, req)); // <- here! verify(mockLocallilzerTracker).cleanupPrivLocalizers( "container_314159265358979_0003_01_42"); {code} testRecovery: {code} assertNotNull("Localization not started", privLr1.getLocalPath()); privTracker1.handle(new ResourceLocalizedEvent(privReq1, privLr1.getLocalPath(), privLr1.getSize() + 5)); assertNotNull("Localization not started", privLr2.getLocalPath()); privTracker1.handle(new ResourceLocalizedEvent(privReq2, privLr2.getLocalPath(), privLr2.getSize() + 10)); assertNotNull("Localization not started", appLr1.getLocalPath()); appTracker1.handle(new ResourceLocalizedEvent(appReq1, appLr1.getLocalPath(), appLr1.getSize())); assertNotNull("Localization not started", appLr3.getLocalPath()); appTracker2.handle(new ResourceLocalizedEvent(appReq3, appLr3.getLocalPath(), appLr3.getSize() + 7)); assertNotNull("Localization not started", pubLr1.getLocalPath()); pubTracker.handle(new ResourceLocalizedEvent(pubReq1, pubLr1.getLocalPath(), pubLr1.getSize() + 1000)); assertNotNull("Localization not started", pubLr2.getLocalPath()); pubTracker.handle(new ResourceLocalizedEvent(pubReq2, pubLr2.getLocalPath(), pubLr2.getSize() + 9)); {code} Could you update them? > TestResourceLocalizationService#testFailedDirsResourceRelease fails > intermittently > -- > > Key: YARN-4393 > URL: https://issues.apache.org/jira/browse/YARN-4393 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.1 >Reporter: Varun Saxena >Assignee: Varun Saxena > Fix For: 2.7.3 > > Attachments: YARN-4393.01.patch > > > [~ozawa] pointed out this failure on YARN-4380. > Check > https://issues.apache.org/jira/browse/YARN-4380?focusedCommentId=15023773=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15023773 > {noformat} > sts run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.518 sec <<< > FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService > testFailedDirsResourceRelease(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) > Time elapsed: 0.093 sec <<< FAILURE! > org.mockito.exceptions.verification.junit.ArgumentsAreDifferent: > Argument(s) are different! 
Wanted: > eventHandler.handle( > > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632) > Actual invocation has different arguments: > eventHandler.handle( > EventType: APPLICATION_INITED > ); > -> at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:183) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testFailedDirsResourceRelease(TestResourceLocalizationService.java:2632) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
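The missing-await pattern [~ozawa] points out above is simply a matter of draining the test's async dispatcher before Mockito verifies anything; a minimal sketch of the first spot, using the names from the snippets above (the surrounding test code is assumed):
{code}
// Send Cleanup Event
spyService.handle(new ContainerLocalizationCleanupEvent(c, req));
// Drain the async dispatcher so the cleanup event is actually processed
// before Mockito inspects the recorded invocations.
dispatcher.await();
verify(mockLocallilzerTracker)
    .cleanupPrivLocalizers("container_314159265358979_0003_01_42");
{code}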
[jira] [Commented] (YARN-4348) ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of zkSessionTimeout
[ https://issues.apache.org/jira/browse/YARN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027145#comment-15027145 ] Tsuyoshi Ozawa commented on YARN-4348: -- Kicking Jenkins again. > ZKRMStateStore.syncInternal should wait for zkResyncWaitTime instead of > zkSessionTimeout > > > Key: YARN-4348 > URL: https://issues.apache.org/jira/browse/YARN-4348 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.2, 2.6.2 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa >Priority: Blocker > Attachments: YARN-4348-branch-2.7.002.patch, > YARN-4348-branch-2.7.003.patch, YARN-4348.001.patch, YARN-4348.001.patch, > log.txt > > > Jian mentioned that the current internal ZK configuration of ZKRMStateStore > can cause a following situation: > 1. syncInternal timeouts, > 2. but sync succeeded later on. > We should use zkResyncWaitTime as the timeout value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
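The fix described in this issue amounts to waiting on the ZooKeeper resync with the (shorter) resync wait time rather than the session timeout; a rough sketch only, since the field names and latch handling here are assumptions rather than the actual ZKRMStateStore code:
{code}
// Hypothetical sketch: block on resync for zkResyncWaitTime, not zkSessionTimeout.
CountDownLatch syncLatch = new CountDownLatch(1);
// ... a watcher/callback registered elsewhere counts the latch down once sync completes ...
boolean synced = syncLatch.await(zkResyncWaitTime, TimeUnit.MILLISECONDS);
if (!synced) {
  LOG.warn("ZooKeeper sync did not complete within " + zkResyncWaitTime + " ms");
}
{code}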
[jira] [Commented] (YARN-4371) "yarn application -kill" should take multiple application ids
[ https://issues.apache.org/jira/browse/YARN-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026923#comment-15026923 ] Hadoop QA commented on YARN-4371: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 40s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 10s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 39s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 22s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 10s {color} | {color:red} Patch generated 2 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client (total was 15, now 16). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 45s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 49m 39s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 49m 42s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.7.0_85. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 115m 32s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.client.TestGetGroups | | JDK v1.8.0_66 Timed out junit tests | org.apache.hadoop.yarn.client.api.impl.TestAMRMClient | | | org.apache.hadoop.yarn.client.api.impl.TestYarnClient | | | org.apache.hadoop.yarn.client.api.impl.TestNMClient | | JDK v1.7.0_85 Failed junit tests | hadoop.yarn.client.TestGetGroups | | JDK v1.7.0_85 Timed out junit tests | org.apache.hadoop.yarn.client.api.impl.TestAMRMClient | | | org.apache.hadoop.yarn.client.api.impl.TestYarnClient | | |
[jira] [Commented] (YARN-4386) refreshNodesGracefully() looks at active RMNode list for recommissioning decommissioned nodes
[ https://issues.apache.org/jira/browse/YARN-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027016#comment-15027016 ] Sunil G commented on YARN-4386: --- {{getRMNodes}} will have only Active/Decommissioning nodes. Hence, as you mentioned, it is highly unlikely that a node in the getRMNodes list is also DECOMMISSIONED. For a test case, you could try forcefully adding a DECOMMISSIONED node to the active node list, but again that does not seem like a very valid case. [~djp], will this happen only if a race condition exists in the active->decommissioned window? > refreshNodesGracefully() looks at active RMNode list for recommissioning > decommissioned nodes > - > > Key: YARN-4386 > URL: https://issues.apache.org/jira/browse/YARN-4386 > Project: Hadoop YARN > Issue Type: Bug > Components: graceful >Affects Versions: 3.0.0 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla >Priority: Minor > Attachments: YARN-4386-v1.patch > > > In refreshNodesGracefully(), during recommissioning, the entryset from > getRMNodes() which has only active nodes (RUNNING, DECOMMISSIONING etc.) is > used for checking 'decommissioned' nodes which are present in > getInactiveRMNodes() map alone. > {code} > for (Entry<NodeId, RMNode> entry : rmContext.getRMNodes().entrySet()) { > . > // Recommissioning the nodes > if (entry.getValue().getState() == NodeState.DECOMMISSIONING > || entry.getValue().getState() == NodeState.DECOMMISSIONED) { > this.rmContext.getDispatcher().getEventHandler() > .handle(new RMNodeEvent(nodeId, RMNodeEventType.RECOMMISSION)); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
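For readers following the thread, the change being discussed is essentially to look for nodes to recommission in the inactive-node map instead of only in getRMNodes(); a minimal sketch under that assumption (accessor names are taken from the description above, the rest is illustrative, not the actual patch):
{code}
// Recommission nodes that were already moved to the inactive map as DECOMMISSIONED,
// in addition to the DECOMMISSIONING ones still present in getRMNodes().
for (RMNode rmNode : rmContext.getInactiveRMNodes().values()) {
  if (rmNode.getState() == NodeState.DECOMMISSIONED) {
    rmContext.getDispatcher().getEventHandler()
        .handle(new RMNodeEvent(rmNode.getNodeID(), RMNodeEventType.RECOMMISSION));
  }
}
{code}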
[jira] [Commented] (YARN-3769) Consider user limit when calculating total pending resource for preemption policy in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027198#comment-15027198 ] Sangjin Lee commented on YARN-3769: --- Could you check if the 2.7 commit applies cleanly to branch-2.6? If not, it would be great if you could post a 2.6 patch. Thanks. > Consider user limit when calculating total pending resource for preemption > policy in Capacity Scheduler > --- > > Key: YARN-3769 > URL: https://issues.apache.org/jira/browse/YARN-3769 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.6.0, 2.7.0, 2.8.0 >Reporter: Eric Payne >Assignee: Eric Payne > Fix For: 2.7.3 > > Attachments: YARN-3769-branch-2.002.patch, > YARN-3769-branch-2.7.002.patch, YARN-3769-branch-2.7.003.patch, > YARN-3769-branch-2.7.005.patch, YARN-3769-branch-2.7.006.patch, > YARN-3769-branch-2.7.007.patch, YARN-3769.001.branch-2.7.patch, > YARN-3769.001.branch-2.8.patch, YARN-3769.003.patch, YARN-3769.004.patch, > YARN-3769.005.patch > > > We are seeing the preemption monitor preempting containers from queue A and > then seeing the capacity scheduler giving them immediately back to queue A. > This happens quite often and causes a lot of churn. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4371) "yarn application -kill" should take multiple application ids
[ https://issues.apache.org/jira/browse/YARN-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027005#comment-15027005 ] Sunil G commented on YARN-4371: --- Test case failures are not related to this patch; they happened because of a hostname problem. We can wait for one more report to see whether the same tests ({{TestGetGroups}}) are failing or not. > "yarn application -kill" should take multiple application ids > - > > Key: YARN-4371 > URL: https://issues.apache.org/jira/browse/YARN-4371 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Tsuyoshi Ozawa >Assignee: Sunil G > Attachments: 0001-YARN-4371.patch, 0002-YARN-4371.patch > > > Currently we cannot pass multiple applications to the "yarn application -kill" > command. The command should take multiple application ids at the same time. > Entries should be separated with whitespace, like: > {code} > yarn application -kill application_1234_0001 application_1234_0007 > application_1234_0012 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
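The requested behaviour boils down to looping over every application id passed on the command line instead of a single one; a hedged sketch of what the CLI-side loop might look like (variable names and the surrounding option handling are illustrative, not the actual ApplicationCLI code):
{code}
// Kill each application id passed after "-kill", e.g.
//   yarn application -kill application_1234_0001 application_1234_0007
for (String appIdStr : appIdStrings) {  // appIdStrings: the ids parsed from the command line
  ApplicationId appId = ConverterUtils.toApplicationId(appIdStr);
  ApplicationReport report = yarnClient.getApplicationReport(appId);
  YarnApplicationState state = report.getYarnApplicationState();
  if (state == YarnApplicationState.FINISHED
      || state == YarnApplicationState.KILLED
      || state == YarnApplicationState.FAILED) {
    System.out.println("Application " + appIdStr + " has already finished");
  } else {
    yarnClient.killApplication(appId);
    System.out.println("Killed application " + appIdStr);
  }
}
{code}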
[jira] [Updated] (YARN-4318) Test failure: TestAMAuthorization
[ https://issues.apache.org/jira/browse/YARN-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-4318: - Assignee: Kuhu Shukla > Test failure: TestAMAuthorization > - > > Key: YARN-4318 > URL: https://issues.apache.org/jira/browse/YARN-4318 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Tsuyoshi Ozawa >Assignee: Kuhu Shukla > > {quote} > Tests run: 4, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 14.891 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization > testUnauthorizedAccess[0](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization) > Time elapsed: 3.208 sec <<< ERROR! > java.net.UnknownHostException: Invalid host name: local host is: (unknown); > destination host is: "b5a5dd9ec835":8030; java.net.UnknownHostException; For > more details see: http://wiki.apache.org/hadoop/UnknownHost > at org.apache.hadoop.ipc.Client$Connection.(Client.java:403) > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1512) > at org.apache.hadoop.ipc.Client.call(Client.java:1439) > at org.apache.hadoop.ipc.Client.call(Client.java:1400) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy15.registerApplicationMaster(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106) > at > org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:273) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4292) ResourceUtilization should be a part of NodeInfo REST API
[ https://issues.apache.org/jira/browse/YARN-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-4292: -- Attachment: 0004-YARN-4292.patch As YARN-3980 is in, making the necessary changes and updating a new version of the patch. Also attaching the REST output (XML):
{noformat}
<nodes>
  <node>
    <rack>/default-rack</rack>
    <state>RUNNING</state>
    <id>localhost:25006</id>
    <nodeHostName>localhost</nodeHostName>
    <nodeHTTPAddress>localhost:25008</nodeHTTPAddress>
    <lastHealthUpdate>1448467127146</lastHealthUpdate>
    <version>3.0.0-SNAPSHOT</version>
    <healthReport></healthReport>
    <numContainers>0</numContainers>
    <usedMemoryMB>0</usedMemoryMB>
    <availMemoryMB>8192</availMemoryMB>
    <usedVirtualCores>0</usedVirtualCores>
    <availableVirtualCores>8</availableVirtualCores>
    <resourceUtilization>
      <nodePhysicalMemoryMB>4430</nodePhysicalMemoryMB>
      <nodeVirtualMemoryMB>4446</nodeVirtualMemoryMB>
      <nodeCPUUsage>18.49383544921875</nodeCPUUsage>
      <containersPhysicalMemoryMB>0</containersPhysicalMemoryMB>
      <containersVirtualMemoryMB>0</containersVirtualMemoryMB>
      <containersCPUUsage>0.0</containersCPUUsage>
    </resourceUtilization>
  </node>
</nodes>
{noformat}
and the normal (JSON) output:
{noformat}
nodes: {
  node: [
    {
      rack: "/default-rack"
      state: "RUNNING"
      id: "localhost:25006"
      nodeHostName: "localhost"
      nodeHTTPAddress: "localhost:25008"
      lastHealthUpdate: 1448467007146
      version: "3.0.0-SNAPSHOT"
      healthReport: ""
      numContainers: 0
      usedMemoryMB: 0
      availMemoryMB: 8192
      usedVirtualCores: 0
      availableVirtualCores: 8
      resourceUtilization: {
        nodePhysicalMemoryMB: 4384
        nodeVirtualMemoryMB: 4399
        nodeCPUUsage: 6.99766731262207
        containersPhysicalMemoryMB: 0
        containersVirtualMemoryMB: 0
        containersCPUUsage: 0
      }
    }
  ]
}
{noformat}
[~leftnoteasy], could you please check? > ResourceUtilization should be a part of NodeInfo REST API > - > > Key: YARN-4292 > URL: https://issues.apache.org/jira/browse/YARN-4292 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Sunil G > Attachments: 0001-YARN-4292.patch, 0002-YARN-4292.patch, > 0003-YARN-4292.patch, 0004-YARN-4292.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
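As a side note, both outputs above can be reproduced against a running RM by switching the Accept header on the nodes endpoint; a small sketch (the host/port and the /ws/v1/cluster/nodes path are the usual RM REST defaults, assumed here rather than stated in this JIRA):
{code}
// Fetch node reports from the RM REST API; "application/xml" yields the first
// output above, "application/json" the second.
URL url = new URL("http://localhost:8088/ws/v1/cluster/nodes");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestProperty("Accept", "application/json");
try (BufferedReader in =
         new BufferedReader(new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
  String line;
  while ((line = in.readLine()) != null) {
    System.out.println(line);
  }
}
{code}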
[jira] [Commented] (YARN-3769) Consider user limit when calculating total pending resource for preemption policy in Capacity Scheduler
[ https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027011#comment-15027011 ] Eric Payne commented on YARN-3769: -- bq. should this be backported to 2.6.x? [~sjlee0], I would recommend it. We were seeing a lot of unnecessary preempting without this fix. > Consider user limit when calculating total pending resource for preemption > policy in Capacity Scheduler > --- > > Key: YARN-3769 > URL: https://issues.apache.org/jira/browse/YARN-3769 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.6.0, 2.7.0, 2.8.0 >Reporter: Eric Payne >Assignee: Eric Payne > Fix For: 2.7.3 > > Attachments: YARN-3769-branch-2.002.patch, > YARN-3769-branch-2.7.002.patch, YARN-3769-branch-2.7.003.patch, > YARN-3769-branch-2.7.005.patch, YARN-3769-branch-2.7.006.patch, > YARN-3769-branch-2.7.007.patch, YARN-3769.001.branch-2.7.patch, > YARN-3769.001.branch-2.8.patch, YARN-3769.003.patch, YARN-3769.004.patch, > YARN-3769.005.patch > > > We are seeing the preemption monitor preempting containers from queue A and > then seeing the capacity scheduler giving them immediately back to queue A. > This happens quite often and causes a lot of churn. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4390) Consider container request size during CS preemption
[ https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027020#comment-15027020 ] Eric Payne commented on YARN-4390: -- [~bikassaha], thank you for your comments. {quote} However, if YARN ends up preempting 8x1GB containers on different nodes then the under-allocated AM will not get its resources and may result in further avoidable preemptions. {quote} This is the scenario I was documenting in the description. > Consider container request size during CS preemption > > > Key: YARN-4390 > URL: https://issues.apache.org/jira/browse/YARN-4390 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.0.0, 2.8.0, 2.7.3 >Reporter: Eric Payne >Assignee: Eric Payne > > There are multiple reasons why preemption could unnecessarily preempt > containers. One is that an app could be requesting a large container (say > 8-GB), and the preemption monitor could conceivably preempt multiple > containers (say 8, 1-GB containers) in order to fill the large container > request. These smaller containers would then be rejected by the requesting AM > and potentially given right back to the preempted app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
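One way to read the improvement being discussed: when picking victims, prefer a container whose allocation can actually satisfy the pending ask, rather than several small ones that cannot be combined on a single node. A rough sketch under that reading (the variable names are illustrative, not the preemption monitor's real API):
{code}
// Only select a victim whose allocation is at least as large as the pending request,
// so the freed space can satisfy the ask on that node in one piece.
Resource pending = pendingAsk;                 // e.g. an 8 GB request
for (RMContainer candidate : candidatesOnNode) {
  if (Resources.fitsIn(pending, candidate.getAllocatedResource())) {
    toPreempt.add(candidate);
    break;                                     // one large container instead of eight 1 GB ones
  }
}
{code}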
[jira] [Commented] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently
[ https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027004#comment-15027004 ] Tsuyoshi Ozawa commented on YARN-4380: -- +1, checking this in. > TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails > intermittently > > > Key: YARN-4380 > URL: https://issues.apache.org/jira/browse/YARN-4380 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0, 2.7.1 >Reporter: Tsuyoshi Ozawa >Assignee: Varun Saxena > Attachments: YARN-4380.01.patch, > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.2.txt, > > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService-output.txt > > > {quote} > Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService > testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) > Time elapsed: 0.109 sec <<< FAILURE! > org.mockito.exceptions.verification.junit.ArgumentsAreDifferent: > Argument(s) are different! Wanted: > deletionService.delete( > "user0", > null, > > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > Actual invocation has different arguments: > deletionService.delete( > "user0", > > /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42 > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3849) Too much of preemption activity causing continuos killing of containers across queues
[ https://issues.apache.org/jira/browse/YARN-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027026#comment-15027026 ] Sunil G commented on YARN-3849: --- Yes, I think so. This will be a good addition to the 2.6 line; I will try to backport it to 2.6. > Too much of preemption activity causing continuos killing of containers > across queues > - > > Key: YARN-3849 > URL: https://issues.apache.org/jira/browse/YARN-3849 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.7.0 >Reporter: Sunil G >Assignee: Sunil G >Priority: Critical > Fix For: 2.8.0, 2.7.3 > > Attachments: 0001-YARN-3849.patch, 0002-YARN-3849.patch, > 0003-YARN-3849.patch, 0004-YARN-3849-branch2-7.patch, 0004-YARN-3849.patch > > > Two queues are used. Each queue is given a capacity of 0.5. The Dominant > Resource policy is used. > 1. An app is submitted in QueueA which consumes the full cluster capacity. > 2. After an app is submitted in QueueB, there is some demand, which invokes > preemption in QueueA. > 3. Instead of killing only the excess over the 0.5 guaranteed capacity, we observed that > all containers other than the AM are getting killed in QueueA. > 4. Now the app in QueueB tries to take over the cluster with the current free > space. But there is some updated demand from the app in QueueA which lost > its containers earlier, and preemption is kicked in for QueueB now. > The scenario in steps 3 and 4 keeps happening in a loop, so none of the > apps complete. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3862) Support for fetching specific configs and metrics based on prefixes
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027929#comment-15027929 ] Allen Wittenauer commented on YARN-3862: bq. do you know what's going on? Thanks. Yes. As was announced on common-dev@, Yetus is now using the Dockerfile that ships with Hadoop. So, for at least the 2nd time I'm aware of, this branch is missing build fixes. This time it looks like 0ca8df716a1bb8e7f894914fb0d740a1d14df8e3. FWIW, this is going to be happening a lot. It might make your lives easier to keep track of changes in files that are directly build related... > Support for fetching specific configs and metrics based on prefixes > --- > > Key: YARN-3862 > URL: https://issues.apache.org/jira/browse/YARN-3862 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-3862-YARN-2928.wip.01.patch, > YARN-3862-YARN-2928.wip.02.patch, YARN-3862-feature-YARN-2928.04.patch, > YARN-3862-feature-YARN-2928.wip.03.patch > > > Currently, we will retrieve all the contents of the field if that field is > specified in the query API. In case of configs and metrics, this can become a > lot of data even though the user doesn't need it. So we need to provide a way > to query only a set of configs or metrics. > As a comma separated list of configs/metrics to be returned would be quite > cumbersome to specify, we have to support one of the following options: > # Prefix match > # Regex > # Group the configs/metrics and query that group. > We also need a facility to specify a metric time window to return metrics in > that window. This may be useful in plotting graphs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
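Of the three options listed in the description, prefix matching is the simplest to picture; a minimal sketch of what that filtering could look like on the reader side (the config-map and prefix variables are assumptions for illustration):
{code}
// Keep only the configs whose key starts with the requested prefix,
// instead of returning the entity's entire config map.
Map<String, String> filtered = new HashMap<String, String>();
for (Map.Entry<String, String> e : entityConfigs.entrySet()) {
  if (e.getKey().startsWith(configPrefix)) {
    filtered.put(e.getKey(), e.getValue());
  }
}
{code}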
[jira] [Commented] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics
[ https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027907#comment-15027907 ] Naganarasimha G R commented on YARN-4304: - Thanks [~wangda] for sharing your thoughts. bq. Instead of putting max am to a separated object, I would prefer to put them to existing resourceUsageByPartition instead of introducing a new object. Even though max-am-limit is not usage, but it describes upper bound of usage. Thoughts? By {{resourceUsageByPartition}} I assume you refer to {{ResourceUsageInfo}} in the parent class {{CapacitySchedulerQueueInfo}}. If so, yes; and in any case, since these new classes have not gone into any release yet, we can rename them appropriately, e.g. {{resourceUsagesByPartition}} => {{resourceInfoByPartition}}, {{ResourceUsageInfo}} => {{ResourceInfo}} & {{PartitionResourceUsageInfo}} => {{PartitionResourceInfo}}. Thoughts? > AM max resource configuration per partition to be displayed/updated correctly > in UI and in various partition related metrics > > > Key: YARN-4304 > URL: https://issues.apache.org/jira/browse/YARN-4304 > Project: Hadoop YARN > Issue Type: Sub-task > Components: webapp >Affects Versions: 2.7.1 >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-4304.patch, 0002-YARN-4304.patch, > 0003-YARN-4304.patch, REST_and_UI.zip > > > As we are supporting per-partition level max AM resource percentage > configuration, UI and various metrics also need to display correct > configurations related to same. > For eg: Current UI still shows am-resource percentage per queue level. This > is to be updated correctly when label config is used. > - Display max-am-percentage per-partition in Scheduler UI (label also) and in > ClusterMetrics page > - Update queue/partition related metrics w.r.t per-partition > am-resource-percentage -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently
[ https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027243#comment-15027243 ] Hudson commented on YARN-4380: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2664 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2664/]) YARN-4380. (ozawa: rev 0656d2dc83af6a48a8d8d0e37cdf1f813124f366) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java > TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails > intermittently > > > Key: YARN-4380 > URL: https://issues.apache.org/jira/browse/YARN-4380 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0, 2.7.1 >Reporter: Tsuyoshi Ozawa >Assignee: Varun Saxena > Fix For: 2.7.3 > > Attachments: YARN-4380.01.patch, > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.2.txt, > > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService-output.txt > > > {quote} > Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService > testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) > Time elapsed: 0.109 sec <<< FAILURE! > org.mockito.exceptions.verification.junit.ArgumentsAreDifferent: > Argument(s) are different! Wanted: > deletionService.delete( > "user0", > null, > > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > Actual invocation has different arguments: > deletionService.delete( > "user0", > > /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42 > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3946) Allow fetching exact reason as to why a submitted app is in ACCEPTED state in CS
[ https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027863#comment-15027863 ] Naganarasimha G R commented on YARN-3946: - Hi [~wangda], the test case failures are not related to this JIRA; they pass locally with the patch applied, and JIRAs have already been raised for some of the failing tests. > Allow fetching exact reason as to why a submitted app is in ACCEPTED state in > CS > > > Key: YARN-3946 > URL: https://issues.apache.org/jira/browse/YARN-3946 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.6.0 >Reporter: Sumit Nigam >Assignee: Naganarasimha G R > Attachments: 3946WebImages.zip, YARN-3946.v1.001.patch, > YARN-3946.v1.002.patch, YARN-3946.v1.003.Images.zip, YARN-3946.v1.003.patch, > YARN-3946.v1.004.patch > > > Currently there is no direct way to get the exact reason as to why a > submitted app is still in ACCEPTED state. It should be possible to know > through RM REST API as to what aspect is not being met - say, queue limits > being reached, or core/ memory requirement not being met, or AM limit being > reached, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4108) CapacityScheduler: Improve preemption to preempt only those containers that would satisfy the incoming request
[ https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15028104#comment-15028104 ] Bikas Saha commented on YARN-4108: -- These problems will be hard to solve without involving the scheduler in the decision cycle. The preemption policy can determine how much to preempt from a queue at a macro level. But the actual containers to preempt would be selected by the scheduler. That is where using the global node picture will help. For a given container request, if we can scan its nodes (if any) and make either an allocation or preemption decision. Else, if we are doing container allocation on node heartbeat, then just like delay scheduling logic, we can mark a node for preemption but not preempt it and associate that node with the container request for which preemption is needed (request.nodeToPreempt). And we can cycle through all nodes like this and change the request->node association when we find better nodes to preempt. After cycling through all nodes, if when we again reach a node such that it matches the request.nodeToPreempt then we can execute the decision of actually preempting the node. If there are no nodes that can satisfy the request (e.g. request wants node A but preemptedQueue has no containers on node A) then scheduler should be able to callback to the preemption module and notify it so that some other queue can be picked to preempt. > CapacityScheduler: Improve preemption to preempt only those containers that > would satisfy the incoming request > -- > > Key: YARN-4108 > URL: https://issues.apache.org/jira/browse/YARN-4108 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Reporter: Wangda Tan >Assignee: Wangda Tan > > This is sibling JIRA for YARN-2154. We should make sure container preemption > is more effective. > *Requirements:*: > 1) Can handle case of user-limit preemption > 2) Can handle case of resource placement requirements, such as: hard-locality > (I only want to use rack-1) / node-constraints (YARN-3409) / black-list (I > don't want to use rack1 and host\[1-3\]) > 3) Can handle preemption within a queue: cross user preemption (YARN-2113), > cross applicaiton preemption (such as priority-based (YARN-1963) / > fairness-based (YARN-3319)). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
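A rough sketch of the node-marking idea described in the comment above (delay-scheduling style: remember a candidate node per pending request and only execute preemption when the marked node heartbeats again) might look like the following. This is not CapacityScheduler code; {{ContainerRequest}}, {{SchedulerNode}} and the "better candidate" heuristic are hypothetical stand-ins used only to illustrate the control flow.
{code}
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-ins for the scheduler's request and node types. */
class ContainerRequest {
  String nodeToPreempt;            // node currently marked for preemption, if any
  final int memoryMb;              // size of the pending request
  ContainerRequest(int memoryMb) { this.memoryMb = memoryMb; }
}

class SchedulerNode {
  final String host;
  final int preemptableMb;         // memory that could be freed by preempting containers here
  SchedulerNode(String host, int preemptableMb) {
    this.host = host;
    this.preemptableMb = preemptableMb;
  }
}

public class PreemptionMarkerSketch {

  /**
   * Called on each node heartbeat. Mirrors the idea in the comment: mark a
   * node for the request instead of preempting immediately, keep improving
   * the request->node association, and only execute the preemption when we
   * heartbeat the already-marked node again.
   *
   * @return true if preemption was actually executed on this node.
   */
  static boolean onNodeHeartbeat(SchedulerNode node, ContainerRequest request,
                                 List<SchedulerNode> allNodes) {
    if (node.preemptableMb < request.memoryMb) {
      return false;                // this node cannot satisfy the request even with preemption
    }
    if (node.host.equals(request.nodeToPreempt)) {
      // We cycled back to the node we had marked: execute the preemption decision.
      System.out.println("Preempting " + request.memoryMb + "MB on " + node.host);
      request.nodeToPreempt = null;
      return true;
    }
    // Otherwise (re)mark this node if it is a better candidate, e.g. fewer MB to free.
    SchedulerNode current = find(allNodes, request.nodeToPreempt);
    if (current == null || node.preemptableMb < current.preemptableMb) {
      request.nodeToPreempt = node.host;
    }
    return false;
  }

  private static SchedulerNode find(List<SchedulerNode> nodes, String host) {
    if (host == null) return null;
    for (SchedulerNode n : nodes) {
      if (n.host.equals(host)) return n;
    }
    return null;
  }

  public static void main(String[] args) {
    List<SchedulerNode> nodes = Arrays.asList(
        new SchedulerNode("nodeA", 4096),
        new SchedulerNode("nodeB", 8192));
    ContainerRequest req = new ContainerRequest(4096);
    for (int pass = 0; pass < 2; pass++) {        // two heartbeat sweeps over the cluster
      for (SchedulerNode n : nodes) {
        if (onNodeHeartbeat(n, req, nodes)) return;
      }
    }
  }
}
{code}
In this toy run, the first sweep marks nodeA (it can free exactly the requested 4096 MB); on the second sweep, reaching nodeA again triggers the preemption, which is the two-phase behavior the comment describes.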
[jira] [Updated] (YARN-3862) Support for fetching specific configs and metrics based on prefixes
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3862: --- Attachment: YARN-3862-feature-YARN-2928.04.patch > Support for fetching specific configs and metrics based on prefixes > --- > > Key: YARN-3862 > URL: https://issues.apache.org/jira/browse/YARN-3862 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-3862-YARN-2928.wip.01.patch, > YARN-3862-YARN-2928.wip.02.patch, YARN-3862-feature-YARN-2928.04.patch, > YARN-3862-feature-YARN-2928.wip.03.patch > > > Currently, we will retrieve all the contents of the field if that field is > specified in the query API. In case of configs and metrics, this can become a > lot of data even though the user doesn't need it. So we need to provide a way > to query only a set of configs or metrics. > As a comma spearated list of configs/metrics to be returned will be quite > cumbersome to specify, we have to support either of the following options : > # Prefix match > # Regex > # Group the configs/metrics and query that group. > We also need a facility to specify a metric time window to return metrics in > a that window. This may be useful in plotting graphs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4392) ApplicationCreatedEvent event time resets after RM restart/failover
[ https://issues.apache.org/jira/browse/YARN-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027335#comment-15027335 ] Jonathan Eagles commented on YARN-4392: --- [~xgong], jason and I will be out until monday and will take a look at it then. > ApplicationCreatedEvent event time resets after RM restart/failover > --- > > Key: YARN-4392 > URL: https://issues.apache.org/jira/browse/YARN-4392 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Xuan Gong >Assignee: Xuan Gong >Priority: Critical > Attachments: YARN-4392-2015-11-24.patch, YARN-4392.1.patch > > > {code}2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - > Finished time 1437453994768 is ahead of started time 1440308399674 > 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished > time 1437454008244 is ahead of started time 1440308399676 > 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished > time 1437444305171 is ahead of started time 1440308399653 > 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished > time 1437444293115 is ahead of started time 1440308399647 > 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished > time 1437444379645 is ahead of started time 1440308399656 > 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished > time 1437444361234 is ahead of started time 1440308399655 > 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished > time 1437444342029 is ahead of started time 1440308399654 > 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished > time 1437444323447 is ahead of started time 1440308399654 > 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished > time 143730006 is ahead of started time 1440308399660 > 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished > time 143715698 is ahead of started time 1440308399659 > 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished > time 143719060 is ahead of started time 1440308399658 > 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished > time 1437444393931 is ahead of started time 1440308399657 > {code} . > From ATS logs, we would see a large amount of 'stale alerts' messages > periodically -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4292) ResourceUtilization should be a part of NodeInfo REST API
[ https://issues.apache.org/jira/browse/YARN-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027350#comment-15027350 ] Hadoop QA commented on YARN-4292: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 30s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 43s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 25s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 15s {color} | {color:red} Patch generated 8 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 18, now 26). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 31s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 28s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 48s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 10s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_85. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 155m 18s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler | | JDK v1.7.0_85 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL |
[jira] [Commented] (YARN-3862) Support for fetching specific configs and metrics based on prefixes
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027365#comment-15027365 ] Hadoop QA commented on YARN-3862: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | {color:red} docker {color} | {color:red} 16m 41s {color} | {color:red} Docker failed to build yetus/hadoop:123b3db. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12774383/YARN-3862-feature-YARN-2928.04.patch | | JIRA Issue | YARN-3862 | | Powered by | Apache Yetus http://yetus.apache.org | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9799/console | This message was automatically generated. > Support for fetching specific configs and metrics based on prefixes > --- > > Key: YARN-3862 > URL: https://issues.apache.org/jira/browse/YARN-3862 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-3862-YARN-2928.wip.01.patch, > YARN-3862-YARN-2928.wip.02.patch, YARN-3862-feature-YARN-2928.04.patch, > YARN-3862-feature-YARN-2928.wip.03.patch > > > Currently, we will retrieve all the contents of the field if that field is > specified in the query API. In case of configs and metrics, this can become a > lot of data even though the user doesn't need it. So we need to provide a way > to query only a set of configs or metrics. > As a comma spearated list of configs/metrics to be returned will be quite > cumbersome to specify, we have to support either of the following options : > # Prefix match > # Regex > # Group the configs/metrics and query that group. > We also need a facility to specify a metric time window to return metrics in > a that window. This may be useful in plotting graphs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4293) ResourceUtilization should be a part of yarn node CLI
[ https://issues.apache.org/jira/browse/YARN-4293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-4293: -- Attachment: 0001-YARN-4293.patch Attaching an initial version. - This became a little trickier because {{ResourceUtilization}} was written in {{org.apache.hadoop.yarn.server.api.records}}, whereas we needed it in {{org.apache.hadoop.yarn.api.records}} since this information has to be pulled to the client as part of the node report. So I moved these classes to *yarn.api*; hence all source files which used {{ResourceUtilization}} needed an import change. - This information is added to the "node -status" command only. [~leftnoteasy], could you please check this and let me know whether the approach is fine? Thank you. > ResourceUtilization should be a part of yarn node CLI > - > > Key: YARN-4293 > URL: https://issues.apache.org/jira/browse/YARN-4293 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Sunil G > Attachments: 0001-YARN-4293.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
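For illustration only, a small sketch of the kind of addition described (printing node resource utilization as part of a node status report). The {{Utilization}} fields and the output format here are assumptions, not the actual NodeCLI/NodeReport API or the contents of 0001-YARN-4293.patch.
{code}
/**
 * Sketch of showing node resource utilization in a "node -status" style
 * report. Field names (physical memory, virtual memory, CPU) are assumed
 * for illustration.
 */
public class NodeStatusUtilizationSketch {

  // hypothetical utilization snapshot as it might arrive in a node report
  static class Utilization {
    final int physicalMemoryMb;
    final int virtualMemoryMb;
    final float cpu;                 // vcores in use
    Utilization(int pmem, int vmem, float cpu) {
      this.physicalMemoryMb = pmem;
      this.virtualMemoryMb = vmem;
      this.cpu = cpu;
    }
  }

  static String formatStatusLine(Utilization u) {
    return String.format(
        "Resource Utilization : PMem:%d MB, VMem:%d MB, VCores:%.2f",
        u.physicalMemoryMb, u.virtualMemoryMb, u.cpu);
  }

  public static void main(String[] args) {
    // Example: 2 GB physical, 3 GB virtual, 1.5 vcores in use on this node.
    System.out.println(formatStatusLine(new Utilization(2048, 3072, 1.5f)));
  }
}
{code}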
[jira] [Commented] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently
[ https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027275#comment-15027275 ] Hudson commented on YARN-4380: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #733 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/733/]) YARN-4380. (ozawa: rev 0656d2dc83af6a48a8d8d0e37cdf1f813124f366) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java * hadoop-yarn-project/CHANGES.txt > TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails > intermittently > > > Key: YARN-4380 > URL: https://issues.apache.org/jira/browse/YARN-4380 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0, 2.7.1 >Reporter: Tsuyoshi Ozawa >Assignee: Varun Saxena > Fix For: 2.7.3 > > Attachments: YARN-4380.01.patch, > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.2.txt, > > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService-output.txt > > > {quote} > Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService > testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) > Time elapsed: 0.109 sec <<< FAILURE! > org.mockito.exceptions.verification.junit.ArgumentsAreDifferent: > Argument(s) are different! Wanted: > deletionService.delete( > "user0", > null, > > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > Actual invocation has different arguments: > deletionService.delete( > "user0", > > /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42 > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4297) TestJobHistoryEventHandler and TestRMContainerAllocator failing on YARN-2928 branch
[ https://issues.apache.org/jira/browse/YARN-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027424#comment-15027424 ] Sangjin Lee commented on YARN-4297: --- +1. Committing shortly. > TestJobHistoryEventHandler and TestRMContainerAllocator failing on YARN-2928 > branch > --- > > Key: YARN-4297 > URL: https://issues.apache.org/jira/browse/YARN-4297 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-4297-YARN-2928.01.patch, > YARN-4297-feature-YARN-2928.02.patch, YARN-4297-feature-YARN-2928.03.patch > > > {noformat} > Tests run: 13, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 16.09 sec > <<< FAILURE! - in > org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler > testTimelineEventHandling(org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler) > Time elapsed: 0.11 sec <<< ERROR! > java.lang.ClassCastException: > org.apache.hadoop.mapreduce.v2.app.AppContext$$EnhancerByMockitoWithCGLIB$$95d3ddbe > cannot be cast to > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$RunningAppContext > at > org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceInit(JobHistoryEventHandler.java:271) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.testTimelineEventHandling(TestJobHistoryEventHandler.java:495) > {noformat} > {noformat} > testRMContainerAllocatorResendsRequestsOnRMRestart(org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator) > Time elapsed: 2.649 sec <<< ERROR! > java.lang.ClassCastException: > org.apache.hadoop.mapreduce.v2.app.AppContext$$EnhancerByMockitoWithCGLIB$$8e08559a > cannot be cast to > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$RunningAppContext > at > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:802) > at > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:269) > Tests in error: > TestRMContainerAllocator.testExcessReduceContainerAssign:669 » ClassCast > org.a... > TestRMContainerAllocator.testReportedAppProgress:970 » NullPointer > TestRMContainerAllocator.testBlackListedNodesWithSchedulingToThatNode:1578 > » ClassCast > TestRMContainerAllocator.testBlackListedNodes:1292 » ClassCast > org.apache.hado... > TestRMContainerAllocator.testAMRMTokenUpdate:2691 » ClassCast > org.apache.hadoo... > TestRMContainerAllocator.testMapReduceAllocationWithNodeLabelExpression:722 > » ClassCast > TestRMContainerAllocator.testReducerRampdownDiagnostics:443 » ClassCast > org.ap... > TestRMContainerAllocator.testReportedAppProgressWithOnlyMaps:1118 » > NullPointer > TestRMContainerAllocator.testMapReduceScheduling:819 » ClassCast > org.apache.ha... > TestRMContainerAllocator.testResource:390 » ClassCast > org.apache.hadoop.mapred... > TestRMContainerAllocator.testUpdatedNodes:1190 » ClassCast > org.apache.hadoop.m... > TestRMContainerAllocator.testCompletedTasksRecalculateSchedule:2249 » > ClassCast > TestRMContainerAllocator.testConcurrentTaskLimits:2779 » ClassCast > org.apache > TestRMContainerAllocator.testSimple:219 » ClassCast > org.apache.hadoop.mapreduc... > > TestRMContainerAllocator.testIgnoreBlacklisting:1378->getContainerOnHost:1511 > » ClassCast > TestRMContainerAllocator.testMapNodeLocality:310 » ClassCast > org.apache.hadoop... 
> > TestRMContainerAllocator.testRMContainerAllocatorResendsRequestsOnRMRestart:2489 > » ClassCast > Tests run: 26, Failures: 0, Errors: 17, Skipped: 0 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4238) createdTime and modifiedTime is not reported while publishing entities to ATSv2
[ https://issues.apache.org/jira/browse/YARN-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027316#comment-15027316 ] Varun Saxena commented on YARN-4238: [~sjlee0], kindly review. MapReduce-related test failures will be fixed by YARN-4297. The checkstyle issue is due to indentation inside a switch; it can only be fixed by changing earlier lines inside the switch case. Do you want me to fix that? RM-related test failures are not related; they have been carried over from trunk and have corresponding JIRAs. ASF license warnings will be fixed once we merge MAPREDUCE-6557 from trunk. The 2 javac issues are not related to the code change either. > createdTime and modifiedTime is not reported while publishing entities to > ATSv2 > --- > > Key: YARN-4238 > URL: https://issues.apache.org/jira/browse/YARN-4238 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-4238-YARN-2928.01.patch, > YARN-4238-feature-YARN-2928.02.patch > > > While publishing entities from RM and elsewhere we are not sending created > time. For instance, created time in TimelineServiceV2Publisher class and for > other entities in other such similar classes is not updated. We can easily > update created time when sending application created event. Likewise for > modification time on every write. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4395) Typo in comment in ClientServiceDelegate
Daniel Templeton created YARN-4395: -- Summary: Typo in comment in ClientServiceDelegate Key: YARN-4395 URL: https://issues.apache.org/jira/browse/YARN-4395 Project: Hadoop YARN Issue Type: Task Reporter: Daniel Templeton Assignee: Daniel Templeton Priority: Trivial Line 337 in {{invoke()}} has the following comment: {code} // if it's AM shut down, do not decrement maxClientRetry as we wait for // AM to be restarted. {code} Ideally it should be: {code} // If its AM shut down, do not decrement maxClientRetry while we wait // for its AM to be restarted. {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3226) UI changes for decommissioning node
[ https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027384#comment-15027384 ] Junping Du commented on YARN-3226: -- The patch LGTM in overall. Some NITs: {code} +default : + LOG.debug("Unexpcted inital state"); ... +default : + LOG.debug("Unexpcted final state"); {code} We should have warn as log level because this is unexpected. Also, a typo here: "inital" => "initial". {code} case DECOMMISSIONED: -metrics.incrDecommisionedNMs(); +metrics.incrDecommisionedNMs(); break; {code} May be indentation problems. Also, I need someone to review UI changes. [~xgong], can you take a look at it? Thanks! > UI changes for decommissioning node > --- > > Key: YARN-3226 > URL: https://issues.apache.org/jira/browse/YARN-3226 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful >Reporter: Junping Du >Assignee: Sunil G > Attachments: 0001-YARN-3226.patch, 0002-YARN-3226.patch, > 0003-YARN-3226.patch, ClusterMetricsOnNodes_UI.png > > > Some initial thought is: > decommissioning nodes should still show up in the active nodes list since > they are still running containers. > A separate decommissioning tab to filter for those nodes would be nice, > although I suppose users can also just use the jquery table to sort/search for > nodes in that state from the active nodes list if it's too crowded to add yet > another node > state tab (or maybe get rid of some effectively dead tabs like the reboot > state tab). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
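For reference, the shape of the change being requested in the review comment above (warn-level logging for the unexpected default case, with the typo fixed) could look like the sketch below. The enum and method names are made up for illustration; this is not the YARN-3226 patch.
{code}
import java.util.logging.Logger;

/** Minimal sketch of the review suggestion: unexpected states are logged at WARN. */
public class NodeStateTransitionSketch {
  private static final Logger LOG =
      Logger.getLogger(NodeStateTransitionSketch.class.getName());

  enum NodeState { RUNNING, DECOMMISSIONING, DECOMMISSIONED }

  static void handleInitialState(NodeState initial) {
    switch (initial) {
      case RUNNING:
      case DECOMMISSIONING:
        // expected initial states: nothing to report
        break;
      default:
        // warn rather than debug, because reaching here is unexpected;
        // also fixes the typo: "initial" instead of "inital"
        LOG.warning("Unexpected initial state: " + initial);
    }
  }

  public static void main(String[] args) {
    handleInitialState(NodeState.DECOMMISSIONED);  // triggers the WARN path
  }
}
{code}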
[jira] [Commented] (YARN-4297) TestJobHistoryEventHandler and TestRMContainerAllocator failing on YARN-2928 branch
[ https://issues.apache.org/jira/browse/YARN-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027296#comment-15027296 ] Varun Saxena commented on YARN-4297: [~sjlee0], kindly review. > TestJobHistoryEventHandler and TestRMContainerAllocator failing on YARN-2928 > branch > --- > > Key: YARN-4297 > URL: https://issues.apache.org/jira/browse/YARN-4297 > Project: Hadoop YARN > Issue Type: Sub-task >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Attachments: YARN-4297-YARN-2928.01.patch, > YARN-4297-feature-YARN-2928.02.patch, YARN-4297-feature-YARN-2928.03.patch > > > {noformat} > Tests run: 13, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 16.09 sec > <<< FAILURE! - in > org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler > testTimelineEventHandling(org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler) > Time elapsed: 0.11 sec <<< ERROR! > java.lang.ClassCastException: > org.apache.hadoop.mapreduce.v2.app.AppContext$$EnhancerByMockitoWithCGLIB$$95d3ddbe > cannot be cast to > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$RunningAppContext > at > org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceInit(JobHistoryEventHandler.java:271) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler.testTimelineEventHandling(TestJobHistoryEventHandler.java:495) > {noformat} > {noformat} > testRMContainerAllocatorResendsRequestsOnRMRestart(org.apache.hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator) > Time elapsed: 2.649 sec <<< ERROR! > java.lang.ClassCastException: > org.apache.hadoop.mapreduce.v2.app.AppContext$$EnhancerByMockitoWithCGLIB$$8e08559a > cannot be cast to > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$RunningAppContext > at > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getResources(RMContainerAllocator.java:802) > at > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.heartbeat(RMContainerAllocator.java:269) > Tests in error: > TestRMContainerAllocator.testExcessReduceContainerAssign:669 » ClassCast > org.a... > TestRMContainerAllocator.testReportedAppProgress:970 » NullPointer > TestRMContainerAllocator.testBlackListedNodesWithSchedulingToThatNode:1578 > » ClassCast > TestRMContainerAllocator.testBlackListedNodes:1292 » ClassCast > org.apache.hado... > TestRMContainerAllocator.testAMRMTokenUpdate:2691 » ClassCast > org.apache.hadoo... > TestRMContainerAllocator.testMapReduceAllocationWithNodeLabelExpression:722 > » ClassCast > TestRMContainerAllocator.testReducerRampdownDiagnostics:443 » ClassCast > org.ap... > TestRMContainerAllocator.testReportedAppProgressWithOnlyMaps:1118 » > NullPointer > TestRMContainerAllocator.testMapReduceScheduling:819 » ClassCast > org.apache.ha... > TestRMContainerAllocator.testResource:390 » ClassCast > org.apache.hadoop.mapred... > TestRMContainerAllocator.testUpdatedNodes:1190 » ClassCast > org.apache.hadoop.m... > TestRMContainerAllocator.testCompletedTasksRecalculateSchedule:2249 » > ClassCast > TestRMContainerAllocator.testConcurrentTaskLimits:2779 » ClassCast > org.apache > TestRMContainerAllocator.testSimple:219 » ClassCast > org.apache.hadoop.mapreduc... > > TestRMContainerAllocator.testIgnoreBlacklisting:1378->getContainerOnHost:1511 > » ClassCast > TestRMContainerAllocator.testMapNodeLocality:310 » ClassCast > org.apache.hadoop... 
> > TestRMContainerAllocator.testRMContainerAllocatorResendsRequestsOnRMRestart:2489 > » ClassCast > Tests run: 26, Failures: 0, Errors: 17, Skipped: 0 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3862) Support for fetching specific configs and metrics based on prefixes
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027348#comment-15027348 ] Sangjin Lee commented on YARN-3862: --- Yes, I think that would be better. Thanks! > Support for fetching specific configs and metrics based on prefixes > --- > > Key: YARN-3862 > URL: https://issues.apache.org/jira/browse/YARN-3862 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-3862-YARN-2928.wip.01.patch, > YARN-3862-YARN-2928.wip.02.patch, YARN-3862-feature-YARN-2928.04.patch, > YARN-3862-feature-YARN-2928.wip.03.patch > > > Currently, we will retrieve all the contents of the field if that field is > specified in the query API. In case of configs and metrics, this can become a > lot of data even though the user doesn't need it. So we need to provide a way > to query only a set of configs or metrics. > As a comma spearated list of configs/metrics to be returned will be quite > cumbersome to specify, we have to support either of the following options : > # Prefix match > # Regex > # Group the configs/metrics and query that group. > We also need a facility to specify a metric time window to return metrics in > a that window. This may be useful in plotting graphs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4394) ClientServiceDelegate doesn't handle retries during AM restart as intended
Daniel Templeton created YARN-4394: -- Summary: ClientServiceDelegate doesn't handle retries during AM restart as intended Key: YARN-4394 URL: https://issues.apache.org/jira/browse/YARN-4394 Project: Hadoop YARN Issue Type: Bug Reporter: Daniel Templeton Assignee: Daniel Templeton In the {{invoke()}} method, I found the following code: {code} private AtomicBoolean usingAMProxy = new AtomicBoolean(false); ... // if it's AM shut down, do not decrement maxClientRetry as we wait for // AM to be restarted. if (!usingAMProxy.get()) { maxClientRetry--; } usingAMProxy.set(false); {code} When we create the AM proxy, we set the flag to true. If we fail to connect, the impact of the flag being true is that the code will try one extra time, giving it 400ms instead of just 300ms. I can't imagine that's the intended behavior. After any failure, the flag will forever more be false, but fortunately (?!?) the flag is otherwise unused. Looks like I need to do some archeology to figure out how we ended up here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
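A minimal standalone sketch of the retry loop as described in this issue makes the observation concrete: the flag is set once when the AM proxy is created, spares the counter exactly one decrement, then is cleared and never set again, so it only ever buys one extra attempt. This is an illustration of the described behavior, not the actual ClientServiceDelegate code.
{code}
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Toy model of the retry handling described in YARN-4394: usingAMProxy is
 * true only until the first failure, so maxClientRetry is skipped exactly once.
 */
public class RetryFlagSketch {
  private final AtomicBoolean usingAMProxy = new AtomicBoolean(false);

  int attemptsBeforeGivingUp(int maxClientRetry) {
    usingAMProxy.set(true);                 // set once, when the AM proxy is created
    int attempts = 0;
    while (maxClientRetry > 0) {
      attempts++;                           // simulate a failed RPC to the AM
      if (!usingAMProxy.get()) {
        maxClientRetry--;                   // normal path: burn a retry
      }
      usingAMProxy.set(false);              // cleared after the first failure, forever
    }
    return attempts;
  }

  public static void main(String[] args) {
    // With maxClientRetry = 3 the loop runs 4 times: exactly one "free" attempt.
    System.out.println(new RetryFlagSketch().attemptsBeforeGivingUp(3));
  }
}
{code}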
[jira] [Commented] (YARN-4390) Consider container request size during CS preemption
[ https://issues.apache.org/jira/browse/YARN-4390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027380#comment-15027380 ] Carlo Curino commented on YARN-4390: [~bikassaha] if the containers to be preempted (remember we respect the priority in the queue, and priority of containers) are in one box, the 8GB freed should be bundled and given to the AM; however, the preemption policy does not at the moment try to free resources to satisfy an exact request. This is an important philosophical point: I am quite convinced that preemption should be used to fix "large imbalances" in fairness/capacity between queues/users (hence the dead-zone in which we do not trigger preemption even if we are off balance), and not to micro-manage allocations. Keep in mind that preemption will take a while to kick in (by design), as it allows the application to respond to a preemption signal etc. As such, in many cases the 8GB container request will already be otherwise satisfied before this preemption kicks in. The current implementation follows this philosophy and only looks at the overall demand of resources, not at exactly which pending requests exist. I think this is correct and sufficient for large clusters with mostly batch workloads (like the ones we were focusing on when we started preemption a few years back), since cluster conditions mutate too quickly for us to try to chase and micromanage allocations with preemption... In very small clusters, or in clusters running several long-running services, things might be different, as we can have potentially small, but very persistent, imbalances which we might want to address with more surgical preemption actions. my 2 cents.. > Consider container request size during CS preemption > > > Key: YARN-4390 > URL: https://issues.apache.org/jira/browse/YARN-4390 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler >Affects Versions: 3.0.0, 2.8.0, 2.7.3 >Reporter: Eric Payne >Assignee: Eric Payne > > There are multiple reasons why preemption could unnecessarily preempt > containers. One is that an app could be requesting a large container (say > 8-GB), and the preemption monitor could conceivably preempt multiple > containers (say 8, 1-GB containers) in order to fill the large container > request. These smaller containers would then be rejected by the requesting AM > and potentially given right back to the preempted app. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4318) Test failure: TestAMAuthorization
[ https://issues.apache.org/jira/browse/YARN-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027332#comment-15027332 ] Kuhu Shukla commented on YARN-4318: --- Although the tests pass locally, I see that the failure is coming from the server not being able to resolve the server hostname/address. I checked a few more pre-commits with this issue and somehow, instead of 'localhost/127.0.0.1', the hostnames are 48-bit hex values. {code} public Connection(ConnectionId remoteId, int serviceClass) throws IOException { this.remoteId = remoteId; this.server = remoteId.getAddress(); if (server.isUnresolved()) { throw NetUtils.wrapException(server.getHostName(), server.getPort(), null, 0, new UnknownHostException()); } {code} {code} testUnauthorizedAccess[1](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization) Time elapsed: 2.784 sec <<< ERROR! java.net.UnknownHostException: Invalid host name: local host is: (unknown); destination host is: "52b8ea35fca2":8030; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost at org.apache.hadoop.ipc.Client$Connection.(Client.java:413) {code} These are not constant with respect to the machines that ran the pre-commit; that is, the H5 host had two such runs with different 48-bit hex values. The {{serviceaddr}} is using the default config value, which means that instead of an IP, the 0.0.0.0 in the default config is picking up a hex value from the environment of the machine/VM. Could this be related to our latest Docker/Yetus migration? Asking [~ste...@apache.org] if he has any inputs on this. Appreciate it. > Test failure: TestAMAuthorization > - > > Key: YARN-4318 > URL: https://issues.apache.org/jira/browse/YARN-4318 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Tsuyoshi Ozawa >Assignee: Kuhu Shukla > > {quote} > Tests run: 4, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 14.891 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization > testUnauthorizedAccess[0](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization) > Time elapsed: 3.208 sec <<< ERROR! > java.net.UnknownHostException: Invalid host name: local host is: (unknown); > destination host is: "b5a5dd9ec835":8030; java.net.UnknownHostException; For > more details see: http://wiki.apache.org/hadoop/UnknownHost > at org.apache.hadoop.ipc.Client$Connection.(Client.java:403) > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1512) > at org.apache.hadoop.ipc.Client.call(Client.java:1439) > at org.apache.hadoop.ipc.Client.call(Client.java:1400) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy15.registerApplicationMaster(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106) > at > org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:273) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
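A small standalone illustration (not Hadoop code) of the condition being hit in the quoted snippet: a hostname that DNS cannot resolve, such as a bare Docker container ID, leaves the socket address unresolved, which is exactly what makes the IPC client wrap and throw UnknownHostException.
{code}
import java.net.InetSocketAddress;

/**
 * Demonstrates the isUnresolved() check: "localhost" resolves, while a
 * Docker-style 12-hex-digit hostname normally does not.
 */
public class UnresolvedHostCheck {
  public static void main(String[] args) {
    for (String host : new String[] {"localhost", "52b8ea35fca2"}) {
      // The constructor attempts DNS resolution; failure leaves the address unresolved.
      InetSocketAddress addr = new InetSocketAddress(host, 8030);
      System.out.println(host + " unresolved? " + addr.isUnresolved());
    }
  }
}
{code}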
[jira] [Commented] (YARN-3862) Support for fetching specific configs and metrics based on prefixes
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027378#comment-15027378 ] Varun Saxena commented on YARN-3862: I am not sure why the build failed. Can it be submitted again ? > Support for fetching specific configs and metrics based on prefixes > --- > > Key: YARN-3862 > URL: https://issues.apache.org/jira/browse/YARN-3862 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-3862-YARN-2928.wip.01.patch, > YARN-3862-YARN-2928.wip.02.patch, YARN-3862-feature-YARN-2928.04.patch, > YARN-3862-feature-YARN-2928.wip.03.patch > > > Currently, we will retrieve all the contents of the field if that field is > specified in the query API. In case of configs and metrics, this can become a > lot of data even though the user doesn't need it. So we need to provide a way > to query only a set of configs or metrics. > As a comma spearated list of configs/metrics to be returned will be quite > cumbersome to specify, we have to support either of the following options : > # Prefix match > # Regex > # Group the configs/metrics and query that group. > We also need a facility to specify a metric time window to return metrics in > a that window. This may be useful in plotting graphs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4318) Test failure: TestAMAuthorization
[ https://issues.apache.org/jira/browse/YARN-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027544#comment-15027544 ] Steve Loughran commented on YARN-4318: -- where's this happening? If jenkins, put jenkins in the env (and version too, please, + component==test). > Test failure: TestAMAuthorization > - > > Key: YARN-4318 > URL: https://issues.apache.org/jira/browse/YARN-4318 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Tsuyoshi Ozawa >Assignee: Kuhu Shukla > > {quote} > Tests run: 4, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 14.891 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization > testUnauthorizedAccess[0](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization) > Time elapsed: 3.208 sec <<< ERROR! > java.net.UnknownHostException: Invalid host name: local host is: (unknown); > destination host is: "b5a5dd9ec835":8030; java.net.UnknownHostException; For > more details see: http://wiki.apache.org/hadoop/UnknownHost > at org.apache.hadoop.ipc.Client$Connection.(Client.java:403) > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1512) > at org.apache.hadoop.ipc.Client.call(Client.java:1439) > at org.apache.hadoop.ipc.Client.call(Client.java:1400) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy15.registerApplicationMaster(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106) > at > org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:273) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently
[ https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027668#comment-15027668 ] Hudson commented on YARN-4380: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2580 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2580/]) YARN-4380. (ozawa: rev 0656d2dc83af6a48a8d8d0e37cdf1f813124f366) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java * hadoop-yarn-project/CHANGES.txt > TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails > intermittently > > > Key: YARN-4380 > URL: https://issues.apache.org/jira/browse/YARN-4380 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0, 2.7.1 >Reporter: Tsuyoshi Ozawa >Assignee: Varun Saxena > Fix For: 2.7.3 > > Attachments: YARN-4380.01.patch, > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.2.txt, > > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService-output.txt > > > {quote} > Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService > testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) > Time elapsed: 0.109 sec <<< FAILURE! > org.mockito.exceptions.verification.junit.ArgumentsAreDifferent: > Argument(s) are different! Wanted: > deletionService.delete( > "user0", > null, > > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > Actual invocation has different arguments: > deletionService.delete( > "user0", > > /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42 > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4318) Test failure: TestAMAuthorization
[ https://issues.apache.org/jira/browse/YARN-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027565#comment-15027565 ] Kuhu Shukla commented on YARN-4318: --- Thank you Steve. I have updated the fields. Yes, it is seen on jenkins and not locally. > Test failure: TestAMAuthorization > - > > Key: YARN-4318 > URL: https://issues.apache.org/jira/browse/YARN-4318 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 3.0.0 > Environment: jenkins >Reporter: Tsuyoshi Ozawa >Assignee: Kuhu Shukla > > {quote} > Tests run: 4, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 14.891 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization > testUnauthorizedAccess[0](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization) > Time elapsed: 3.208 sec <<< ERROR! > java.net.UnknownHostException: Invalid host name: local host is: (unknown); > destination host is: "b5a5dd9ec835":8030; java.net.UnknownHostException; For > more details see: http://wiki.apache.org/hadoop/UnknownHost > at org.apache.hadoop.ipc.Client$Connection.(Client.java:403) > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1512) > at org.apache.hadoop.ipc.Client.call(Client.java:1439) > at org.apache.hadoop.ipc.Client.call(Client.java:1400) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy15.registerApplicationMaster(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106) > at > org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:273) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3862) Support for fetching specific configs and metrics based on prefixes
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027596#comment-15027596 ] Sangjin Lee commented on YARN-3862: --- Failed for the same reason: {noformat} Step 16 : ADD hadoop_env_checks.sh /root/hadoop_env_checks.sh hadoop_env_checks.sh: no such file or directory Total Elapsed time: 10m 4s ERROR: Docker failed to build image. {noformat} [~aw], do you know what's going on? Thanks. > Support for fetching specific configs and metrics based on prefixes > --- > > Key: YARN-3862 > URL: https://issues.apache.org/jira/browse/YARN-3862 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-3862-YARN-2928.wip.01.patch, > YARN-3862-YARN-2928.wip.02.patch, YARN-3862-feature-YARN-2928.04.patch, > YARN-3862-feature-YARN-2928.wip.03.patch > > > Currently, we will retrieve all the contents of the field if that field is > specified in the query API. In case of configs and metrics, this can become a > lot of data even though the user doesn't need it. So we need to provide a way > to query only a set of configs or metrics. > As a comma spearated list of configs/metrics to be returned will be quite > cumbersome to specify, we have to support either of the following options : > # Prefix match > # Regex > # Group the configs/metrics and query that group. > We also need a facility to specify a metric time window to return metrics in > a that window. This may be useful in plotting graphs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently
[ https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027640#comment-15027640 ] Hudson commented on YARN-4380: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #723 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/723/]) YARN-4380. (ozawa: rev 0656d2dc83af6a48a8d8d0e37cdf1f813124f366) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java > TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails > intermittently > > > Key: YARN-4380 > URL: https://issues.apache.org/jira/browse/YARN-4380 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0, 2.7.1 >Reporter: Tsuyoshi Ozawa >Assignee: Varun Saxena > Fix For: 2.7.3 > > Attachments: YARN-4380.01.patch, > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.2.txt, > > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService-output.txt > > > {quote} > Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService > testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) > Time elapsed: 0.109 sec <<< FAILURE! > org.mockito.exceptions.verification.junit.ArgumentsAreDifferent: > Argument(s) are different! Wanted: > deletionService.delete( > "user0", > null, > > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > Actual invocation has different arguments: > deletionService.delete( > "user0", > > /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42 > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3862) Support for fetching specific configs and metrics based on prefixes
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027590#comment-15027590 ] Hadoop QA commented on YARN-3862: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | {color:red} docker {color} | {color:red} 10m 4s {color} | {color:red} Docker failed to build yetus/hadoop:123b3db. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12774383/YARN-3862-feature-YARN-2928.04.patch | | JIRA Issue | YARN-3862 | | Powered by | Apache Yetus http://yetus.apache.org | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9801/console | This message was automatically generated. > Support for fetching specific configs and metrics based on prefixes > --- > > Key: YARN-3862 > URL: https://issues.apache.org/jira/browse/YARN-3862 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-3862-YARN-2928.wip.01.patch, > YARN-3862-YARN-2928.wip.02.patch, YARN-3862-feature-YARN-2928.04.patch, > YARN-3862-feature-YARN-2928.wip.03.patch > > > Currently, we will retrieve all the contents of the field if that field is > specified in the query API. In case of configs and metrics, this can become a > lot of data even though the user doesn't need it. So we need to provide a way > to query only a set of configs or metrics. > As a comma spearated list of configs/metrics to be returned will be quite > cumbersome to specify, we have to support either of the following options : > # Prefix match > # Regex > # Group the configs/metrics and query that group. > We also need a facility to specify a metric time window to return metrics in > a that window. This may be useful in plotting graphs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently
[ https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027623#comment-15027623 ] Hudson commented on YARN-4380: -- ABORTED: Integrated in Hadoop-Hdfs-trunk-Java8 #643 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/643/]) YARN-4380. (ozawa: rev 0656d2dc83af6a48a8d8d0e37cdf1f813124f366) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java > TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails > intermittently > > > Key: YARN-4380 > URL: https://issues.apache.org/jira/browse/YARN-4380 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0, 2.7.1 >Reporter: Tsuyoshi Ozawa >Assignee: Varun Saxena > Fix For: 2.7.3 > > Attachments: YARN-4380.01.patch, > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.2.txt, > > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService-output.txt > > > {quote} > Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService > testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) > Time elapsed: 0.109 sec <<< FAILURE! > org.mockito.exceptions.verification.junit.ArgumentsAreDifferent: > Argument(s) are different! Wanted: > deletionService.delete( > "user0", > null, > > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > Actual invocation has different arguments: > deletionService.delete( > "user0", > > /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42 > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4184) Remove update reservation state api from state store as it's not used by ReservationSystem
[ https://issues.apache.org/jira/browse/YARN-4184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated YARN-4184: - Parent Issue: YARN-2573 (was: YARN-2572) > Remove update reservation state api from state store as it's not used by > ReservationSystem > - > > Key: YARN-4184 > URL: https://issues.apache.org/jira/browse/YARN-4184 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Anubhav Dhoot >Assignee: Sean Po > Fix For: 2.8.0 > > Attachments: YARN-4184.v1.patch > > > ReservationSystem uses remove/add for updates and thus the update api in the state > store is not needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4318) Test failure: TestAMAuthorization
[ https://issues.apache.org/jira/browse/YARN-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuhu Shukla updated YARN-4318: -- Affects Version/s: 3.0.0 Environment: jenkins Component/s: test > Test failure: TestAMAuthorization > - > > Key: YARN-4318 > URL: https://issues.apache.org/jira/browse/YARN-4318 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 3.0.0 > Environment: jenkins >Reporter: Tsuyoshi Ozawa >Assignee: Kuhu Shukla > > {quote} > Tests run: 4, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 14.891 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization > testUnauthorizedAccess[0](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization) > Time elapsed: 3.208 sec <<< ERROR! > java.net.UnknownHostException: Invalid host name: local host is: (unknown); > destination host is: "b5a5dd9ec835":8030; java.net.UnknownHostException; For > more details see: http://wiki.apache.org/hadoop/UnknownHost > at org.apache.hadoop.ipc.Client$Connection.(Client.java:403) > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1512) > at org.apache.hadoop.ipc.Client.call(Client.java:1439) > at org.apache.hadoop.ipc.Client.call(Client.java:1400) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy15.registerApplicationMaster(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106) > at > org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:273) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4380) TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails intermittently
[ https://issues.apache.org/jira/browse/YARN-4380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027339#comment-15027339 ] Hudson commented on YARN-4380: -- FAILURE: Integrated in Hadoop-Yarn-trunk #1454 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1454/]) YARN-4380. (ozawa: rev 0656d2dc83af6a48a8d8d0e37cdf1f813124f366) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/TestResourceLocalizationService.java > TestResourceLocalizationService.testDownloadingResourcesOnContainerKill fails > intermittently > > > Key: YARN-4380 > URL: https://issues.apache.org/jira/browse/YARN-4380 > Project: Hadoop YARN > Issue Type: Bug > Components: test >Affects Versions: 2.8.0, 2.7.1 >Reporter: Tsuyoshi Ozawa >Assignee: Varun Saxena > Fix For: 2.7.3 > > Attachments: YARN-4380.01.patch, > org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.2.txt, > > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService-output.txt > > > {quote} > Tests run: 14, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 5.361 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService > testDownloadingResourcesOnContainerKill(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService) > Time elapsed: 0.109 sec <<< FAILURE! > org.mockito.exceptions.verification.junit.ArgumentsAreDifferent: > Argument(s) are different! Wanted: > deletionService.delete( > "user0", > null, > > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > Actual invocation has different arguments: > deletionService.delete( > "user0", > > /home/ubuntu/hadoop-dev/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService/0/usercache/user0/appcache/application_314159265358979_0003/container_314159265358979_0003_01_42 > ); > -> at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1296) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:422) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.TestResourceLocalizationService.testDownloadingResourcesOnContainerKill(TestResourceLocalizationService.java:1322) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3224) Notify AM with containers (on decommissioning node) could be preempted after timeout.
[ https://issues.apache.org/jira/browse/YARN-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026873#comment-15026873 ] Hadoop QA commented on YARN-3224: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 55s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 41s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 23s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 33s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 30s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 30s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 33s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_85. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 33s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_85. {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 15s {color} | {color:red} Patch generated 4 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 185, now 188). {color} | | {color:red}-1{color} | {color:red} mvnsite {color} | {color:red} 0m 34s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 34s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 31s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 32s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_85. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 20m 14s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12761436/0002-YARN-3224.patch | | JIRA Issue | YARN-3224 | | Optional Tests | asflicense compile javac javadoc
[jira] [Commented] (YARN-4386) refreshNodesGracefully() looks at active RMNode list for recommissioning decommissioned nodes
[ https://issues.apache.org/jira/browse/YARN-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026890#comment-15026890 ] Kuhu Shukla commented on YARN-4386: --- Thank you [~djp] for your comments. TestClientRMTokens is failing regardless of the patch, tracked through YARN-4306. TestAMAuthorization fails the same way and I believe is tracked through YARN-4318. No tests were attached since the final outcome of this patch remains unchanged and a decommissioned node stays in that state regardless. [~sunilg], [~djp], request for review. Thanks a lot. > refreshNodesGracefully() looks at active RMNode list for recommissioning > decommissioned nodes > - > > Key: YARN-4386 > URL: https://issues.apache.org/jira/browse/YARN-4386 > Project: Hadoop YARN > Issue Type: Bug > Components: graceful >Affects Versions: 3.0.0 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla >Priority: Minor > Attachments: YARN-4386-v1.patch > > > In refreshNodesGracefully(), during recommissioning, the entryset from > getRMNodes() which has only active nodes (RUNNING, DECOMMISSIONING etc.) is > used for checking 'decommissioned' nodes which are present in > getInactiveRMNodes() map alone. > {code} > for (Entry<NodeId, RMNode> entry : rmContext.getRMNodes().entrySet()) { > ... > // Recommissioning the nodes > if (entry.getValue().getState() == NodeState.DECOMMISSIONING > || entry.getValue().getState() == NodeState.DECOMMISSIONED) { > this.rmContext.getDispatcher().getEventHandler() > .handle(new RMNodeEvent(nodeId, RMNodeEventType.RECOMMISSION)); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4318) Test failure: TestAMAuthorization
[ https://issues.apache.org/jira/browse/YARN-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026895#comment-15026895 ] Kuhu Shukla commented on YARN-4318: --- [~ozawa], are you looking at this test failure? I can work on it if this is unassigned. Thanks a lot. > Test failure: TestAMAuthorization > - > > Key: YARN-4318 > URL: https://issues.apache.org/jira/browse/YARN-4318 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Tsuyoshi Ozawa > > {quote} > Tests run: 4, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 14.891 sec > <<< FAILURE! - in > org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization > testUnauthorizedAccess[0](org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization) > Time elapsed: 3.208 sec <<< ERROR! > java.net.UnknownHostException: Invalid host name: local host is: (unknown); > destination host is: "b5a5dd9ec835":8030; java.net.UnknownHostException; For > more details see: http://wiki.apache.org/hadoop/UnknownHost > at org.apache.hadoop.ipc.Client$Connection.(Client.java:403) > at org.apache.hadoop.ipc.Client.getConnection(Client.java:1512) > at org.apache.hadoop.ipc.Client.call(Client.java:1439) > at org.apache.hadoop.ipc.Client.call(Client.java:1400) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy15.registerApplicationMaster(Unknown Source) > at > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.registerApplicationMaster(ApplicationMasterProtocolPBClientImpl.java:106) > at > org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization.testUnauthorizedAccess(TestAMAuthorization.java:273) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4386) refreshNodesGracefully() looks at active RMNode list for recommissioning decommissioned nodes
[ https://issues.apache.org/jira/browse/YARN-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026913#comment-15026913 ] Junping Du commented on YARN-4386: -- Thanks [~kshukla] for the patch. I agree these test failures are not related. However, can we add a test to verify that no InvalidState exception gets thrown after the patch when recommissioning a decommissioned node via refreshNodesGracefully()? That test should fail without your change applied. > refreshNodesGracefully() looks at active RMNode list for recommissioning > decommissioned nodes > - > > Key: YARN-4386 > URL: https://issues.apache.org/jira/browse/YARN-4386 > Project: Hadoop YARN > Issue Type: Bug > Components: graceful >Affects Versions: 3.0.0 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla >Priority: Minor > Attachments: YARN-4386-v1.patch > > > In refreshNodesGracefully(), during recommissioning, the entryset from > getRMNodes() which has only active nodes (RUNNING, DECOMMISSIONING etc.) is > used for checking 'decommissioned' nodes which are present in > getInactiveRMNodes() map alone. > {code} > for (Entry<NodeId, RMNode> entry : rmContext.getRMNodes().entrySet()) { > ... > // Recommissioning the nodes > if (entry.getValue().getState() == NodeState.DECOMMISSIONING > || entry.getValue().getState() == NodeState.DECOMMISSIONED) { > this.rmContext.getDispatcher().getEventHandler() > .handle(new RMNodeEvent(nodeId, RMNodeEventType.RECOMMISSION)); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3862) Support for fetching specific configs and metrics based on prefixes
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026481#comment-15026481 ] Varun Saxena commented on YARN-3862: [~sjlee0] Looking at the code again, I am actually using getColumnPrefixBytes in FlowRunEntityReader and passing it to TimelineFilterUtils. However, in ApplicationEntityReader and GenericEntityReader I am assuming the column prefix is null, so I haven't used it there, which I agree is wrong. However, I have exposed a method as below. Here colPrefix is meant to take the column prefix bytes as an argument. {{public static FilterList createHBaseFilterList(byte[] colPrefix, TimelineFilterList filterList)}} I think this should be enough. However, the user can then pass any sequence of bytes as the prefix. That said, the current code won't break due to a change in the way column prefixes are encoded, as long as the caller of this code is correct. Another alternative would be to pass the ColumnPrefix object to TimelineFilterUtils and call getColumnPrefixBytes from there. Thoughts? > Support for fetching specific configs and metrics based on prefixes > --- > > Key: YARN-3862 > URL: https://issues.apache.org/jira/browse/YARN-3862 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-3862-YARN-2928.wip.01.patch, > YARN-3862-YARN-2928.wip.02.patch, YARN-3862-feature-YARN-2928.wip.03.patch > > > Currently, we will retrieve all the contents of the field if that field is > specified in the query API. In case of configs and metrics, this can become a > lot of data even though the user doesn't need it. So we need to provide a way > to query only a set of configs or metrics. > As a comma separated list of configs/metrics to be returned will be quite > cumbersome to specify, we have to support one of the following options: > # Prefix match > # Regex > # Group the configs/metrics and query that group. > We also need a facility to specify a metric time window to return metrics in > that window. This may be useful in plotting graphs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
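As an aside on the helper discussed in the comment above, here is a minimal, self-contained sketch of what prefix-based column filtering can look like using stock HBase filters. It is illustrative only and is not the patch's TimelineFilterUtils code: the class and method names below are invented for the example, and the caller is assumed to supply whatever encoded column-prefix bytes the schema (e.g. getColumnPrefixBytes) produces.

{code}
import java.util.List;

import org.apache.hadoop.hbase.filter.ColumnPrefixFilter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.FilterList.Operator;
import org.apache.hadoop.hbase.util.Bytes;

/** Illustrative sketch only; not the YARN-3862 patch code. */
public final class PrefixFilterSketch {

  /**
   * Builds a filter that matches any column whose qualifier starts with one
   * of the requested config/metric prefixes, scoped under the encoded
   * column-prefix bytes supplied by the caller.
   */
  public static FilterList buildPrefixFilterList(byte[] encodedColumnPrefix,
      List<String> requestedPrefixes) {
    // MUST_PASS_ONE == logical OR across the individual prefix filters.
    FilterList list = new FilterList(Operator.MUST_PASS_ONE);
    for (String prefix : requestedPrefixes) {
      // Prepend the encoded column-prefix bytes so only the intended set of
      // columns (configs or metrics) is matched.
      byte[] qualifierPrefix = Bytes.add(encodedColumnPrefix, Bytes.toBytes(prefix));
      list.addFilter(new ColumnPrefixFilter(qualifierPrefix));
    }
    return list;
  }
}
{code}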
[jira] [Commented] (YARN-2882) Introducing container types
[ https://issues.apache.org/jira/browse/YARN-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026488#comment-15026488 ] Hadoop QA commented on YARN-2882: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 14m 6s {color} | {color:green} yarn-2877 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 45s {color} | {color:green} yarn-2877 passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 8s {color} | {color:green} yarn-2877 passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 30s {color} | {color:green} yarn-2877 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 26s {color} | {color:green} yarn-2877 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 10s {color} | {color:green} yarn-2877 passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 14s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in yarn-2877 has 3 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 58s {color} | {color:green} yarn-2877 passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 22s {color} | {color:green} yarn-2877 passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 50s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 50s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 11m 38s {color} | {color:red} hadoop-yarn-project_hadoop-yarn-jdk1.8.0_66 with JDK v1.8.0_66 generated 1 new issues (was 14, now 14). {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 50s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 6s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 2m 6s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 13m 45s {color} | {color:red} hadoop-yarn-project_hadoop-yarn-jdk1.7.0_85 with JDK v1.7.0_85 generated 1 new issues (was 15, now 15). 
{color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 6s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 26s {color} | {color:red} Patch generated 17 new checkstyle issues in hadoop-yarn-project/hadoop-yarn (total was 178, now 193). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 4s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 17 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 2s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 38s {color} | {color:red} hadoop-yarn-api in the patch failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 27s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 49m 21s {color} | {color:red} hadoop-yarn-client in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit
[jira] [Updated] (YARN-3226) UI changes for decommissioning node
[ https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3226: -- Attachment: ClusterMetricsOnNodes_UI.png > UI changes for decommissioning node > --- > > Key: YARN-3226 > URL: https://issues.apache.org/jira/browse/YARN-3226 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Junping Du >Assignee: Sunil G > Attachments: 0001-YARN-3226.patch, ClusterMetricsOnNodes_UI.png > > > Some initial thought is: > decommissioning nodes should still show up in the active nodes list since > they are still running containers. > A separate decommissioning tab to filter for those nodes would be nice, > although I suppose users can also just use the jquery table to sort/search for > nodes in that state from the active nodes list if it's too crowded to add yet > another node > state tab (or maybe get rid of some effectively dead tabs like the reboot > state tab). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3226) UI changes for decommissioning node
[ https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3226: -- Attachment: (was: Decommissioning_MetricsPge.png) > UI changes for decommissioning node > --- > > Key: YARN-3226 > URL: https://issues.apache.org/jira/browse/YARN-3226 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Junping Du >Assignee: Sunil G > Attachments: 0001-YARN-3226.patch > > > Some initial thought is: > decommissioning nodes should still show up in the active nodes list since > they are still running containers. > A separate decommissioning tab to filter for those nodes would be nice, > although I suppose users can also just use the jquery table to sort/search for > nodes in that state from the active nodes list if it's too crowded to add yet > another node > state tab (or maybe get rid of some effectively dead tabs like the reboot > state tab). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4393) TestResourceLocalizationService#testFailedDirsResourceRelease fails intermittently
[ https://issues.apache.org/jira/browse/YARN-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026391#comment-15026391 ] Hadoop QA commented on YARN-4393: - | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 48s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 11s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 59s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 23s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 27s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 2s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 29s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 2s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.7.0_85. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 34m 7s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12774282/YARN-4393.01.patch | | JIRA Issue | YARN-4393 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux e1d35d909a40 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / b4c6b51 | | findbugs | v3.0.0 | | JDK v1.7.0_85 Test Results |
[jira] [Updated] (YARN-3226) UI changes for decommissioning node
[ https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3226: -- Attachment: 0002-YARN-3226.patch Thank you [~djp] for the comments. Updating a new patch and also attached new UI with "Cluster Metrics on Nodes". Kindly help to check the same. > UI changes for decommissioning node > --- > > Key: YARN-3226 > URL: https://issues.apache.org/jira/browse/YARN-3226 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Junping Du >Assignee: Sunil G > Attachments: 0001-YARN-3226.patch, 0002-YARN-3226.patch, > ClusterMetricsOnNodes_UI.png > > > Some initial thought is: > decommissioning nodes should still show up in the active nodes list since > they are still running containers. > A separate decommissioning tab to filter for those nodes would be nice, > although I suppose users can also just use the jquery table to sort/search for > nodes in that state from the active nodes list if it's too crowded to add yet > another node > state tab (or maybe get rid of some effectively dead tabs like the reboot > state tab). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3862) Support for fetching specific configs and metrics based on prefixes
[ https://issues.apache.org/jira/browse/YARN-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026505#comment-15026505 ] Varun Saxena commented on YARN-3862: I think passing the ColumnPrefix would be better as we can then encode spaces as well. > Support for fetching specific configs and metrics based on prefixes > --- > > Key: YARN-3862 > URL: https://issues.apache.org/jira/browse/YARN-3862 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-3862-YARN-2928.wip.01.patch, > YARN-3862-YARN-2928.wip.02.patch, YARN-3862-feature-YARN-2928.wip.03.patch > > > Currently, we will retrieve all the contents of the field if that field is > specified in the query API. In case of configs and metrics, this can become a > lot of data even though the user doesn't need it. So we need to provide a way > to query only a set of configs or metrics. > As a comma separated list of configs/metrics to be returned will be quite > cumbersome to specify, we have to support one of the following options: > # Prefix match > # Regex > # Group the configs/metrics and query that group. > We also need a facility to specify a metric time window to return metrics in > that window. This may be useful in plotting graphs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics
[ https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026402#comment-15026402 ] Hadoop QA commented on YARN-4304: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 11m 31s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 20s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 47s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 50s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 55s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 19s {color} | {color:red} Patch generated 5 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 224, now 225). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 59s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 38s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 4s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 73m 1s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_85. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 37s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 176m 28s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority | | | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | | hadoop.yarn.server.resourcemanager.scheduler.fair.TestSchedulingPolicy | | JDK v1.7.0_85 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | \\ \\ || Subsystem ||
[jira] [Commented] (YARN-3226) UI changes for decommissioning node
[ https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026681#comment-15026681 ] Hadoop QA commented on YARN-3226: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 52s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 31s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 38s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 12s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 32s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 12s {color} | {color:red} Patch generated 24 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 104, now 127). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 21s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 58m 12s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 43s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_85. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 25s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 136m 7s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | | hadoop.yarn.server.resourcemanager.webapp.TestNodesPage | | JDK v1.7.0_85 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices | | |
[jira] [Updated] (YARN-4389) "yarn.am.blacklisting.enabled" and "yarn.am.blacklisting.disable-failure-threshold" should be app specific rather than a setting for whole YARN cluster
[ https://issues.apache.org/jira/browse/YARN-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-4389: -- Attachment: 0001-YARN-4389.patch Hi [~djp] Attaching an initial version of the patch. Kindly help to check the same. Once this ticket is reviewed, I will raise a MapReduce ticket to pass this information from the RM side. > "yarn.am.blacklisting.enabled" and > "yarn.am.blacklisting.disable-failure-threshold" should be app specific > rather than a setting for whole YARN cluster > --- > > Key: YARN-4389 > URL: https://issues.apache.org/jira/browse/YARN-4389 > Project: Hadoop YARN > Issue Type: Bug > Components: applications >Reporter: Junping Du >Assignee: Sunil G >Priority: Critical > Attachments: 0001-YARN-4389.patch > > > "yarn.am.blacklisting.enabled" and > "yarn.am.blacklisting.disable-failure-threshold" should be application > specific rather than a setting at the cluster level, or we shouldn't maintain > amBlacklistingEnabled and blacklistDisableThreshold at the per-RMApp level. We > should allow each AM to override this config, i.e. via submissionContext. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4384) updateNodeResource CLI should not accept negative values for resource
[ https://issues.apache.org/jira/browse/YARN-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026736#comment-15026736 ] Junping Du commented on YARN-4384: -- Thanks [~leftnoteasy] for review and commit! > updateNodeResource CLI should not accept negative values for resource > - > > Key: YARN-4384 > URL: https://issues.apache.org/jira/browse/YARN-4384 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.8.0 >Reporter: Sushmitha Sreenivasan >Assignee: Junping Du > Fix For: 2.8.0 > > Attachments: YARN-4384.patch > > > updateNodeResource CLI should not accept negative values for MemSize and > vCores. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
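To make the expected behavior above concrete, here is a minimal sketch of the kind of argument guard the issue asks for. It is illustrative only, not the committed YARN-4384 change; the real RMAdminCLI handling may print usage and return an error code instead, and the class and method names below are invented.

{code}
/** Illustrative guard only; not the committed YARN-4384 code. */
final class NodeResourceArgsSketch {

  /** Rejects negative memory or vcore values before updateNodeResource is issued. */
  static void validate(int memSizeMB, int vCores) {
    if (memSizeMB < 0 || vCores < 0) {
      throw new IllegalArgumentException(
          "updateNodeResource does not accept negative values: memory-mb="
              + memSizeMB + ", vcores=" + vCores);
    }
  }
}
{code}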
[jira] [Updated] (YARN-4371) "yarn application -kill" should take multiple application ids
[ https://issues.apache.org/jira/browse/YARN-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-4371: -- Attachment: 0002-YARN-4371.patch Hi [~ozawa] Attaching an updated patch as per the latest comment. As we kill the applications one by one, we need to wait until each one is killed. I think this is fine. Kindly help to check the same. > "yarn application -kill" should take multiple application ids > - > > Key: YARN-4371 > URL: https://issues.apache.org/jira/browse/YARN-4371 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Tsuyoshi Ozawa >Assignee: Sunil G > Attachments: 0001-YARN-4371.patch, 0002-YARN-4371.patch > > > Currently we cannot pass multiple applications to the "yarn application -kill" > command. The command should take multiple application ids at the same time. > Each entry should be separated with whitespace, like: > {code} > yarn application -kill application_1234_0001 application_1234_0007 > application_1234_0012 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
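For readers who want to see what the "kill one by one and wait" behavior could look like from the client side, here is a rough sketch using the public YarnClient API. It is not the attached patch: the polling interval, the set of terminal states, and the use of ConverterUtils for parsing ids are assumptions made for the example.

{code}
import java.io.IOException;
import java.util.EnumSet;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.YarnException;
import org.apache.hadoop.yarn.util.ConverterUtils;

/** Illustrative sketch only; not the YARN-4371 patch. */
public class KillApplicationsSketch {

  private static final EnumSet<YarnApplicationState> TERMINAL = EnumSet.of(
      YarnApplicationState.KILLED, YarnApplicationState.FINISHED,
      YarnApplicationState.FAILED);

  public static void main(String[] args)
      throws IOException, YarnException, InterruptedException {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new Configuration());
    client.start();
    try {
      // Kill each application id in turn and wait until it reaches a terminal
      // state before moving on to the next one.
      for (String idStr : args) {
        ApplicationId appId = ConverterUtils.toApplicationId(idStr);
        client.killApplication(appId);
        ApplicationReport report = client.getApplicationReport(appId);
        while (!TERMINAL.contains(report.getYarnApplicationState())) {
          Thread.sleep(1000);
          report = client.getApplicationReport(appId);
        }
      }
    } finally {
      client.stop();
    }
  }
}
{code}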
[jira] [Commented] (YARN-4386) refreshNodesGracefully() looks at active RMNode list for recommissioning decommissioned nodes
[ https://issues.apache.org/jira/browse/YARN-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026805#comment-15026805 ] Junping Du commented on YARN-4386: -- Thanks [~kshukla] for reporting this issue and [~sunilg] for the review! I think the Recommission event shouldn't be applied to decommissioned nodes as it won't have any effect, and we'd better keep consistent with the behavior from before graceful decommission was introduced. Thus, I would prefer to change "if (entry.getValue().getState() == NodeState.DECOMMISSIONING || entry.getValue().getState() == NodeState.DECOMMISSIONED)" to "if (entry.getValue().getState() == NodeState.DECOMMISSIONING)" to get rid of the InvalidState exception in the state machine. > refreshNodesGracefully() looks at active RMNode list for recommissioning > decommissioned nodes > - > > Key: YARN-4386 > URL: https://issues.apache.org/jira/browse/YARN-4386 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla >Priority: Minor > Attachments: YARN-4386-v1.patch > > > In refreshNodesGracefully(), during recommissioning, the entryset from > getRMNodes() which has only active nodes (RUNNING, DECOMMISSIONING etc.) is > used for checking 'decommissioned' nodes which are present in > getInactiveRMNodes() map alone. > {code} > for (Entry<NodeId, RMNode> entry : rmContext.getRMNodes().entrySet()) { > ... > // Recommissioning the nodes > if (entry.getValue().getState() == NodeState.DECOMMISSIONING > || entry.getValue().getState() == NodeState.DECOMMISSIONED) { > this.rmContext.getDispatcher().getEventHandler() > .handle(new RMNodeEvent(nodeId, RMNodeEventType.RECOMMISSION)); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
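A sketch of the change suggested in the comment above, reusing the fragment from the issue description. The surrounding refreshNodesGracefully() loop, the generic type of the entry set, and the variable names are assumed from context, so treat this as illustrative rather than the committed fix.

{code}
// Only nodes still DECOMMISSIONING are recommissioned; DECOMMISSIONED nodes
// (which live in the inactive map) are left alone, so no RECOMMISSION event
// hits a state the RMNode state machine cannot transition from.
for (Entry<NodeId, RMNode> entry : rmContext.getRMNodes().entrySet()) {
  NodeId nodeId = entry.getKey();
  // ... existing refresh handling ...
  if (entry.getValue().getState() == NodeState.DECOMMISSIONING) {
    this.rmContext.getDispatcher().getEventHandler()
        .handle(new RMNodeEvent(nodeId, RMNodeEventType.RECOMMISSION));
  }
}
{code}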
[jira] [Updated] (YARN-3226) UI changes for decommissioning node
[ https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sunil G updated YARN-3226: -- Attachment: 0003-YARN-3226.patch Attaching a new patch to address test failures and relevant checkstyle warnings. > UI changes for decommissioning node > --- > > Key: YARN-3226 > URL: https://issues.apache.org/jira/browse/YARN-3226 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Junping Du >Assignee: Sunil G > Attachments: 0001-YARN-3226.patch, 0002-YARN-3226.patch, > 0003-YARN-3226.patch, ClusterMetricsOnNodes_UI.png > > > Some initial thought is: > decommissioning nodes should still show up in the active nodes list since > they are still running containers. > A separate decommissioning tab to filter for those nodes would be nice, > although I suppose users can also just use the jquery table to sort/search for > nodes in that state from the active nodes list if it's too crowded to add yet > another node > state tab (or maybe get rid of some effectively dead tabs like the reboot > state tab). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-914) (Umbrella) Support graceful decommission of nodemanager
[ https://issues.apache.org/jira/browse/YARN-914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-914: Component/s: graceful > (Umbrella) Support graceful decommission of nodemanager > --- > > Key: YARN-914 > URL: https://issues.apache.org/jira/browse/YARN-914 > Project: Hadoop YARN > Issue Type: Improvement > Components: graceful >Affects Versions: 2.0.4-alpha >Reporter: Luke Lu >Assignee: Junping Du > Attachments: Gracefully Decommission of NodeManager (v1).pdf, > Gracefully Decommission of NodeManager (v2).pdf, > GracefullyDecommissionofNodeManagerv3.pdf > > > When NMs are decommissioned for non-fault reasons (capacity change etc.), > it's desirable to minimize the impact on running applications. > Currently, if an NM is decommissioned, all running containers on the NM need to > be rescheduled on other NMs. Furthermore, for finished map tasks, if their > map output has not been fetched by the reducers of the job, these map tasks will > need to be rerun as well. > We propose to introduce a mechanism to optionally gracefully decommission a > node manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1506: - Component/s: graceful > Replace set resource change on RMNode/SchedulerNode directly with event > notification. > - > > Key: YARN-1506 > URL: https://issues.apache.org/jira/browse/YARN-1506 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, scheduler >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-1506-v1.patch, YARN-1506-v10.patch, > YARN-1506-v11.patch, YARN-1506-v12.patch, YARN-1506-v13.patch, > YARN-1506-v14.patch, YARN-1506-v15.patch, YARN-1506-v16.patch, > YARN-1506-v17.patch, YARN-1506-v2.patch, YARN-1506-v3.patch, > YARN-1506-v4.patch, YARN-1506-v5.patch, YARN-1506-v6.patch, > YARN-1506-v7.patch, YARN-1506-v8.patch, YARN-1506-v9.patch > > > According to Vinod's comments on YARN-312 > (https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13846087), > we should replace RMNode.setResourceOption() with some resource change event. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-999) In case of long running tasks, reduce node resource should balloon out resource quickly by calling preemption API and suspending running task.
[ https://issues.apache.org/jira/browse/YARN-999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-999: Component/s: graceful > In case of long running tasks, reduce node resource should balloon out > resource quickly by calling preemption API and suspending running task. > --- > > Key: YARN-999 > URL: https://issues.apache.org/jira/browse/YARN-999 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, scheduler >Reporter: Junping Du >Assignee: Junping Du > > In the current design and implementation, when we decrease a node's resource to > less than the resource consumption of the currently running tasks, those tasks can > still run until they finish. But no new tasks get assigned to this node > (because AvailableResource < 0) until some tasks finish and > AvailableResource > 0 again. This is good for most cases, but in the case of long > running tasks, it could be too slow for the resource setting to actually take effect, so > preemption could be employed here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-998) Persistent resource change during NM/RM restart
[ https://issues.apache.org/jira/browse/YARN-998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-998: Component/s: graceful > Persistent resource change during NM/RM restart > --- > > Key: YARN-998 > URL: https://issues.apache.org/jira/browse/YARN-998 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, scheduler >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-998-sample.patch > > > When an NM is restarted, whether planned or after a failure, the previous dynamic resource > setting should be kept for consistency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-996) REST API support for node resource configuration
[ https://issues.apache.org/jira/browse/YARN-996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-996: Component/s: graceful > REST API support for node resource configuration > > > Key: YARN-996 > URL: https://issues.apache.org/jira/browse/YARN-996 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, scheduler >Reporter: Junping Du >Assignee: Tsuyoshi Ozawa > Attachments: YARN-996-2.patch, YARN-996-sample.patch > > > Besides admin protocol and CLI, REST API should also be supported for node > resource configuration -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-313) Add Admin API for supporting node resource configuration in command line
[ https://issues.apache.org/jira/browse/YARN-313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-313: Component/s: graceful > Add Admin API for supporting node resource configuration in command line > > > Key: YARN-313 > URL: https://issues.apache.org/jira/browse/YARN-313 > Project: Hadoop YARN > Issue Type: Sub-task > Components: client, graceful >Reporter: Junping Du >Assignee: Inigo Goiri >Priority: Critical > Fix For: 2.8.0 > > Attachments: YARN-313-sample.patch, YARN-313-v1.patch, > YARN-313-v10.patch, YARN-313-v11.patch, YARN-313-v2.patch, YARN-313-v3.patch, > YARN-313-v4.patch, YARN-313-v5.patch, YARN-313-v6.patch, YARN-313-v7.patch, > YARN-313-v8.patch, YARN-313-v9.patch > > > We should provide some admin interface, e.g. "yarn rmadmin -refreshResources" > to support changes of node's resource specified in a config file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-312) Add updateNodeResource in ResourceManagerAdministrationProtocol
[ https://issues.apache.org/jira/browse/YARN-312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-312: Component/s: graceful > Add updateNodeResource in ResourceManagerAdministrationProtocol > --- > > Key: YARN-312 > URL: https://issues.apache.org/jira/browse/YARN-312 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, graceful >Affects Versions: 2.2.0 >Reporter: Junping Du >Assignee: Junping Du > Fix For: 2.3.0 > > Attachments: YARN-312-v1.patch, YARN-312-v10.patch, > YARN-312-v2.patch, YARN-312-v3.patch, YARN-312-v4.1.patch, YARN-312-v4.patch, > YARN-312-v5.1.patch, YARN-312-v5.patch, YARN-312-v6.patch, > YARN-312-v7.1.patch, YARN-312-v7.1.patch, YARN-312-v7.patch, > YARN-312-v8.patch, YARN-312-v9.patch > > > Add a fundamental RPC (ResourceManagerAdministrationProtocol) to support node > resource changes. For design details, please refer to the parent JIRA: YARN-291. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1000) Dynamic resource configuration feature can be configured to enable or disable and persistent on setting or not
[ https://issues.apache.org/jira/browse/YARN-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1000: - Component/s: graceful > Dynamic resource configuration feature can be configured to enable or disable > and persistent on setting or not > -- > > Key: YARN-1000 > URL: https://issues.apache.org/jira/browse/YARN-1000 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, scheduler >Reporter: Junping Du >Assignee: Junping Du > Attachments: YARN-1000-sample.patch > > > There are some configurations for the dynamic resource configuration feature: > 1. enabled or not: if enabled, setting a node's resource at runtime through > CLI/REST/JMX can succeed; otherwise a "function not supported" exception > will be thrown. In the future, we may support enabling this feature on only a > subset of nodes which have resource flexibility (like virtual nodes). > 2. whether the dynamic resource setting is persistent or not: it depends on the user's > scenario whether a setting made at runtime should be kept after > the NM goes down and restarts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-997) JMX support for node resource configuration
[ https://issues.apache.org/jira/browse/YARN-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-997: Component/s: graceful > JMX support for node resource configuration > --- > > Key: YARN-997 > URL: https://issues.apache.org/jira/browse/YARN-997 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, scheduler >Reporter: Junping Du > > Besides the YARN CLI and REST API, we can enable a JMX interface to change a node's > resource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3224) Notify AM with containers (on decommissioning node) could be preempted after timeout.
[ https://issues.apache.org/jira/browse/YARN-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3224: - Component/s: graceful > Notify AM with containers (on decommissioning node) could be preempted after > timeout. > - > > Key: YARN-3224 > URL: https://issues.apache.org/jira/browse/YARN-3224 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful >Reporter: Junping Du >Assignee: Sunil G > Attachments: 0001-YARN-3224.patch, 0002-YARN-3224.patch > > > We should leverage the YARN preemption framework to notify the AM that some > containers will be preempted after a timeout. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1394) RM to inform AMs when a container completed due to NM going offline -planned or unplanned
[ https://issues.apache.org/jira/browse/YARN-1394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1394: - Component/s: graceful > RM to inform AMs when a container completed due to NM going offline -planned > or unplanned > - > > Key: YARN-1394 > URL: https://issues.apache.org/jira/browse/YARN-1394 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful >Reporter: Steve Loughran >Assignee: Rohith Sharma K S > > YARN-914 proposes graceful decommission of an NM, and NMs already have the > right to go offline. > If AMs could be told whether a container completed because an NM went offline vs > was decommissioned, the AM could use that in its future blacklisting and placement > policy. > This matters for long-lived services which may like to place new instances > where they were placed before, and track host failure rates. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3212) RMNode State Transition Update with DECOMMISSIONING state
[ https://issues.apache.org/jira/browse/YARN-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3212: - Component/s: graceful > RMNode State Transition Update with DECOMMISSIONING state > - > > Key: YARN-3212 > URL: https://issues.apache.org/jira/browse/YARN-3212 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, resourcemanager >Reporter: Junping Du >Assignee: Junping Du > Fix For: 2.8.0 > > Attachments: RMNodeImpl - new.png, YARN-3212-v1.patch, > YARN-3212-v2.patch, YARN-3212-v3.patch, YARN-3212-v4.1.patch, > YARN-3212-v4.patch, YARN-3212-v5.1.patch, YARN-3212-v5.patch, > YARN-3212-v6.1.patch, YARN-3212-v6.2.patch, YARN-3212-v6.patch > > > As proposed in YARN-914, a new state, “DECOMMISSIONING”, will be added; a node > can transition to it from the “running” state, triggered by a new event - > “decommissioning”. > This new state can transition to “decommissioned” on a Resource_Update > if there are no running apps on this NM or when the NM reconnects after restart, > or when it receives a DECOMMISSIONED event (after the timeout from the CLI). > In addition, it can go back to “running” if the user decides to cancel the previous > decommission by calling recommission on the same node. The reaction to other > events is similar to the RUNNING state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
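To make the transitions above easier to follow, here is a small, self-contained sketch of the described state machine; the state and event names mirror the text and are illustrative only, not the actual RMNodeImpl transition table:
{code}
/** Illustrative only; mirrors the transitions described above, not the real RMNodeImpl state machine. */
class DecommissioningStateSketch {
  enum State { RUNNING, DECOMMISSIONING, DECOMMISSIONED }
  enum Event { DECOMMISSIONING_REQUESTED, RESOURCE_UPDATE, NODE_RECONNECTED, DECOMMISSIONED_TIMEOUT, RECOMMISSION }

  static State next(State current, Event event, int runningAppsOnNode) {
    if (current == State.RUNNING && event == Event.DECOMMISSIONING_REQUESTED) {
      return State.DECOMMISSIONING;                 // triggered by the new "decommissioning" event
    }
    if (current == State.DECOMMISSIONING) {
      switch (event) {
        case RESOURCE_UPDATE:
        case NODE_RECONNECTED:
          // finish decommissioning only once nothing is left running on the NM
          return runningAppsOnNode == 0 ? State.DECOMMISSIONED : current;
        case DECOMMISSIONED_TIMEOUT:
          return State.DECOMMISSIONED;              // CLI timeout fired
        case RECOMMISSION:
          return State.RUNNING;                     // user cancels the previous decommission
        default:
          return current;                           // other events behave as in RUNNING
      }
    }
    return current;
  }
}
{code}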
[jira] [Updated] (YARN-3225) New parameter or CLI for decommissioning node gracefully in RMAdmin CLI
[ https://issues.apache.org/jira/browse/YARN-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3225: - Component/s: graceful > New parameter or CLI for decommissioning node gracefully in RMAdmin CLI > --- > > Key: YARN-3225 > URL: https://issues.apache.org/jira/browse/YARN-3225 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful >Reporter: Junping Du >Assignee: Devaraj K > Fix For: 2.8.0 > > Attachments: YARN-3225-1.patch, YARN-3225-2.patch, YARN-3225-3.patch, > YARN-3225-4.patch, YARN-3225-5.patch, YARN-3225.patch, YARN-914.patch > > > A new CLI (or an existing CLI with new parameters) should put each node on the > decommission list into decommissioning status and track a timeout to terminate > the nodes that haven't finished. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3223) Resource update during NM graceful decommission
[ https://issues.apache.org/jira/browse/YARN-3223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3223: - Component/s: graceful > Resource update during NM graceful decommission > --- > > Key: YARN-3223 > URL: https://issues.apache.org/jira/browse/YARN-3223 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, resourcemanager >Affects Versions: 2.7.1 >Reporter: Junping Du >Assignee: Brook Zhou > Attachments: YARN-3223-v0.patch, YARN-3223-v1.patch, > YARN-3223-v2.patch > > > During NM graceful decommission, we should handle resource updates properly, > including: make RMNode keep track of the old resource for a possible rollback, keep the > available resource at 0, and update the used resource as > containers finish. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
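A compact, illustrative sketch of the bookkeeping described above (remember the old capacity for rollback, pin the available resource at 0, shrink the used resource as containers finish); it assumes the Resources utility class for the arithmetic and is not the actual RMNode/scheduler patch:
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

/** Illustrative bookkeeping for a node entering graceful decommission; not the actual patch. */
class DecommissioningResourceSketch {
  private Resource originalTotal;   // kept so the node can be rolled back on recommission
  private Resource used;            // shrinks as containers finish
  private Resource available;       // pinned to zero so nothing new is scheduled here

  void startDecommissioning(Resource currentTotal, Resource currentlyUsed) {
    this.originalTotal = Resources.clone(currentTotal);   // old resource kept for possible rollback
    this.used = Resources.clone(currentlyUsed);
    this.available = Resources.createResource(0, 0);      // available resource kept at 0
  }

  void containerFinished(Resource released) {
    Resources.subtractFrom(used, released);               // used resource updated as containers finish
  }

  Resource recommission() {
    return originalTotal;                                  // roll back to the pre-decommission capacity
  }
}
{code}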
[jira] [Updated] (YARN-666) [Umbrella] Support rolling upgrades in YARN
[ https://issues.apache.org/jira/browse/YARN-666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-666: Component/s: rolling upgrade graceful > [Umbrella] Support rolling upgrades in YARN > --- > > Key: YARN-666 > URL: https://issues.apache.org/jira/browse/YARN-666 > Project: Hadoop YARN > Issue Type: Improvement > Components: graceful, rolling upgrade >Affects Versions: 2.0.4-alpha >Reporter: Siddharth Seth > Fix For: 2.6.0 > > Attachments: YARN_Rolling_Upgrades.pdf, YARN_Rolling_Upgrades_v2.pdf > > > Jira to track changes required in YARN to allow rolling upgrades, including > documentation and possible upgrade routes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1197) Support changing resources of an allocated container
[ https://issues.apache.org/jira/browse/YARN-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1197: - Component/s: graceful > Support changing resources of an allocated container > > > Key: YARN-1197 > URL: https://issues.apache.org/jira/browse/YARN-1197 > Project: Hadoop YARN > Issue Type: Task > Components: api, graceful, nodemanager, resourcemanager >Affects Versions: 2.1.0-beta >Reporter: Wangda Tan > Attachments: YARN-1197 old-design-docs-patches-for-reference.zip, > YARN-1197_Design.2015.06.24.pdf, YARN-1197_Design.2015.07.07.pdf, > YARN-1197_Design.2015.08.21.pdf, YARN-1197_Design.pdf > > > The current YARN resource management logic assumes the resource allocated to a > container is fixed during its lifetime. When users want to change the > resource > of an allocated container, the only way is to release it and allocate a new > container with the expected size. > Allowing run-time changes to the resources of an allocated container will give us > better control of resource usage on the application side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-556) [Umbrella] RM Restart phase 2 - Work preserving restart
[ https://issues.apache.org/jira/browse/YARN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-556: Component/s: rolling upgrade graceful > [Umbrella] RM Restart phase 2 - Work preserving restart > --- > > Key: YARN-556 > URL: https://issues.apache.org/jira/browse/YARN-556 > Project: Hadoop YARN > Issue Type: New Feature > Components: graceful, resourcemanager, rolling upgrade >Reporter: Bikas Saha > Attachments: Work Preserving RM Restart.pdf, > WorkPreservingRestartPrototype.001.patch, YARN-1372.prelim.patch > > > YARN-128 covered storing the state needed for the RM to recover critical > information. This umbrella jira will track changes needed to recover the > running state of the cluster so that work can be preserved across RM restarts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4386) refreshNodesGracefully() looks at active RMNode list for recommissioning decommissioned nodes
[ https://issues.apache.org/jira/browse/YARN-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-4386: - Component/s: graceful > refreshNodesGracefully() looks at active RMNode list for recommissioning > decommissioned nodes > - > > Key: YARN-4386 > URL: https://issues.apache.org/jira/browse/YARN-4386 > Project: Hadoop YARN > Issue Type: Bug > Components: graceful >Affects Versions: 3.0.0 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla >Priority: Minor > Attachments: YARN-4386-v1.patch > > > In refreshNodesGracefully(), during recommissioning, the entry set from > getRMNodes(), which has only active nodes (RUNNING, DECOMMISSIONING, etc.), is > used for checking 'decommissioned' nodes, which are present in the > getInactiveRMNodes() map alone. >
{code}
for (Entry<NodeId, RMNode> entry : rmContext.getRMNodes().entrySet()) {
  .
  // Recommissioning the nodes
  if (entry.getValue().getState() == NodeState.DECOMMISSIONING
      || entry.getValue().getState() == NodeState.DECOMMISSIONED) {
    this.rmContext.getDispatcher().getEventHandler()
        .handle(new RMNodeEvent(nodeId, RMNodeEventType.RECOMMISSION));
  }
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-291) [Umbrella] Dynamic resource configuration
[ https://issues.apache.org/jira/browse/YARN-291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-291: Component/s: graceful > [Umbrella] Dynamic resource configuration > - > > Key: YARN-291 > URL: https://issues.apache.org/jira/browse/YARN-291 > Project: Hadoop YARN > Issue Type: New Feature > Components: graceful, nodemanager, scheduler >Reporter: Junping Du >Assignee: Junping Du > Labels: features > Attachments: Elastic Resources for YARN-v0.2.pdf, > YARN-291-AddClientRMProtocolToSetNodeResource-03.patch, > YARN-291-CoreAndAdmin.patch, YARN-291-JMXInterfaceOnNM-02.patch, > YARN-291-OnlyUpdateWhenResourceChange-01-fix.patch, > YARN-291-YARNClientCommandline-04.patch, YARN-291-all-v1.patch, > YARN-291-core-HeartBeatAndScheduler-01.patch > > > The current Hadoop YARN resource management logic assumes per node resource > is static during the lifetime of the NM process. Allowing run-time > configuration on per node resource will give us finer granularity of resource > elasticity. This allows Hadoop workloads to coexist with other workloads on > the same hardware efficiently, whether or not the environment is virtualized. > More background and design details can be found in attached proposal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-311) Dynamic node resource configuration: core scheduler changes
[ https://issues.apache.org/jira/browse/YARN-311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-311: Component/s: graceful > Dynamic node resource configuration: core scheduler changes > --- > > Key: YARN-311 > URL: https://issues.apache.org/jira/browse/YARN-311 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, resourcemanager, scheduler >Reporter: Junping Du >Assignee: Junping Du > Fix For: 2.3.0 > > Attachments: YARN-311-v1.patch, YARN-311-v10.patch, > YARN-311-v11.patch, YARN-311-v12.patch, YARN-311-v12b.patch, > YARN-311-v13.patch, YARN-311-v2.patch, YARN-311-v3.patch, YARN-311-v4.patch, > YARN-311-v4.patch, YARN-311-v5.patch, YARN-311-v6.1.patch, > YARN-311-v6.2.patch, YARN-311-v6.patch, YARN-311-v7.patch, YARN-311-v8.patch, > YARN-311-v9.patch > > > As the first step, we go for the resource change on the RM side and expose admin APIs > (admin protocol, CLI, REST and JMX API) later. In this jira, we will only > include changes in the scheduler. > The flow to update a node's resource and make resource scheduling aware of it is: > 1. The resource update goes through the admin API to the RM and takes effect on RMNodeImpl. > 2. When the next NM status heartbeat comes, the RMNode's resource > change is noticed and the delta resource is added to the schedulerNode's > availableResource before actual scheduling happens. > 3. The scheduler does resource allocation according to the new availableResource in > SchedulerNode. > For more design details, please refer to the proposal and discussions in the parent > JIRA: YARN-291. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
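Step 2 above can be illustrated with a self-contained sketch: the delta between the node's new total resource and the total the scheduler currently knows is folded into the node's availableResource before scheduling. This is an illustration of the described flow using the Resources utility class, not the actual SchedulerNode code:
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

/** Illustrative sketch of step 2 above; not the actual SchedulerNode code. */
class HeartbeatResourceSyncSketch {
  private Resource total;       // capacity the scheduler currently believes the node has
  private Resource available;   // total minus what is currently allocated on the node

  HeartbeatResourceSyncSketch(Resource total, Resource available) {
    this.total = Resources.clone(total);
    this.available = Resources.clone(available);
  }

  /** Called when the NM status heartbeat reveals that the RMNode's total resource changed. */
  void onHeartbeat(Resource newTotal) {
    Resource delta = Resources.subtract(newTotal, total);  // may be negative if the node shrank
    Resources.addTo(available, delta);                     // fold the delta into availableResource
    total = Resources.clone(newTotal);                     // scheduling then uses the new value (step 3)
  }
}
{code}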
[jira] [Updated] (YARN-1949) Add admin ACL check to AdminService#updateNodeResource()
[ https://issues.apache.org/jira/browse/YARN-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1949: - Component/s: graceful > Add admin ACL check to AdminService#updateNodeResource() > > > Key: YARN-1949 > URL: https://issues.apache.org/jira/browse/YARN-1949 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, resourcemanager >Reporter: Kenji Kikushima >Assignee: Kenji Kikushima > Attachments: YARN-1949.patch > > > At present, updateNodeResource() doesn't check ACL. We should call > checkAcls() before setResourceOption(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4160) Dynamic NM Resources Configuration file should be simplified.
[ https://issues.apache.org/jira/browse/YARN-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-4160: - Component/s: graceful > Dynamic NM Resources Configuration file should be simplified. > - > > Key: YARN-4160 > URL: https://issues.apache.org/jira/browse/YARN-4160 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, scheduler >Reporter: Junping Du >Assignee: Junping Du > > In YARN-313, we provide a CLI to refresh NMs' resources dynamically. The format > of dynamic-resources.xml is something like the following: >
{noformat}
<configuration>
  <property>
    <name>yarn.resource.dynamic.node_id_1.vcores</name>
    <value>16</value>
  </property>
  <property>
    <name>yarn.resource.dynamic.node_id_1.memory</name>
    <value>1024</value>
  </property>
</configuration>
{noformat}
> This looks too redundant per the review comments on YARN-313. We should have a > better, more concise format. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4345) yarn rmadmin -updateNodeResource doesn't work
[ https://issues.apache.org/jira/browse/YARN-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-4345: - Component/s: graceful > yarn rmadmin -updateNodeResource doesn't work > - > > Key: YARN-4345 > URL: https://issues.apache.org/jira/browse/YARN-4345 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, resourcemanager >Affects Versions: 2.8.0 >Reporter: Sushmitha Sreenivasan >Assignee: Junping Du >Priority: Critical > Fix For: 2.8.0 > > Attachments: YARN-4345-v2.patch, YARN-4345-v3.patch, YARN-4345.patch > > > YARN-313 added a CLI to update node resources. It works fine for batch-mode > updates. However, for a single-node update, "yarn rmadmin -updateNodeResource" > fails to work because the resource is not set properly in the request being sent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4384) updateNodeResource CLI should not accept negative values for resource
[ https://issues.apache.org/jira/browse/YARN-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-4384: - Component/s: graceful > updateNodeResource CLI should not accept negative values for resource > - > > Key: YARN-4384 > URL: https://issues.apache.org/jira/browse/YARN-4384 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, resourcemanager >Affects Versions: 2.8.0 >Reporter: Sushmitha Sreenivasan >Assignee: Junping Du > Fix For: 2.8.0 > > Attachments: YARN-4384.patch > > > updateNodeResource CLI should not accept negative values for MemSize and > vCores. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
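For illustration, the kind of argument check the issue asks for might look like the following; the class name, method name and message are hypothetical, not the committed patch:
{code}
/** Illustrative CLI-side check; names and message are hypothetical, not the committed patch. */
class NodeResourceArgSketch {
  static void validateNodeResourceArgs(int memSizeMB, int vCores) {
    if (memSizeMB < 0 || vCores < 0) {
      throw new IllegalArgumentException(
          "updateNodeResource does not accept negative values: memory=" + memSizeMB
              + " MB, vCores=" + vCores);
    }
  }
}
{code}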
[jira] [Updated] (YARN-2489) ResouceOption's overcommitTimeout should be respected during resource update on NM
[ https://issues.apache.org/jira/browse/YARN-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2489: - Component/s: graceful > ResouceOption's overcommitTimeout should be respected during resource update > on NM > -- > > Key: YARN-2489 > URL: https://issues.apache.org/jira/browse/YARN-2489 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, scheduler >Reporter: Junping Du >Assignee: Junping Du > > The ResourceOption used to update an NM's resource has two properties: Resource and > OvercommitTimeout. The latter property is used to guarantee the resource is > withdrawn after the timeout is hit if the resource is reduced to a value and the current > resource consumption exceeds the new value. It currently uses the default value -1, > which means no timeout, and we should make this property work when updating > NM resources. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
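To make the intended overcommitTimeout semantics concrete, a self-contained sketch (not the YARN implementation): when the new capacity is below current usage, the excess is tolerated until the timeout expires and is then withdrawn, while -1 means wait indefinitely:
{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Illustrative sketch of honoring an overcommit timeout; not the YARN implementation. */
class OvercommitTimeoutSketch {
  private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

  void reduceCapacity(long newCapacityMB, long currentUsageMB, int overCommitTimeoutSec,
                      Runnable forciblyReclaimExcess) {
    if (currentUsageMB <= newCapacityMB) {
      return;                       // nothing is overcommitted, nothing to withdraw
    }
    if (overCommitTimeoutSec < 0) {
      return;                       // -1: never force; wait for containers to finish naturally
    }
    // After the timeout, withdraw whatever still exceeds the new capacity
    // (e.g. by preempting/killing containers).
    timer.schedule(forciblyReclaimExcess, overCommitTimeoutSec, TimeUnit.SECONDS);
  }
}
{code}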
[jira] [Updated] (YARN-1508) Rename ResourceOption and document resource over-commitment cases
[ https://issues.apache.org/jira/browse/YARN-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1508: - Component/s: graceful > Rename ResourceOption and document resource over-commitment cases > - > > Key: YARN-1508 > URL: https://issues.apache.org/jira/browse/YARN-1508 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful, nodemanager, scheduler >Reporter: Junping Du >Assignee: Junping Du >Priority: Minor > > Per Vinod's comment in > YARN-312(https://issues.apache.org/jira/browse/YARN-312?focusedCommentId=13846087) > and Bikas' comment in > YARN-311(https://issues.apache.org/jira/browse/YARN-311?focusedCommentId=13848615), > the name ResourceOption is not easy to understand. Also, we > need to document more on resource overcommitment timing and use cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1336) [Umbrella] Work-preserving nodemanager restart
[ https://issues.apache.org/jira/browse/YARN-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1336: - Component/s: rolling upgrade graceful > [Umbrella] Work-preserving nodemanager restart > -- > > Key: YARN-1336 > URL: https://issues.apache.org/jira/browse/YARN-1336 > Project: Hadoop YARN > Issue Type: New Feature > Components: graceful, nodemanager, rolling upgrade >Affects Versions: 2.3.0 >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: NMRestartDesignOverview.pdf, YARN-1336-rollup-v2.patch, > YARN-1336-rollup.patch > > > This serves as an umbrella ticket for tasks related to work-preserving > nodemanager restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3226) UI changes for decommissioning node
[ https://issues.apache.org/jira/browse/YARN-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-3226: - Component/s: graceful > UI changes for decommissioning node > --- > > Key: YARN-3226 > URL: https://issues.apache.org/jira/browse/YARN-3226 > Project: Hadoop YARN > Issue Type: Sub-task > Components: graceful >Reporter: Junping Du >Assignee: Sunil G > Attachments: 0001-YARN-3226.patch, 0002-YARN-3226.patch, > 0003-YARN-3226.patch, ClusterMetricsOnNodes_UI.png > > > Some initial thought is: > decommissioning nodes should still show up in the active nodes list since > they are still running containers. > A separate decommissioning tab to filter for those nodes would be nice, > although I suppose users can also just use the jquery table to sort/search for > nodes in that state from the active nodes list if it's too crowded to add yet > another node > state tab (or maybe get rid of some effectively dead tabs like the reboot > state tab). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4377) Confusing logs when killing container process
[ https://issues.apache.org/jira/browse/YARN-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jaromir Vanek updated YARN-4377: Environment: Debian 7 (was: Debian linux) Priority: Minor (was: Critical) Description: Debug logs seem to be confusing when stating {{Sending signal to pid 20748 as user _submitter_}}. Nodemanager actually sends signals as a user {{yarn}} when using {{DefaultContainerExecutor}}. Complete nodemanager log:
{quote}
2015-11-20 15:38:22,063 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Got pid 20748 for container container_1443786884805_2298_01_03
2015-11-20 15:38:22,063 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Sending signal to pid 20748 as user _submitter_ for container container_1443786884805_2298_01_03
2015-11-20 15:38:22,063 DEBUG org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Sending signal 15 to pid 20748 as user _submitter_
2015-11-20 15:38:22,069 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Sent signal SIGTERM to pid 20748 as user _submitter_ for container container_1443786884805_2298_01_03, result=failed
2015-11-20 15:38:22,319 DEBUG org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Sending signal 9 to pid 20748 as user _submitter_
{quote}
{{SIGTERM}} and following {{SIGKILL}} signals seem to be sent with the *submitter* user permissions, but this is not true when container process is running under *yarn* user by default. What is the purpose of having submitter user in logs? was: It seems my processes in containers are not killed when the whole job is killed. Containers will hang in {{KILLING}} state until forever. The root of this problem is that signals sent to the container process are sent with wrong user permissions. From the nodemanager log:
{quote}
2015-11-20 15:38:22,063 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Got pid 20748 for container container_1443786884805_2298_01_03
2015-11-20 15:38:22,063 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Sending signal to pid 20748 as user _submitter_ for container container_1443786884805_2298_01_03
2015-11-20 15:38:22,063 DEBUG org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Sending signal 15 to pid 20748 as user _submitter_
2015-11-20 15:38:22,069 DEBUG org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Sent signal SIGTERM to pid 20748 as user _submitter_ for container container_1443786884805_2298_01_03, result=failed
2015-11-20 15:38:22,319 DEBUG org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Sending signal 9 to pid 20748 as user _submitter_
{quote}
{{SIGTERM}} and following {{SIGKILL}} signals are sent with the *submitter* user permissions, but the container process is running under *yarn* user by default (when using {{DefaultContainerExecutor}} which is true in my case). The result is that signals are ignored and container will run forever. Am I doing something wrong or is it a bug?
Summary: Confusing logs when killing container process (was: Container process not killed) > Confusing logs when killing container process > - > > Key: YARN-4377 > URL: https://issues.apache.org/jira/browse/YARN-4377 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 > Environment: Debian 7 >Reporter: Jaromir Vanek >Priority: Minor > > Debug logs seem to be confusing when stating {{Sending signal to pid 20748 as > user _submitter_}}. > Nodemanager actually sends signals as a user {{yarn}} when using > {{DefaultContainerExecutor}}. > Complete nodemanager log: > {quote} > 2015-11-20 15:38:22,063 DEBUG > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: > Got pid 20748 for container container_1443786884805_2298_01_03 > 2015-11-20 15:38:22,063 DEBUG > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: > Sending signal to pid 20748 as user _submitter_ for container > container_1443786884805_2298_01_03 > 2015-11-20 15:38:22,063 DEBUG > org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Sending > signal 15 to pid 20748 as user _submitter_ > 2015-11-20 15:38:22,069 DEBUG > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: > Sent signal SIGTERM to pid 20748 as user _submitter_ for container > container_1443786884805_2298_01_03, result=failed > 2015-11-20 15:38:22,319 DEBUG >