[jira] [Commented] (YARN-4340) Add "list" API to reservation system
[ https://issues.apache.org/jira/browse/YARN-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041238#comment-15041238 ] Hadoop QA commented on YARN-4340: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 10 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 21s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 57s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 29s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 5s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 32s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 23s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 12s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 20s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 53s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 50s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 56s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 8m 56s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 19m 31s {color} | {color:red} root-jdk1.8.0_66 with JDK v1.8.0_66 generated 1 new issues (was 751, now 751). {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 56s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 25s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 9m 25s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 28m 56s {color} | {color:red} root-jdk1.7.0_85 with JDK v1.7.0_85 generated 1 new issues (was 745, now 745). {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 25s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 15s {color} | {color:red} Patch generated 32 new checkstyle issues in root (total was 352, now 382). 
{color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 3m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 1m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 37s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common introduced 4 new FindBugs issues. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 3m 20s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api-jdk1.8.0_66 with JDK v1.8.0_66 generated 9 new issues (was 100, now 100). {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 23s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 8m 49s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api-jdk1.7.0_85 with JDK v1.7.0_85 generated 5 new issues (was 0, now 5). {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 8s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 25s {col
[jira] [Commented] (YARN-4411) ResourceManager IllegalArgumentException error
[ https://issues.apache.org/jira/browse/YARN-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041235#comment-15041235 ] Hadoop QA commented on YARN-4411: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s {color} | {color:red} YARN-4411 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12775739/YARN-4411.001.patch | | JIRA Issue | YARN-4411 | | Powered by | Apache Yetus http://yetus.apache.org | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9858/console | This message was automatically generated. > ResourceManager IllegalArgumentException error > -- > > Key: YARN-4411 > URL: https://issues.apache.org/jira/browse/YARN-4411 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: yarntime >Assignee: yarntime > Attachments: YARN-4411.001.patch > > > in version 2.7.1, line 1914 may cause IllegalArgumentException in > RMAppAttemptImpl: > YarnApplicationAttemptState.valueOf(this.getState().toString()) > cause by this.getState() returns type RMAppAttemptState which may not be > converted to YarnApplicationAttemptState. > {noformat} > java.lang.IllegalArgumentException: No enum constant > org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.LAUNCHED_UNMANAGED_SAVING > at java.lang.Enum.valueOf(Enum.java:236) > at > org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.valueOf(YarnApplicationAttemptState.java:27) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.createApplicationAttemptReport(RMAppAttemptImpl.java:1870) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationAttemptReport(ClientRMService.java:355) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationAttemptReport(ApplicationClientProtocolPBServiceImpl.java:355) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
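To make the failure mode above concrete, here is a minimal, self-contained sketch. The nested enum is a stand-in for the server-internal RMAppAttemptState (not the real class); it shows why a name-based valueOf() conversion throws once the attempt sits in a state that exists only on the RM side, exactly as in the stack trace.
{code:java}
// Minimal repro sketch of the conversion problem described above. The nested
// enum is a stand-in for the server-internal RMAppAttemptState; only the
// constant relevant to the stack trace is included.
import org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState;

public class AttemptStateConversionDemo {
  enum InternalAttemptState { LAUNCHED, LAUNCHED_UNMANAGED_SAVING }

  public static void main(String[] args) {
    InternalAttemptState internal = InternalAttemptState.LAUNCHED_UNMANAGED_SAVING;
    try {
      // Same pattern as in RMAppAttemptImpl: name-based conversion between enums.
      YarnApplicationAttemptState pub =
          YarnApplicationAttemptState.valueOf(internal.toString());
      System.out.println("Converted to " + pub);
    } catch (IllegalArgumentException e) {
      // Thrown because LAUNCHED_UNMANAGED_SAVING has no counterpart in the
      // public YarnApplicationAttemptState enum.
      System.out.println("Conversion failed: " + e.getMessage());
    }
  }
}
{code}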
[jira] [Commented] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent
[ https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041201#comment-15041201 ] Hadoop QA commented on YARN-4002: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 30s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 10s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 33s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 25s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 25s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 30s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 30s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 36s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 18s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 58m 35s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 7s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_85. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 136m 8s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | JDK v1.7.0_85 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12775729/YARN-4002-rwlock.p
[jira] [Commented] (YARN-2885) Create AMRMProxy request interceptor for distributed scheduling decisions for queueable containers
[ https://issues.apache.org/jira/browse/YARN-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041176#comment-15041176 ] Konstantinos Karanasos commented on YARN-2885: -- [~leftnoteasy], I think the Distributed Scheduler AM Service makes sense. Given that we will already add the Distributed Scheduling Coordinator in the RM (which will be used for the top-k technique, and later for the corrective mechanisms in YARN-2888), what about using the same service for delegating the AMProtocol wrapper (rather than creating an additional one)? > Create AMRMProxy request interceptor for distributed scheduling decisions for > queueable containers > -- > > Key: YARN-2885 > URL: https://issues.apache.org/jira/browse/YARN-2885 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Arun Suresh > Attachments: YARN-2885-yarn-2877.001.patch > > > We propose to add a Local ResourceManager (LocalRM) to the NM in order to > support distributed scheduling decisions. > Architecturally we leverage the RMProxy, introduced in YARN-2884. > The LocalRM makes distributed decisions for queuable containers requests. > Guaranteed-start requests are still handled by the central RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2877) Extend YARN to support distributed scheduling
[ https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041169#comment-15041169 ] Konstantinos Karanasos commented on YARN-2877: -- Hi [~wangda], Thanks for pointing out HADOOP-11552. It seems it can also be used for the same purpose. I would suggest to follow the technique of frequent AM-LocalRM heartbeats and less frequent LocalRM-RM heartbeats to start with. Once HADOOP-11552 gets resolved, we can consider using it. bq. I think top-k node list technique cannot completely solve the over subscribe issue, in a production cluster, application comes in waves, it is possible that few large applications can exhaust all resources in a cluster within few seconds. Maybe another possible approach to mitigate the issue is: propagating queue-able containers from NM to RM periodically, so NM can still make decision but RM can also be aware of these queue-able containers. As long as k is sufficiently big, the phenomenon you describe should not be very pronounced. Moreover, corrective mechanisms (YARN-2888) will lead to moving tasks from highly-loaded nodes to less busy ones. Going further, what you are suggesting would also make sense. > Extend YARN to support distributed scheduling > - > > Key: YARN-2877 > URL: https://issues.apache.org/jira/browse/YARN-2877 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Reporter: Sriram Rao >Assignee: Konstantinos Karanasos > Attachments: distributed-scheduling-design-doc_v1.pdf > > > This is an umbrella JIRA that proposes to extend YARN to support distributed > scheduling. Briefly, some of the motivations for distributed scheduling are > the following: > 1. Improve cluster utilization by opportunistically executing tasks otherwise > idle resources on individual machines. > 2. Reduce allocation latency. Tasks where the scheduling time dominates > (i.e., task execution time is much less compared to the time required for > obtaining a container from the RM). > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4411) ResourceManager IllegalArgumentException error
[ https://issues.apache.org/jira/browse/YARN-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yarntime updated YARN-4411: --- Attachment: YARN-4411.001.patch A simple patch which replaces YarnApplicationAttemptState.valueOf(this.getState().toString()) with this.createApplicationAttemptState(). No tests added. > ResourceManager IllegalArgumentException error > -- > > Key: YARN-4411 > URL: https://issues.apache.org/jira/browse/YARN-4411 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: yarntime >Assignee: yarntime > Attachments: YARN-4411.001.patch > > > in version 2.7.1, line 1914 may cause IllegalArgumentException in > RMAppAttemptImpl: > YarnApplicationAttemptState.valueOf(this.getState().toString()) > cause by this.getState() returns type RMAppAttemptState which may not be > converted to YarnApplicationAttemptState. > {noformat} > java.lang.IllegalArgumentException: No enum constant > org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.LAUNCHED_UNMANAGED_SAVING > at java.lang.Enum.valueOf(Enum.java:236) > at > org.apache.hadoop.yarn.api.records.YarnApplicationAttemptState.valueOf(YarnApplicationAttemptState.java:27) > at > org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.createApplicationAttemptReport(RMAppAttemptImpl.java:1870) > at > org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationAttemptReport(ClientRMService.java:355) > at > org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationAttemptReport(ApplicationClientProtocolPBServiceImpl.java:355) > at > org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:425) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4405) Support node label store in non-appendable file system
[ https://issues.apache.org/jira/browse/YARN-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041158#comment-15041158 ] Hudson commented on YARN-4405: -- ABORTED: Integrated in Hadoop-Hdfs-trunk-Java8 #663 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/663/]) YARN-4405. Support node label store in non-appendable file system. (jianhe: rev 755dda8dd8bb23864abc752bad506f223fcac010) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/NodeLabelsStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestFileSystemNodeLabelsStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/NullRMNodeLabelsManager.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/TestConfigurationFieldsBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/DummyCommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/FileSystemNodeLabelsStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java > Support node label store in non-appendable file system > -- > > Key: YARN-4405 > URL: https://issues.apache.org/jira/browse/YARN-4405 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4405.1.patch, YARN-4405.2.patch, YARN-4405.3.patch, > YARN-4405.4.patch > > > Existing node label file system store implementation uses append to write > edit logs. However, some file system doesn't support append, we need add an > implementation to support such non-appendable file systems as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
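For readers who want the gist of the change, here is a hedged sketch of the general approach a non-appendable store can take: overwrite a complete mirror file on every change instead of appending edit-log records. It is illustrative only, not the actual NonAppendableFSNodeLabelStore code, and the file names are assumptions.
{code:java}
// Illustrative sketch: persist node labels on file systems without append by
// rewriting a full mirror file and swapping it in, rather than appending
// edit-log entries. Not the actual NonAppendableFSNodeLabelStore code.
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class FullMirrorWriter {
  private final FileSystem fs;
  private final Path mirror;     // assumed name: <store-root>/nodelabel.mirror
  private final Path tmpMirror;  // assumed name: <store-root>/nodelabel.mirror.tmp

  FullMirrorWriter(FileSystem fs, Path storeRoot) {
    this.fs = fs;
    this.mirror = new Path(storeRoot, "nodelabel.mirror");
    this.tmpMirror = new Path(storeRoot, "nodelabel.mirror.tmp");
  }

  /** Persist the full serialized label state; always overwrite, never append. */
  void writeSnapshot(byte[] serializedLabels) throws IOException {
    try (FSDataOutputStream out = fs.create(tmpMirror, true /* overwrite */)) {
      out.write(serializedLabels);
    }
    // Replace the old mirror with the freshly written one.
    fs.delete(mirror, false);
    fs.rename(tmpMirror, mirror);
  }
}
{code}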
[jira] [Commented] (YARN-4292) ResourceUtilization should be a part of NodeInfo REST API
[ https://issues.apache.org/jira/browse/YARN-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041156#comment-15041156 ] Hudson commented on YARN-4292: -- ABORTED: Integrated in Hadoop-Hdfs-trunk-Java8 #663 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/663/]) YARN-4292. ResourceUtilization should be a part of NodeInfo REST API. (wangda: rev a2c3bfc8c1349102a7f2bc4ea96b80b429ac227b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ResourceUtilizationInfo.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/NodeInfo.java > ResourceUtilization should be a part of NodeInfo REST API > - > > Key: YARN-4292 > URL: https://issues.apache.org/jira/browse/YARN-4292 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Sunil G > Fix For: 2.8.0 > > Attachments: 0001-YARN-4292.patch, 0002-YARN-4292.patch, > 0003-YARN-4292.patch, 0004-YARN-4292.patch, 0005-YARN-4292.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4356) ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off
[ https://issues.apache.org/jira/browse/YARN-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039753#comment-15039753 ] Sangjin Lee commented on YARN-4356: --- I've run unit tests for the affected projects, and confirmed that there are no additional failures. > ensure the timeline service v.2 is disabled cleanly and has no impact when > it's turned off > -- > > Key: YARN-4356 > URL: https://issues.apache.org/jira/browse/YARN-4356 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Critical > Labels: yarn-2928-1st-milestone > Attachments: YARN-4356-feature-YARN-2928.poc.001.patch > > > For us to be able to merge the first milestone drop to trunk, we want to > ensure that once disabled the timeline service v.2 has no impact from the > server side to the client side. If the timeline service is not enabled, no > action should be done. If v.1 is enabled but not v.2, v.1 should behave the > same as it does before the merge. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4356) ensure the timeline service v.2 is disabled cleanly and has no impact when it's turned off
[ https://issues.apache.org/jira/browse/YARN-4356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-4356: -- Attachment: YARN-4356-feature-YARN-2928.poc.001.patch I'm posting a POC patch to get early feedback. I haven't added a config that checks the version of the timeline service yet, and I need to sort out various configuration parameters a little more. But assuming those things will be in place later on, please take a look at whether the timeline service v.2 code/memory/behavior is cleanly turned off if the timeline service v.2 is disabled. I would greatly appreciate your feedback. Thanks! > ensure the timeline service v.2 is disabled cleanly and has no impact when > it's turned off > -- > > Key: YARN-4356 > URL: https://issues.apache.org/jira/browse/YARN-4356 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Sangjin Lee >Assignee: Sangjin Lee >Priority: Critical > Labels: yarn-2928-1st-milestone > Attachments: YARN-4356-feature-YARN-2928.poc.001.patch > > > For us to be able to merge the first milestone drop to trunk, we want to > ensure that once disabled the timeline service v.2 has no impact from the > server side to the client side. If the timeline service is not enabled, no > action should be done. If v.1 is enabled but not v.2, v.1 should behave the > same as it does before the merge. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
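As context for the kind of guard the patch needs, here is a minimal illustrative sketch (not the POC patch itself) of gating timeline service v.2 code paths on configuration. The "yarn.timeline-service.enabled" key is a standard YARN setting; the version key is an assumption, since the comment above notes that a version-checking config has not been added yet.
{code:java}
// Illustrative sketch only: skip all timeline v.2 work unless the service is
// enabled and configured as v.2. The version key below is an assumed name.
import org.apache.hadoop.conf.Configuration;

final class TimelineServiceGuard {
  static final String ENABLED_KEY = "yarn.timeline-service.enabled";
  static final String VERSION_KEY = "yarn.timeline-service.version"; // assumed key

  static boolean timelineV2Enabled(Configuration conf) {
    // If the timeline service is off, no action should be taken at all.
    if (!conf.getBoolean(ENABLED_KEY, false)) {
      return false;
    }
    // Only treat it as v.2 when the configured version says so; otherwise
    // v.1 behavior should be untouched.
    return conf.getFloat(VERSION_KEY, 1.0f) >= 2.0f;
  }
}
{code}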
[jira] [Updated] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent
[ https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-4002: -- Attachment: YARN-4002-rwlock.patch YARN-4002-lockless-read.patch Two patches for the two proposed solutions have been submitted. > make ResourceTrackerService.nodeHeartbeat more concurrent > - > > Key: YARN-4002 > URL: https://issues.apache.org/jira/browse/YARN-4002 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo >Priority: Critical > Attachments: YARN-4002-lockless-read.patch, YARN-4002-rwlock.patch, > YARN-4002-v0.patch > > > We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By > design the method ResourceTrackerService.nodeHeartbeat should be concurrent > enough to scale for large clusters. > But we have a "BIG" lock in NodesListManager.isValidNode which I think is > unnecessary. > First, the fields "includes" and "excludes" of HostsFileReader are only > updated on "refresh nodes". All RPC threads handling node heartbeats are > only readers. So a RWLock could be used to allow concurrent access by RPC > threads. > Second, since the fields "includes" and "excludes" of HostsFileReader are > always updated by "reference assignment", which is atomic in Java, the reader > side lock could just be skipped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
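To illustrate the two alternatives attached above, here is a hedged sketch under the assumption that heartbeat handlers are pure readers of the include/exclude host sets; the class and method names are simplified stand-ins, not the patch code.
{code:java}
// Hedged sketch of the read-write-lock solution; the membership check is
// simplified. Heartbeat handlers only read the host sets, so they may all
// hold the read lock concurrently, while "refresh nodes" takes the write lock.
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class HostsView {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  // For the second (lockless-read) variant, these volatile references are
  // simply re-assigned on refresh; reference assignment is atomic in Java,
  // so readers could skip the read lock entirely.
  private volatile Set<String> includes = Collections.emptySet();
  private volatile Set<String> excludes = Collections.emptySet();

  boolean isValidNode(String host) {
    lock.readLock().lock();           // many heartbeat threads at once
    try {
      return (includes.isEmpty() || includes.contains(host))
          && !excludes.contains(host);
    } finally {
      lock.readLock().unlock();
    }
  }

  void refresh(Set<String> newIncludes, Set<String> newExcludes) {
    lock.writeLock().lock();          // exclusive, only on "refresh nodes"
    try {
      includes = new HashSet<>(newIncludes);
      excludes = new HashSet<>(newExcludes);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}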
[jira] [Commented] (YARN-4417) Make RM and Timeline-server REST APIs more consistent
[ https://issues.apache.org/jira/browse/YARN-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039714#comment-15039714 ] Hadoop QA commented on YARN-4417: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 2s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 44s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 18s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 25s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 41s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 15s {color} | {color:red} Patch generated 4 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager (total was 48, now 51). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 43s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 18s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 35s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 63m 46s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 62m 53s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.7.0_91. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 148m 7s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps | | | hadoop.yarn.server.resourcemanager.TestAMAuthorization | | JDK v1.7.0_91 Failed junit tests | hadoop.yarn.server.resourcemanager.TestClientRMTokens | | | hadoop.yarn.server.resourceman
[jira] [Commented] (YARN-4419) Trunk building failed
[ https://issues.apache.org/jira/browse/YARN-4419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039708#comment-15039708 ] Hudson commented on YARN-4419: -- FAILURE: Integrated in Hadoop-trunk-Commit #8918 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8918/]) Add missing file for YARN-4419 (jianhe: rev e84d6ca2df775bb4c93f6c08b345ac30b3a4525b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/NonAppendableFSNodeLabelStore.java > Trunk building failed > - > > Key: YARN-4419 > URL: https://issues.apache.org/jira/browse/YARN-4419 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Kai Zheng >Assignee: Jian He > > Checking out the latest codes, mvn clean package -DskipTests failed as below. > {noformat} > [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ > hadoop-yarn-common --- > [INFO] Changes detected - recompiling the module! > [INFO] Compiling 72 source files to > /home/workspace/hadoop3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/target/test-classes > [INFO] - > [ERROR] COMPILATION ERROR : > [INFO] - > [ERROR] > /home/workspace/hadoop3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestFileSystemNodeLabelsStore.java:[75,15] > cannot find symbol > symbol: class NonAppendableFSNodeLabelStore > location: class > org.apache.hadoop.yarn.nodelabels.TestFileSystemNodeLabelsStore > [INFO] 1 error > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4405) Support node label store in non-appendable file system
[ https://issues.apache.org/jira/browse/YARN-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039679#comment-15039679 ] Sunil G commented on YARN-4405: --- Thanks [~jianhe] and [~leftnoteasy]. Its fine now. > Support node label store in non-appendable file system > -- > > Key: YARN-4405 > URL: https://issues.apache.org/jira/browse/YARN-4405 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4405.1.patch, YARN-4405.2.patch, YARN-4405.3.patch, > YARN-4405.4.patch > > > Existing node label file system store implementation uses append to write > edit logs. However, some file system doesn't support append, we need add an > implementation to support such non-appendable file systems as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4405) Support node label store in non-appendable file system
[ https://issues.apache.org/jira/browse/YARN-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039677#comment-15039677 ] Wangda Tan commented on YARN-4405: -- Thanks [~jianhe], I was committing just now and found it is already fixed. > Support node label store in non-appendable file system > -- > > Key: YARN-4405 > URL: https://issues.apache.org/jira/browse/YARN-4405 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4405.1.patch, YARN-4405.2.patch, YARN-4405.3.patch, > YARN-4405.4.patch > > > Existing node label file system store implementation uses append to write > edit logs. However, some file system doesn't support append, we need add an > implementation to support such non-appendable file systems as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-4419) Trunk building failed
[ https://issues.apache.org/jira/browse/YARN-4419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He resolved YARN-4419. --- Resolution: Fixed > Trunk building failed > - > > Key: YARN-4419 > URL: https://issues.apache.org/jira/browse/YARN-4419 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Kai Zheng >Assignee: Jian He > > Checking out the latest codes, mvn clean package -DskipTests failed as below. > {noformat} > [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ > hadoop-yarn-common --- > [INFO] Changes detected - recompiling the module! > [INFO] Compiling 72 source files to > /home/workspace/hadoop3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/target/test-classes > [INFO] - > [ERROR] COMPILATION ERROR : > [INFO] - > [ERROR] > /home/workspace/hadoop3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestFileSystemNodeLabelsStore.java:[75,15] > cannot find symbol > symbol: class NonAppendableFSNodeLabelStore > location: class > org.apache.hadoop.yarn.nodelabels.TestFileSystemNodeLabelsStore > [INFO] 1 error > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4419) Trunk building failed
[ https://issues.apache.org/jira/browse/YARN-4419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039673#comment-15039673 ] Jian He commented on YARN-4419: --- sorry, missed a file while committing, fixed now. > Trunk building failed > - > > Key: YARN-4419 > URL: https://issues.apache.org/jira/browse/YARN-4419 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Kai Zheng >Assignee: Jian He > > Checking out the latest codes, mvn clean package -DskipTests failed as below. > {noformat} > [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ > hadoop-yarn-common --- > [INFO] Changes detected - recompiling the module! > [INFO] Compiling 72 source files to > /home/workspace/hadoop3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/target/test-classes > [INFO] - > [ERROR] COMPILATION ERROR : > [INFO] - > [ERROR] > /home/workspace/hadoop3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestFileSystemNodeLabelsStore.java:[75,15] > cannot find symbol > symbol: class NonAppendableFSNodeLabelStore > location: class > org.apache.hadoop.yarn.nodelabels.TestFileSystemNodeLabelsStore > [INFO] 1 error > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4405) Support node label store in non-appendable file system
[ https://issues.apache.org/jira/browse/YARN-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039671#comment-15039671 ] Jian He commented on YARN-4405: --- sorry, missed a file while committing, fixed now. > Support node label store in non-appendable file system > -- > > Key: YARN-4405 > URL: https://issues.apache.org/jira/browse/YARN-4405 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4405.1.patch, YARN-4405.2.patch, YARN-4405.3.patch, > YARN-4405.4.patch > > > Existing node label file system store implementation uses append to write > edit logs. However, some file system doesn't support append, we need add an > implementation to support such non-appendable file systems as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4405) Support node label store in non-appendable file system
[ https://issues.apache.org/jira/browse/YARN-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039670#comment-15039670 ] Wangda Tan commented on YARN-4405: -- Thanks for reporting, [~drankye], [~sunilg]. Fixing this issue now. > Support node label store in non-appendable file system > -- > > Key: YARN-4405 > URL: https://issues.apache.org/jira/browse/YARN-4405 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4405.1.patch, YARN-4405.2.patch, YARN-4405.3.patch, > YARN-4405.4.patch > > > Existing node label file system store implementation uses append to write > edit logs. However, some file system doesn't support append, we need add an > implementation to support such non-appendable file systems as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4340) Add "list" API to reservation system
[ https://issues.apache.org/jira/browse/YARN-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Po updated YARN-4340: -- Attachment: YARN-4340.v5.patch > Add "list" API to reservation system > > > Key: YARN-4340 > URL: https://issues.apache.org/jira/browse/YARN-4340 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Carlo Curino >Assignee: Sean Po > Attachments: YARN-4340.v1.patch, YARN-4340.v2.patch, > YARN-4340.v3.patch, YARN-4340.v4.patch, YARN-4340.v5.patch > > > This JIRA tracks changes to the APIs of the reservation system, and enables > querying the reservation system on which reservation exists by "time-range, > reservation-id". > YARN-4420 has a dependency on this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-4419) Trunk building failed
[ https://issues.apache.org/jira/browse/YARN-4419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He reassigned YARN-4419: - Assignee: Jian He > Trunk building failed > - > > Key: YARN-4419 > URL: https://issues.apache.org/jira/browse/YARN-4419 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Kai Zheng >Assignee: Jian He > > Checking out the latest codes, mvn clean package -DskipTests failed as below. > {noformat} > [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ > hadoop-yarn-common --- > [INFO] Changes detected - recompiling the module! > [INFO] Compiling 72 source files to > /home/workspace/hadoop3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/target/test-classes > [INFO] - > [ERROR] COMPILATION ERROR : > [INFO] - > [ERROR] > /home/workspace/hadoop3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestFileSystemNodeLabelsStore.java:[75,15] > cannot find symbol > symbol: class NonAppendableFSNodeLabelStore > location: class > org.apache.hadoop.yarn.nodelabels.TestFileSystemNodeLabelsStore > [INFO] 1 error > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4340) Add "list" API to reservation system
[ https://issues.apache.org/jira/browse/YARN-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Po updated YARN-4340: -- Attachment: (was: YARN-4340.v5.patch) > Add "list" API to reservation system > > > Key: YARN-4340 > URL: https://issues.apache.org/jira/browse/YARN-4340 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Carlo Curino >Assignee: Sean Po > Attachments: YARN-4340.v1.patch, YARN-4340.v2.patch, > YARN-4340.v3.patch, YARN-4340.v4.patch > > > This JIRA tracks changes to the APIs of the reservation system, and enables > querying the reservation system on which reservation exists by "time-range, > reservation-id". > YARN-4420 has a dependency on this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4340) Add "list" API to reservation system
[ https://issues.apache.org/jira/browse/YARN-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039663#comment-15039663 ] Sean Po commented on YARN-4340: --- I took all your suggestions above, and with regards to your second comment, I also removed 'user' from the interface in v5 of the patch. The REST aspect of the code was also removed, and will be a part of a second ticket YARN-4420. > Add "list" API to reservation system > > > Key: YARN-4340 > URL: https://issues.apache.org/jira/browse/YARN-4340 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Carlo Curino >Assignee: Sean Po > Attachments: YARN-4340.v1.patch, YARN-4340.v2.patch, > YARN-4340.v3.patch, YARN-4340.v4.patch, YARN-4340.v5.patch > > > This JIRA tracks changes to the APIs of the reservation system, and enables > querying the reservation system on which reservation exists by "time-range, > reservation-id". > YARN-4420 has a dependency on this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4340) Add "list" API to reservation system
[ https://issues.apache.org/jira/browse/YARN-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Po updated YARN-4340: -- Description: This JIRA tracks changes to the APIs of the reservation system, and enables querying the reservation system on which reservation exists by "time-range, reservation-id". YARN-4420 has a dependency on this. was: This JIRA tracks changes to the APIs of the reservation system, and enables querying the reservation system on which reservation exists by "time-range, reservation-id, username". > Add "list" API to reservation system > > > Key: YARN-4340 > URL: https://issues.apache.org/jira/browse/YARN-4340 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Carlo Curino >Assignee: Sean Po > Attachments: YARN-4340.v1.patch, YARN-4340.v2.patch, > YARN-4340.v3.patch, YARN-4340.v4.patch, YARN-4340.v5.patch > > > This JIRA tracks changes to the APIs of the reservation system, and enables > querying the reservation system on which reservation exists by "time-range, > reservation-id". > YARN-4420 has a dependency on this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4420) Add REST API for List Reservations
Sean Po created YARN-4420: - Summary: Add REST API for List Reservations Key: YARN-4420 URL: https://issues.apache.org/jira/browse/YARN-4420 Project: Hadoop YARN Issue Type: Bug Components: capacity scheduler, fairscheduler, resourcemanager Reporter: Sean Po Assignee: Sean Po Priority: Minor This JIRA tracks changes to the REST APIs of the reservation system and enables querying the reservation system on which reservations exist by "time-range, and reservation-id". This task has a dependency on YARN-4340. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4420) Add REST API for List Reservations
[ https://issues.apache.org/jira/browse/YARN-4420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Po updated YARN-4420: -- Issue Type: Sub-task (was: Bug) Parent: YARN-2572 > Add REST API for List Reservations > -- > > Key: YARN-4420 > URL: https://issues.apache.org/jira/browse/YARN-4420 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, fairscheduler, resourcemanager >Reporter: Sean Po >Assignee: Sean Po >Priority: Minor > > This JIRA tracks changes to the REST APIs of the reservation system and > enables querying the reservation on which reservations exists by "time-range, > and reservation-id". > This task has a dependency on YARN-4340. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4405) Support node label store in non-appendable file system
[ https://issues.apache.org/jira/browse/YARN-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039648#comment-15039648 ] Sunil G commented on YARN-4405: --- Yes, it's a compile problem {noformat} sb_work2/hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestFileSystemNodeLabelsStore.java:[75,15] cannot find symbol [ERROR] symbol: class NonAppendableFSNodeLabelStore [ERROR] location: class org.apache.hadoop.yarn.nodelabels.TestFileSystemNodeLabelsStore [ERROR] -> [Help 1] {noformat} Usually such issues will be handled with an addendum patch in the same ticket. So it could be handled here, [~leftnoteasy], could you please help with the same. > Support node label store in non-appendable file system > -- > > Key: YARN-4405 > URL: https://issues.apache.org/jira/browse/YARN-4405 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4405.1.patch, YARN-4405.2.patch, YARN-4405.3.patch, > YARN-4405.4.patch > > > Existing node label file system store implementation uses append to write > edit logs. However, some file system doesn't support append, we need add an > implementation to support such non-appendable file systems as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4405) Support node label store in non-appendable file system
[ https://issues.apache.org/jira/browse/YARN-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039628#comment-15039628 ] Kai Zheng commented on YARN-4405: - YARN-4419 was just opened. Please help check and comment it. Thanks. > Support node label store in non-appendable file system > -- > > Key: YARN-4405 > URL: https://issues.apache.org/jira/browse/YARN-4405 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4405.1.patch, YARN-4405.2.patch, YARN-4405.3.patch, > YARN-4405.4.patch > > > Existing node label file system store implementation uses append to write > edit logs. However, some file system doesn't support append, we need add an > implementation to support such non-appendable file systems as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4419) Trunk building failed
Kai Zheng created YARN-4419: --- Summary: Trunk building failed Key: YARN-4419 URL: https://issues.apache.org/jira/browse/YARN-4419 Project: Hadoop YARN Issue Type: Bug Reporter: Kai Zheng Checking out the latest codes, mvn clean package -DskipTests failed as below. {noformat} [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ hadoop-yarn-common --- [INFO] Changes detected - recompiling the module! [INFO] Compiling 72 source files to /home/workspace/hadoop3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/target/test-classes [INFO] - [ERROR] COMPILATION ERROR : [INFO] - [ERROR] /home/workspace/hadoop3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestFileSystemNodeLabelsStore.java:[75,15] cannot find symbol symbol: class NonAppendableFSNodeLabelStore location: class org.apache.hadoop.yarn.nodelabels.TestFileSystemNodeLabelsStore [INFO] 1 error {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3840) Resource Manager web ui issue when sorting application by id (with application having id > 9999)
[ https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039614#comment-15039614 ] Hudson commented on YARN-3840: -- ABORTED: Integrated in Hadoop-Hdfs-trunk-Java8 #662 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/662/]) YARN-3840. Resource Manager web ui issue when sorting application by id (jianhe: rev 9f77ccad735f4843ce2c38355de9f434838d4507) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/static/dt-sorting/natural.js * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/view/JQueryUI.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/TaskPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AppPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/AllApplicationsPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AppAttemptPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/AllContainersPage.java * hadoop-yarn-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/TasksPage.java > Resource Manager web ui issue when sorting application by id (with > application having id > ) > > > Key: YARN-3840 > URL: https://issues.apache.org/jira/browse/YARN-3840 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: LINTE >Assignee: Varun Saxena > Fix For: 2.8.0, 2.7.3 > > Attachments: RMApps.png, RMApps_Sorted.png, YARN-3840-1.patch, > YARN-3840-2.patch, YARN-3840-3.patch, YARN-3840-4.patch, YARN-3840-5.patch, > YARN-3840-6.patch, YARN-3840.reopened.001.patch, yarn-3840-7.patch > > > On the WEBUI, the global main view page : > http://resourcemanager:8088/cluster/apps doesn't display applications over > . > With command line it works (# yarn application -list). > Regards, > Alexandre -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3480) Recovery may get very slow with lots of services with lots of app-attempts
[ https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039613#comment-15039613 ] Jun Gong commented on YARN-3480: [~jianhe] thanks for the reminder. I thought the final solution is "we only have (limits + asynchronous recovery) for services, once YARN-1039 goes in", so I am waiting for YARN-1039. However what you just suggested is reasonable too; it depends on how important we think app history information is. We have already implemented it and it works well in our cluster, so I could port it to trunk. I will attach a patch against trunk code later. > Recovery may get very slow with lots of services with lots of app-attempts > -- > > Key: YARN-3480 > URL: https://issues.apache.org/jira/browse/YARN-3480 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Jun Gong >Assignee: Jun Gong > Attachments: YARN-3480.01.patch, YARN-3480.02.patch, > YARN-3480.03.patch, YARN-3480.04.patch > > > When RM HA is enabled and running containers are kept across attempts, apps > are more likely to finish successfully with more retries (attempts), so it > will be better to set 'yarn.resourcemanager.am.max-attempts' larger. However > it will make RMStateStore (FileSystem/HDFS/ZK) store more attempts, and make > the RM recovery process much slower. It might be better to set a limit on the > max attempts stored in RMStateStore. > BTW: When 'attemptFailuresValidityInterval' (introduced in YARN-611) is set to > a small value, the number of retried attempts might be very large. So we need > to delete some of the attempts stored in RMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
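As a rough illustration of the "limit stored attempts" idea discussed above (the class name and the config knob below are hypothetical and not taken from the attached patches): keep only the newest N attempts in the state store and delete the oldest ones as new attempts are persisted, so recovery never has to replay an unbounded attempt history.
{code:java}
// Illustrative sketch only: cap how many attempts are kept per app in the
// RMStateStore. The knob name is hypothetical, e.g. something like
// "yarn.resourcemanager.state-store.max-completed-attempts-per-app".
import java.util.ArrayDeque;
import java.util.Deque;

class AttemptHistoryTrimmer {
  private final int maxStoredAttempts;
  private final Deque<String> storedAttemptIds = new ArrayDeque<>();

  AttemptHistoryTrimmer(int maxStoredAttempts) {
    this.maxStoredAttempts = maxStoredAttempts;
  }

  /** Record a newly stored attempt; return the ids to remove from the store. */
  Deque<String> onAttemptStored(String attemptId) {
    storedAttemptIds.addLast(attemptId);
    Deque<String> toRemove = new ArrayDeque<>();
    while (storedAttemptIds.size() > maxStoredAttempts) {
      toRemove.addLast(storedAttemptIds.removeFirst()); // oldest attempts go first
    }
    return toRemove;
  }
}
{code}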
[jira] [Assigned] (YARN-3201) add args for DistributedShell to specify a image for tasks that will run on docker
[ https://issues.apache.org/jira/browse/YARN-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yarntime reassigned YARN-3201: -- Assignee: yarntime > add args for DistributedShell to specify a image for tasks that will run on > docker > -- > > Key: YARN-3201 > URL: https://issues.apache.org/jira/browse/YARN-3201 > Project: Hadoop YARN > Issue Type: Wish > Components: applications/distributed-shell >Reporter: zhangwei >Assignee: yarntime > > It's very useful to execute a script on docker to do some tests, but the > distributedshell has no args to set the image. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3669) Attempt-failures validatiy interval should have a global admin configurable lower limit
[ https://issues.apache.org/jira/browse/YARN-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039609#comment-15039609 ] Hadoop QA commented on YARN-3669: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 38s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 51s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 5s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 36s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 39s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 45s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 29s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 43s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 29s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 49s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 49s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 5s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 5s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 41s {color} | {color:red} Patch generated 2 new checkstyle issues in hadoop-yarn-project/hadoop-yarn (total was 326, now 327). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 45s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 0s {color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 26s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 38s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 22s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 51s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 36s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_85. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 7s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_85. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 6
[jira] [Commented] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent
[ https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039597#comment-15039597 ] Hong Zhiguo commented on YARN-4002: --- I'm working on it. I've proposed 2 different solutions and am waiting for specific comments. > make ResourceTrackerService.nodeHeartbeat more concurrent > - > > Key: YARN-4002 > URL: https://issues.apache.org/jira/browse/YARN-4002 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo >Priority: Critical > Attachments: YARN-4002-v0.patch > > > We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By > design the method ResourceTrackerService.nodeHeartbeat should be concurrent > enough to scale for large clusters. > But we have a "BIG" lock in NodesListManager.isValidNode which I think is > unnecessary. > First, the fields "includes" and "excludes" of HostsFileReader are only > updated on "refresh nodes". All RPC threads handling node heartbeats are > only readers. So an RWLock could be used to allow concurrent access by RPC > threads. > Second, since the fields "includes" and "excludes" of HostsFileReader are > always updated by "reference assignment", which is atomic in Java, the reader > side lock could just be skipped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
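A minimal sketch of the second option from the description above (skip the reader-side lock and rely on atomic reference assignment). The class and field names are hypothetical stand-ins, not the actual NodesListManager/HostsFileReader code; a single volatile holder is used so the includes/excludes pair is always read consistently.
{code:java}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class HostListSnapshotSketch {

  // Immutable pair published through one volatile reference, so a reader can
  // never see includes from one refresh and excludes from another.
  private static final class Snapshot {
    final Set<String> includes;
    final Set<String> excludes;
    Snapshot(Set<String> in, Set<String> ex) {
      includes = in;
      excludes = ex;
    }
  }

  private volatile Snapshot snapshot =
      new Snapshot(Collections.<String>emptySet(), Collections.<String>emptySet());

  /** Writer side: only the admin "refresh nodes" path calls this. */
  public void refresh(Set<String> newIncludes, Set<String> newExcludes) {
    snapshot = new Snapshot(
        Collections.unmodifiableSet(new HashSet<>(newIncludes)),
        Collections.unmodifiableSet(new HashSet<>(newExcludes)));
  }

  /** Reader side: every node-heartbeat RPC handler, with no lock at all. */
  public boolean isValidNode(String hostName) {
    Snapshot s = snapshot;  // one volatile read; old or new state, never torn
    return (s.includes.isEmpty() || s.includes.contains(hostName))
        && !s.excludes.contains(hostName);
  }
}
{code}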
[jira] [Commented] (YARN-4405) Support node label store in non-appendable file system
[ https://issues.apache.org/jira/browse/YARN-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039594#comment-15039594 ] Hudson commented on YARN-4405: -- FAILURE: Integrated in Hadoop-trunk-Commit #8917 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8917/]) YARN-4405. Support node label store in non-appendable file system. (jianhe: rev 755dda8dd8bb23864abc752bad506f223fcac010) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/TestFileSystemNodeLabelsStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/nodelabels/DummyCommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/test/java/org/apache/hadoop/yarn/conf/TestYarnConfigurationFields.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/nodelabels/NullRMNodeLabelsManager.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/TestConfigurationFieldsBase.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/FileSystemNodeLabelsStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/NodeLabelsStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/nodelabels/CommonNodeLabelsManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml > Support node label store in non-appendable file system > -- > > Key: YARN-4405 > URL: https://issues.apache.org/jira/browse/YARN-4405 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4405.1.patch, YARN-4405.2.patch, YARN-4405.3.patch, > YARN-4405.4.patch > > > Existing node label file system store implementation uses append to write > edit logs. However, some file system doesn't support append, we need add an > implementation to support such non-appendable file systems as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
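A minimal sketch of the rewrite-and-rename idea for file systems without append: every change persists the complete label mapping to a temporary file and then renames it over the previous mirror. The file format, file names and use of local java.nio.file calls are assumptions for illustration; an HDFS-backed store would do the equivalent through the Hadoop FileSystem API, and this is not the code in the attached patches.
{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Map;
import java.util.Set;

public class NonAppendableLabelStoreSketch {

  private final Path mirror;  // latest complete snapshot of node -> labels
  private final Path tmp;     // staging file rewritten on every change

  public NonAppendableLabelStoreSketch(Path dir) {
    this.mirror = dir.resolve("nodelabel.mirror");
    this.tmp = dir.resolve("nodelabel.mirror.tmp");
  }

  /** Persists the complete current state; no append is ever needed. */
  public synchronized void writeAll(Map<String, Set<String>> nodeToLabels)
      throws IOException {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, Set<String>> e : nodeToLabels.entrySet()) {
      sb.append(e.getKey()).append('=')
        .append(String.join(",", e.getValue())).append('\n');
    }
    Files.write(tmp, sb.toString().getBytes(StandardCharsets.UTF_8));
    // Replacing the old mirror in one rename keeps recovery simple: readers
    // only ever load the last fully written snapshot.
    Files.move(tmp, mirror, StandardCopyOption.REPLACE_EXISTING);
  }
}
{code}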
[jira] [Created] (YARN-4418) AM Resource Limit per partition can be updated to ResourceUsage as well
Sunil G created YARN-4418: - Summary: AM Resource Limit per partition can be updated to ResourceUsage as well Key: YARN-4418 URL: https://issues.apache.org/jira/browse/YARN-4418 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.7.1 Reporter: Sunil G Assignee: Sunil G AMResourceLimit is now extended to all partitions after YARN-3216. It's also better to track this ResourceLimit in the existing {{ResourceUsage}} so that the REST framework can easily expose this information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
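A hedged sketch of what tracking the AM resource limit per partition could look like; the names are hypothetical and resources are simplified to memory in MB, whereas the real {{ResourceUsage}} class tracks full Resource objects per node label.
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PartitionAmLimitSketch {

  // partition (node label) -> AM resource limit, simplified to MB
  private final Map<String, Long> amLimitMbByPartition = new ConcurrentHashMap<>();

  /** The scheduler updates the computed limit whenever it changes. */
  public void setAmLimitMb(String partition, long mb) {
    amLimitMbByPartition.put(partition, mb);
  }

  /** REST/UI code can then read the per-partition limit without recomputing it. */
  public long getAmLimitMb(String partition) {
    return amLimitMbByPartition.getOrDefault(partition, 0L);
  }
}
{code}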
[jira] [Commented] (YARN-4309) Add debug information to application logs when a container fails
[ https://issues.apache.org/jira/browse/YARN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039563#comment-15039563 ] Wangda Tan commented on YARN-4309: -- Hi [~vvasudev], Thanks for working on this task; it's really useful for identifying container launch issues. Some questions/comments: - Since the debug-information fetch script (copying the script and listing files) is at the end of launch_container.sh, is it possible that a container is killed before the script can be executed? - Do you think it is better to generate a separate script file that fetches debug information before launching the user code? That way we can 1. guarantee it will be executed, 2. avoid adding debug information to the normal launch_container.sh, and 3. keep the return code of the launch script unaffected by the debug script. - Is it possible to enable/disable this function while the NM is running? +[~sidharta-s]. > Add debug information to application logs when a container fails > > > Key: YARN-4309 > URL: https://issues.apache.org/jira/browse/YARN-4309 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-4309.001.patch, YARN-4309.002.patch, > YARN-4309.003.patch, YARN-4309.004.patch, YARN-4309.005.patch > > > Sometimes when a container fails, it can be pretty hard to figure out why it > failed. > My proposal is that if a container fails, we collect information about the > container local dir and dump it into the container log dir. Ideally, I'd like > to tar up the directory entirely, but I'm not sure of the security and space > implications of such an approach. At the very least, we can list all the files > in the container local dir, and dump the contents of launch_container.sh (into > the container log dir). > When log aggregation occurs, all this information will automatically get > collected and make debugging such failures much easier. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
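A minimal sketch of the idea in the description above: list the container's local directory and keep a copy of launch_container.sh next to the container logs so log aggregation picks both up. The file names and the use of plain JDK file APIs are assumptions for illustration; this is not the code from the attached patches.
{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ContainerDebugInfoSketch {

  public static void dumpDebugInfo(Path containerLocalDir, Path containerLogDir)
      throws IOException {
    // 1. Write a listing of everything localized for the container.
    try (Stream<Path> files = Files.walk(containerLocalDir)) {
      String listing = files.map(Path::toString)
          .collect(Collectors.joining(System.lineSeparator()));
      Files.write(containerLogDir.resolve("directory.info"),
          listing.getBytes(StandardCharsets.UTF_8));
    }
    // 2. Keep a copy of the generated launch script next to the logs.
    Path launchScript = containerLocalDir.resolve("launch_container.sh");
    if (Files.exists(launchScript)) {
      Files.copy(launchScript,
          containerLogDir.resolve("launch_container.sh.copy"),
          StandardCopyOption.REPLACE_EXISTING);
    }
  }
}
{code}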
[jira] [Commented] (YARN-3480) Recovery may get very slow with lots of services with lots of app-attempts
[ https://issues.apache.org/jira/browse/YARN-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039560#comment-15039560 ] Jian He commented on YARN-3480: --- [~hex108], I know it's been a long time; would you still like to work on this? IMO, as a first step, we can do as in the previous [comment|https://issues.apache.org/jira/browse/YARN-3480?focusedCommentId=14533731&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14533731] and remove the apps beyond the validity interval, as those are mostly the apps users care about least. cc [~xgong]. > Recovery may get very slow with lots of services with lots of app-attempts > -- > > Key: YARN-3480 > URL: https://issues.apache.org/jira/browse/YARN-3480 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Jun Gong >Assignee: Jun Gong > Attachments: YARN-3480.01.patch, YARN-3480.02.patch, > YARN-3480.03.patch, YARN-3480.04.patch > > > When RM HA is enabled and running containers are kept across attempts, apps > are more likely to finish successfully with more retries (attempts), so it > will be better to set 'yarn.resourcemanager.am.max-attempts' larger. However > it will make the RMStateStore (FileSystem/HDFS/ZK) store more attempts, and make > the RM recovery process much slower. It might be better to set a maximum number of attempts to be > stored in the RMStateStore. > BTW: When 'attemptFailuresValidityInterval' (introduced in YARN-611) is set to > a small value, the number of retried attempts might be very large, so we need to delete > some of the attempts stored in the RMStateStore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
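A minimal sketch of the pruning idea in the comment above: before keeping attempts around (and re-storing them), drop attempts that finished outside the attemptFailuresValidityInterval window, since they no longer count toward the max-attempts check anyway. The types below are simplified stand-ins, not the RM's real attempt records or state store.
{code:java}
import java.util.Iterator;
import java.util.Map;

public class AttemptPruningSketch {

  static class AttemptInfo {
    final String attemptId;
    final long finishTimeMs;  // 0 while the attempt is still running
    AttemptInfo(String attemptId, long finishTimeMs) {
      this.attemptId = attemptId;
      this.finishTimeMs = finishTimeMs;
    }
  }

  /** Removes attempts whose finish time fell out of the validity window. */
  public static void pruneExpiredAttempts(
      Map<String, AttemptInfo> storedAttempts,
      long validityIntervalMs, long nowMs) {
    if (validityIntervalMs <= 0) {
      return;  // interval disabled: keep everything
    }
    Iterator<Map.Entry<String, AttemptInfo>> it =
        storedAttempts.entrySet().iterator();
    while (it.hasNext()) {
      AttemptInfo a = it.next().getValue();
      if (a.finishTimeMs > 0 && nowMs - a.finishTimeMs > validityIntervalMs) {
        it.remove();  // a real RM would also remove it from the state store
      }
    }
  }
}
{code}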
[jira] [Commented] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics
[ https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039551#comment-15039551 ] Wangda Tan commented on YARN-4304: -- [~sunilg], I'm fine with changing ResourceUsage in a separated JIRA (logic change only, not renaming), but I think it's better to finish ResourceUsage change before this patch. Which we can have a more clear view of this patch once ResourceUsage changes are completed. > AM max resource configuration per partition to be displayed/updated correctly > in UI and in various partition related metrics > > > Key: YARN-4304 > URL: https://issues.apache.org/jira/browse/YARN-4304 > Project: Hadoop YARN > Issue Type: Sub-task > Components: webapp >Affects Versions: 2.7.1 >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-4304.patch, 0002-YARN-4304.patch, > 0003-YARN-4304.patch, 0004-YARN-4304.patch, REST_and_UI.zip > > > As we are supporting per-partition level max AM resource percentage > configuration, UI and various metrics also need to display correct > configurations related to same. > For eg: Current UI still shows am-resource percentage per queue level. This > is to be updated correctly when label config is used. > - Display max-am-percentage per-partition in Scheduler UI (label also) and in > ClusterMetrics page > - Update queue/partition related metrics w.r.t per-partition > am-resource-percentage -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4405) Support node label store in non-appendable file system
[ https://issues.apache.org/jira/browse/YARN-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039550#comment-15039550 ] Sunil G commented on YARN-4405: --- Thanks [~leftnoteasy]. Latest patch looks good. +1. > Support node label store in non-appendable file system > -- > > Key: YARN-4405 > URL: https://issues.apache.org/jira/browse/YARN-4405 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4405.1.patch, YARN-4405.2.patch, YARN-4405.3.patch, > YARN-4405.4.patch > > > Existing node label file system store implementation uses append to write > edit logs. However, some file system doesn't support append, we need add an > implementation to support such non-appendable file systems as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics
[ https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039547#comment-15039547 ] Sunil G commented on YARN-4304: --- Thanks [~leftnoteasy] for the comments. Yes. It's fine and we can change those variables as per the comments. For {{ResourceUsage}}, since it's an existing code, we need to add these new items and make the renaming as suggested. Both these can be tracked in another ticket together. And later can be made used here. I ll create another ticket for that if you feel it's fine. Thank you. > AM max resource configuration per partition to be displayed/updated correctly > in UI and in various partition related metrics > > > Key: YARN-4304 > URL: https://issues.apache.org/jira/browse/YARN-4304 > Project: Hadoop YARN > Issue Type: Sub-task > Components: webapp >Affects Versions: 2.7.1 >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-4304.patch, 0002-YARN-4304.patch, > 0003-YARN-4304.patch, 0004-YARN-4304.patch, REST_and_UI.zip > > > As we are supporting per-partition level max AM resource percentage > configuration, UI and various metrics also need to display correct > configurations related to same. > For eg: Current UI still shows am-resource percentage per queue level. This > is to be updated correctly when label config is used. > - Display max-am-percentage per-partition in Scheduler UI (label also) and in > ClusterMetrics page > - Update queue/partition related metrics w.r.t per-partition > am-resource-percentage -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4405) Support node label store in non-appendable file system
[ https://issues.apache.org/jira/browse/YARN-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039539#comment-15039539 ] Wangda Tan commented on YARN-4405: -- [~sunilg], please let me know if you have any other comments on latest patch. Thanks, > Support node label store in non-appendable file system > -- > > Key: YARN-4405 > URL: https://issues.apache.org/jira/browse/YARN-4405 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4405.1.patch, YARN-4405.2.patch, YARN-4405.3.patch, > YARN-4405.4.patch > > > Existing node label file system store implementation uses append to write > edit logs. However, some file system doesn't support append, we need add an > implementation to support such non-appendable file systems as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3368) Improve YARN web UI
[ https://issues.apache.org/jira/browse/YARN-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039537#comment-15039537 ] Wangda Tan commented on YARN-3368: -- This task depends on some fixes of REST APIs, which is tracked by YARN-4417. > Improve YARN web UI > --- > > Key: YARN-3368 > URL: https://issues.apache.org/jira/browse/YARN-3368 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He > Attachments: (Dec 3 2015) yarn-ui-screenshots.zip, (POC, Aug-2015)) > yarn-ui-screenshots.zip > > > The goal is to improve YARN UI for better usability. > We may take advantage of some existing front-end frameworks to build a > fancier, easier-to-use UI. > The old UI continue to exist until we feel it's ready to flip to the new UI. > This serves as an umbrella jira to track the tasks. we can do this in a > branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3368) Improve YARN web UI
[ https://issues.apache.org/jira/browse/YARN-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3368: - Attachment: (POC, Aug-2015)) yarn-ui-screenshots.zip Archived and reattached old screenshots > Improve YARN web UI > --- > > Key: YARN-3368 > URL: https://issues.apache.org/jira/browse/YARN-3368 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He > Attachments: (Dec 3 2015) yarn-ui-screenshots.zip, (POC, Aug-2015)) > yarn-ui-screenshots.zip > > > The goal is to improve YARN UI for better usability. > We may take advantage of some existing front-end frameworks to build a > fancier, easier-to-use UI. > The old UI continue to exist until we feel it's ready to flip to the new UI. > This serves as an umbrella jira to track the tasks. we can do this in a > branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3368) Improve YARN web UI
[ https://issues.apache.org/jira/browse/YARN-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3368: - Attachment: (was: Queue-Hierarchy-Screenshot.png) > Improve YARN web UI > --- > > Key: YARN-3368 > URL: https://issues.apache.org/jira/browse/YARN-3368 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He > Attachments: (Dec 3 2015) yarn-ui-screenshots.zip > > > The goal is to improve YARN UI for better usability. > We may take advantage of some existing front-end frameworks to build a > fancier, easier-to-use UI. > The old UI continue to exist until we feel it's ready to flip to the new UI. > This serves as an umbrella jira to track the tasks. we can do this in a > branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3368) Improve YARN web UI
[ https://issues.apache.org/jira/browse/YARN-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3368: - Attachment: (was: Applications-table-Screenshot.png) > Improve YARN web UI > --- > > Key: YARN-3368 > URL: https://issues.apache.org/jira/browse/YARN-3368 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He > Attachments: (Dec 3 2015) yarn-ui-screenshots.zip > > > The goal is to improve YARN UI for better usability. > We may take advantage of some existing front-end frameworks to build a > fancier, easier-to-use UI. > The old UI continue to exist until we feel it's ready to flip to the new UI. > This serves as an umbrella jira to track the tasks. we can do this in a > branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3368) Improve YARN web UI
[ https://issues.apache.org/jira/browse/YARN-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3368: - Attachment: (Dec 3 2015) yarn-ui-screenshots.zip > Improve YARN web UI > --- > > Key: YARN-3368 > URL: https://issues.apache.org/jira/browse/YARN-3368 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He > Attachments: (Dec 3 2015) yarn-ui-screenshots.zip > > > The goal is to improve YARN UI for better usability. > We may take advantage of some existing front-end frameworks to build a > fancier, easier-to-use UI. > The old UI continue to exist until we feel it's ready to flip to the new UI. > This serves as an umbrella jira to track the tasks. we can do this in a > branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4238) createdTime and modifiedTime is not reported while publishing entities to ATSv2
[ https://issues.apache.org/jira/browse/YARN-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039534#comment-15039534 ] Sangjin Lee commented on YARN-4238: --- [~varun_saxena], it seems like it should be relatively easy to correct the 2 checkstyle violations. Could you also take a look at the unit test failures, especially TestSystemMetricsPublisher to see if it is related? Thanks! > createdTime and modifiedTime is not reported while publishing entities to > ATSv2 > --- > > Key: YARN-4238 > URL: https://issues.apache.org/jira/browse/YARN-4238 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-4238-YARN-2928.01.patch, > YARN-4238-feature-YARN-2928.02.patch, YARN-4238-feature-YARN-2928.03.patch > > > While publishing entities from RM and elsewhere we are not sending created > time. For instance, created time in TimelineServiceV2Publisher class and for > other entities in other such similar classes is not updated. We can easily > update created time when sending application created event. Likewise for > modification time on every write. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3368) Improve YARN web UI
[ https://issues.apache.org/jira/browse/YARN-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039532#comment-15039532 ] Wangda Tan commented on YARN-3368: -- Hi guys, Some updates on the new YARN web UI work: - Rewrote the UI framework (still using Ember.JS) so new pages/charts can be added more easily. - Finished the scheduler page, which contains a selectable queue hierarchy view and also related scheduler information. - Finished the apps table, app page, app-attempt page and also a timeline view of app-attempts/containers. - Finished the cluster overview page, which contains several charts to show an overview of the cluster (such as total memory, total apps, etc.) Attached screenshots to demonstrate the latest changes. There're still many pending tasks: - Finalize design. - Page for the node managers' view. - Bugs / hardcoded configurations. In order to advance the work faster, I propose to create a new branch (YARN-3368), so more people can participate in development/discussion. All UI-related code changes will be in a separate folder "hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui". Thanks to [~Sreenath], [~vinodkv], [~jianhe], [~gtCarrera] for their support and suggestions. Thoughts? > Improve YARN web UI > --- > > Key: YARN-3368 > URL: https://issues.apache.org/jira/browse/YARN-3368 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He > Attachments: (Dec 3 2015) yarn-ui-screenshots.zip > > > The goal is to improve YARN UI for better usability. > We may take advantage of some existing front-end frameworks to build a > fancier, easier-to-use UI. > The old UI continue to exist until we feel it's ready to flip to the new UI. > This serves as an umbrella jira to track the tasks. we can do this in a > branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3368) Improve YARN web UI
[ https://issues.apache.org/jira/browse/YARN-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-3368: - Attachment: (was: YARN-3368.poc.1.patch) > Improve YARN web UI > --- > > Key: YARN-3368 > URL: https://issues.apache.org/jira/browse/YARN-3368 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Jian He > Attachments: (Dec 3 2015) yarn-ui-screenshots.zip > > > The goal is to improve YARN UI for better usability. > We may take advantage of some existing front-end frameworks to build a > fancier, easier-to-use UI. > The old UI continue to exist until we feel it's ready to flip to the new UI. > This serves as an umbrella jira to track the tasks. we can do this in a > branch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4311) Removing nodes from include and exclude lists will not remove them from decommissioned nodes list
[ https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039529#comment-15039529 ] Hadoop QA commented on YARN-4311: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 8s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 58s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 29s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 4s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 5s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 55s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 38s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 50s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 16s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 56s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 50s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 8m 50s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 26s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 26s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 2s {color} | {color:red} Patch generated 1 new checkstyle issues in root (total was 396, now 396). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 3s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 55s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s {color} | {color:green} The patch has no ill-formed XML file. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 49s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 19s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 24s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 59s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 41s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 37s {color} | {color:green} hadoop-sls in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_85. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 14s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_85. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color
[jira] [Commented] (YARN-4292) ResourceUtilization should be a part of NodeInfo REST API
[ https://issues.apache.org/jira/browse/YARN-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039527#comment-15039527 ] Sunil G commented on YARN-4292: --- Thank you very much [~leftnoteasy] for the review and commit. > ResourceUtilization should be a part of NodeInfo REST API > - > > Key: YARN-4292 > URL: https://issues.apache.org/jira/browse/YARN-4292 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Sunil G > Fix For: 2.8.0 > > Attachments: 0001-YARN-4292.patch, 0002-YARN-4292.patch, > 0003-YARN-4292.patch, 0004-YARN-4292.patch, 0005-YARN-4292.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4238) createdTime and modifiedTime is not reported while publishing entities to ATSv2
[ https://issues.apache.org/jira/browse/YARN-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039522#comment-15039522 ] Hadoop QA commented on YARN-4238: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 12s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 8m 54s {color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 31s {color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 4s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 44s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 48s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 2s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 26s {color} | {color:red} hadoop-yarn-server-resourcemanager in feature-YARN-2928 failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 16s {color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 44s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 37s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 14m 28s {color} | {color:red} root-jdk1.8.0_66 with JDK v1.8.0_66 generated 2 new issues (was 779, now 779). {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 37s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 54s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 24m 22s {color} | {color:red} root-jdk1.7.0_85 with JDK v1.7.0_85 generated 2 new issues (was 772, now 772). {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 54s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 6s {color} | {color:red} Patch generated 2 new checkstyle issues in root (total was 401, now 339). 
{color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 46s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 47s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 31s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 25s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 13s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 1s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 49s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 10m 50s {color} | {color:red} hadoop-mapreduce-client-app in the patch failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 10m 0s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch pa
[jira] [Commented] (YARN-3542) Re-factor support for CPU as a resource using the new ResourceHandler mechanism
[ https://issues.apache.org/jira/browse/YARN-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039519#comment-15039519 ] Sidharta Seethana commented on YARN-3542: - hi [~vvasudev], Thank you for the patch. I took a look at the patch and it is a bit unclear how the new configs/resource handler are meant to interact with the existing {{CgroupsLCEResourcesHandler}} . IMO, one of the goals here is to deprecate {{CgroupsLCEResourcesHandler}} and use the new resource handler mechanism so that all the resource handling/isolation is handled in a consistent manner. Could you please provide a description of the changes introduced in this patch and what the interaction would be with the existing CPU cgroups implementation (especially from a configuration perspective) ? thanks, -Sidharta > Re-factor support for CPU as a resource using the new ResourceHandler > mechanism > --- > > Key: YARN-3542 > URL: https://issues.apache.org/jira/browse/YARN-3542 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Sidharta Seethana >Assignee: Sidharta Seethana >Priority: Critical > Attachments: YARN-3542.001.patch, YARN-3542.002.patch > > > In YARN-3443 , a new ResourceHandler mechanism was added which enabled easier > addition of new resource types in the nodemanager (this was used for network > as a resource - See YARN-2140 ). We should refactor the existing CPU > implementation ( LinuxContainerExecutor/CgroupsLCEResourcesHandler ) using > the new ResourceHandler mechanism. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3542) Re-factor support for CPU as a resource using the new ResourceHandler mechanism
[ https://issues.apache.org/jira/browse/YARN-3542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sidharta Seethana reassigned YARN-3542: --- Assignee: Sidharta Seethana (was: Varun Vasudev) > Re-factor support for CPU as a resource using the new ResourceHandler > mechanism > --- > > Key: YARN-3542 > URL: https://issues.apache.org/jira/browse/YARN-3542 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Reporter: Sidharta Seethana >Assignee: Sidharta Seethana >Priority: Critical > Attachments: YARN-3542.001.patch, YARN-3542.002.patch > > > In YARN-3443 , a new ResourceHandler mechanism was added which enabled easier > addition of new resource types in the nodemanager (this was used for network > as a resource - See YARN-2140 ). We should refactor the existing CPU > implementation ( LinuxContainerExecutor/CgroupsLCEResourcesHandler ) using > the new ResourceHandler mechanism. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4403) (AM/NM/Container)LivelinessMonitor should use monotonic time when calculating period
[ https://issues.apache.org/jira/browse/YARN-4403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039502#comment-15039502 ] Jian He commented on YARN-4403: --- patch looks good to me. one minor suggestion, do you think we can change the base AbstractLivelinessMonitor to have a default constructor with MonotonicClock and callers can use this constructor instead ? > (AM/NM/Container)LivelinessMonitor should use monotonic time when calculating > period > > > Key: YARN-4403 > URL: https://issues.apache.org/jira/browse/YARN-4403 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: YARN-4403.patch > > > Currently, (AM/NM/Container)LivelinessMonitor use current system time to > calculate a duration of expire which could be broken by settimeofday. We > should use Time.monotonicNow() instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
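A minimal sketch of the constructor change suggested in the comment above: give the base monitor a constructor that defaults to a monotonic clock so callers cannot accidentally pick the wall clock. The classes below are simplified stand-ins; the real AbstractLivelinessMonitor and MonotonicClock live in hadoop-yarn-common and differ in detail.
{code:java}
public abstract class LivelinessMonitorSketch {

  interface Clock {
    long getTime();
  }

  /** Monotonic source: unaffected by settimeofday/NTP jumps. */
  static final class MonotonicClock implements Clock {
    public long getTime() {
      return System.nanoTime() / 1_000_000L;  // milliseconds
    }
  }

  private final Clock clock;

  /** Existing style: callers pass a clock explicitly. */
  protected LivelinessMonitorSketch(Clock clock) {
    this.clock = clock;
  }

  /** Suggested default constructor: monotonic time out of the box. */
  protected LivelinessMonitorSketch() {
    this(new MonotonicClock());
  }

  /** Expiry checks measure periods with this, never with wall-clock time. */
  protected long now() {
    return clock.getTime();
  }
}
{code}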
[jira] [Updated] (YARN-3669) Attempt-failures validatiy interval should have a global admin configurable lower limit
[ https://issues.apache.org/jira/browse/YARN-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3669: -- Assignee: Xuan Gong (was: Vinod Kumar Vavilapalli) > Attempt-failures validatiy interval should have a global admin configurable > lower limit > --- > > Key: YARN-3669 > URL: https://issues.apache.org/jira/browse/YARN-3669 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Vinod Kumar Vavilapalli >Assignee: Xuan Gong > Labels: newbie > Attachments: YARN-3669.1.patch > > > Found this while reviewing YARN-3480. > bq. When 'attemptFailuresValidityInterval'(introduced in YARN-611) is set to > a small value, retried attempts might be very large. So we need to delete > some attempts stored in RMStateStore and RMStateStore. > I think we need to have a lower limit on the failure-validaty interval to > avoid situations like this. > Having this will avoid pardoning too-many failures in too-short a duration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4417) Make RM and Timeline-server REST APIs more consistent
Wangda Tan created YARN-4417: Summary: Make RM and Timeline-server REST APIs more consistent Key: YARN-4417 URL: https://issues.apache.org/jira/browse/YARN-4417 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan There're some differences between RM and timeline-server's REST APIs, for example, RM REST API doesn't support get application attempt info by app-id and attempt-id but timeline server supports. We could make them more consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3669) Attempt-failures validatiy interval should have a global admin configurable lower limit
[ https://issues.apache.org/jira/browse/YARN-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-3669: Attachment: YARN-3669.1.patch > Attempt-failures validatiy interval should have a global admin configurable > lower limit > --- > > Key: YARN-3669 > URL: https://issues.apache.org/jira/browse/YARN-3669 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Reporter: Vinod Kumar Vavilapalli >Assignee: Vinod Kumar Vavilapalli > Labels: newbie > Attachments: YARN-3669.1.patch > > > Found this while reviewing YARN-3480. > bq. When 'attemptFailuresValidityInterval'(introduced in YARN-611) is set to > a small value, retried attempts might be very large. So we need to delete > some attempts stored in RMStateStore and RMStateStore. > I think we need to have a lower limit on the failure-validaty interval to > avoid situations like this. > Having this will avoid pardoning too-many failures in too-short a duration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4417) Make RM and Timeline-server REST APIs more consistent
[ https://issues.apache.org/jira/browse/YARN-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4417: - Attachment: YARN-4417.1.patch Attached initial patch for review. > Make RM and Timeline-server REST APIs more consistent > - > > Key: YARN-4417 > URL: https://issues.apache.org/jira/browse/YARN-4417 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4417.1.patch > > > There're some differences between RM and timeline-server's REST APIs, for > example, RM REST API doesn't support get application attempt info by app-id > and attempt-id but timeline server supports. We could make them more > consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4340) Add "list" API to reservation system
[ https://issues.apache.org/jira/browse/YARN-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Po updated YARN-4340: -- Attachment: YARN-4340.v5.patch > Add "list" API to reservation system > > > Key: YARN-4340 > URL: https://issues.apache.org/jira/browse/YARN-4340 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Carlo Curino >Assignee: Sean Po > Attachments: YARN-4340.v1.patch, YARN-4340.v2.patch, > YARN-4340.v3.patch, YARN-4340.v4.patch, YARN-4340.v5.patch > > > This JIRA tracks changes to the APIs of the reservation system, and enables > querying the reservation system on which reservation exists by "time-range, > reservation-id, username". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4340) Add "list" API to reservation system
[ https://issues.apache.org/jira/browse/YARN-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Po updated YARN-4340: -- Attachment: (was: YARN-4340.v8.patch) > Add "list" API to reservation system > > > Key: YARN-4340 > URL: https://issues.apache.org/jira/browse/YARN-4340 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Carlo Curino >Assignee: Sean Po > Attachments: YARN-4340.v1.patch, YARN-4340.v2.patch, > YARN-4340.v3.patch, YARN-4340.v4.patch > > > This JIRA tracks changes to the APIs of the reservation system, and enables > querying the reservation system on which reservation exists by "time-range, > reservation-id, username". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4340) Add "list" API to reservation system
[ https://issues.apache.org/jira/browse/YARN-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Po updated YARN-4340: -- Attachment: YARN-4340.v8.patch > Add "list" API to reservation system > > > Key: YARN-4340 > URL: https://issues.apache.org/jira/browse/YARN-4340 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, fairscheduler, resourcemanager >Reporter: Carlo Curino >Assignee: Sean Po > Attachments: YARN-4340.v1.patch, YARN-4340.v2.patch, > YARN-4340.v3.patch, YARN-4340.v4.patch > > > This JIRA tracks changes to the APIs of the reservation system, and enables > querying the reservation system on which reservation exists by "time-range, > reservation-id, username". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1974) add args for DistributedShell to specify a set of nodes on which the tasks run
[ https://issues.apache.org/jira/browse/YARN-1974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038803#comment-15038803 ] Hadoop QA commented on YARN-1974: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 10m 36s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 38s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 22s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 18s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 11s {color} | {color:red} Patch generated 2 new checkstyle issues in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell (total was 148, now 148). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 46s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 21s {color} | {color:red} hadoop-yarn-applications-distributedshell in the patch failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 8s {color} | {color:green} hadoop-yarn-applications-distributedshell in the patch passed with JDK v1.7.0_91. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 24s {color} | {color:red} Patch generated 1 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 34m 54s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_66 Failed junit tests | hadoop.yarn.applications.distributedshell.TestDistributedShell | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12775669/YARN-1974.1.patch | | JIRA Issue | YARN-1974 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 613c1f2f9c94 3.13.0-36-lowlatency
[jira] [Commented] (YARN-2885) Create AMRMProxy request interceptor for distributed scheduling decisions for queueable containers
[ https://issues.apache.org/jira/browse/YARN-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038769#comment-15038769 ] Wangda Tan commented on YARN-2885: -- [~kkaranasos], bq. ... That said, I am not sure if it is required to create a wrapper at this point for the AM protocol. As suggested by [~asuresh], bq. Have an Distributed Scheduler AM Service running on the RM if DS is enabled. This will implement the new protocol (it will delegate all the AMProtocol stuff to the AMService and will handle DistScheduler specific stuff) Do you think if it's a good idea? > Create AMRMProxy request interceptor for distributed scheduling decisions for > queueable containers > -- > > Key: YARN-2885 > URL: https://issues.apache.org/jira/browse/YARN-2885 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Arun Suresh > Attachments: YARN-2885-yarn-2877.001.patch > > > We propose to add a Local ResourceManager (LocalRM) to the NM in order to > support distributed scheduling decisions. > Architecturally we leverage the RMProxy, introduced in YARN-2884. > The LocalRM makes distributed decisions for queuable containers requests. > Guaranteed-start requests are still handled by the central RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4405) Support node label store in non-appendable file system
[ https://issues.apache.org/jira/browse/YARN-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038763#comment-15038763 ] Hadoop QA commented on YARN-4405: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 5 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 1s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 11m 24s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 22s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 4s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 48s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 57s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 1s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 32s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 12s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 15s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 26s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 26s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 26s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 3s {color} | {color:red} Patch generated 7 new checkstyle issues in root (total was 264, now 267). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 47s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 57s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 13 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 0s {color} | {color:green} The patch has no ill-formed XML file. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 7m 6s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 49s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 5m 22s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 8m 45s {color} | {color:red} hadoop-common in the patch failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 26s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 2s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 61m 1s {color} | {color:red} hadoop-yarn-server-resourcemanager in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 7m 49s {color} | {color:red} hadoop-common in the patch failed with JDK v1.7.0_85. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 27s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.7.0_85. {color} | | {color:green}+1{color} | {color:gr
[jira] [Commented] (YARN-2885) Create AMRMProxy request interceptor for distributed scheduling decisions for queueable containers
[ https://issues.apache.org/jira/browse/YARN-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038768#comment-15038768 ] Wangda Tan commented on YARN-2885: -- Hi [~asuresh], Thanks for the reply. bq. What we were aiming for is to not send any Queueable resource reqs to the RM... After thinking about it, the RM could directly support queue-able container allocation. Since the queue-able/guaranteed executionType is part of the user-facing API, the scheduler can decide whether or not to allocate a queue-able container. The LocalRM is one way to allocate queue-able containers, but please make sure that there's no assumption (hardcoded logic) that queue-able containers can only be allocated by the LocalRM. bq. I totally agree that the AM should not be bothered with this.. But if you notice, It is actually not set by the AM, it set by the DistSchedulerReqeustInterceptor when it proxies the AM calls... Since you planned to have a LocalRM coordinator, I would prefer to add a separate Distributed Scheduler Coordinator service and protocols. The other comments make sense to me. > Create AMRMProxy request interceptor for distributed scheduling decisions for > queueable containers > -- > > Key: YARN-2885 > URL: https://issues.apache.org/jira/browse/YARN-2885 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager, resourcemanager >Reporter: Konstantinos Karanasos >Assignee: Arun Suresh > Attachments: YARN-2885-yarn-2877.001.patch > > > We propose to add a Local ResourceManager (LocalRM) to the NM in order to > support distributed scheduling decisions. > Architecturally we leverage the RMProxy, introduced in YARN-2884. > The LocalRM makes distributed decisions for queueable container requests. > Guaranteed-start requests are still handled by the central RM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2877) Extend YARN to support distributed scheduling
[ https://issues.apache.org/jira/browse/YARN-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038760#comment-15038760 ] Wangda Tan commented on YARN-2877: -- Hi [~kkaranasos], Thanks for the reply: bq. We are planning to address this by having smaller heartbeat intervals in the AM-LocalRM communication when compared to the LocalRM-RM. For instance, the AM-LocalRM heartbeat interval can be set to 50ms, while the LocalRM-RM interval to 200ms (in other words, we will only propagate to the RM only one in every four heartbeats). Maybe you could also take a look at HADOOP-11552, which could possibly achieve better latency and reduce heartbeat frequency. bq. This is a valid concern. The best way to minimize preemption is through the "top-k node list" technique described above. As the LocalRM will be placing the QUEUEABLE containers to the least loaded nodes, preemption will be minimized. I think the top-k node list technique cannot completely solve the oversubscription issue: in a production cluster, applications come in waves, and it is possible for a few large applications to exhaust all resources in the cluster within a few seconds. Another possible approach to mitigate the issue is propagating queue-able containers from the NM to the RM periodically, so the NM can still make decisions while the RM is also aware of these queue-able containers. bq. That said, as you also mention, QUEUEABLE containers are more suitable for short-running tasks, where the probability of a container being preempted is smaller. Ideally it's better to support all non-long-running-service tasks: the LocalRM could allocate short-running queue-able tasks and the RM can allocate the other queue-able tasks. > Extend YARN to support distributed scheduling > - > > Key: YARN-2877 > URL: https://issues.apache.org/jira/browse/YARN-2877 > Project: Hadoop YARN > Issue Type: New Feature > Components: nodemanager, resourcemanager >Reporter: Sriram Rao >Assignee: Konstantinos Karanasos > Attachments: distributed-scheduling-design-doc_v1.pdf > > > This is an umbrella JIRA that proposes to extend YARN to support distributed > scheduling. Briefly, some of the motivations for distributed scheduling are > the following: > 1. Improve cluster utilization by opportunistically executing tasks on otherwise > idle resources on individual machines. > 2. Reduce allocation latency. Tasks where the scheduling time dominates > (i.e., task execution time is much less compared to the time required for > obtaining a container from the RM). > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
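For illustration only, a minimal sketch of the interval split discussed above: a LocalRM-side forwarder that answers every AM heartbeat locally but only pushes every Nth one to the central RM. The class and method names are hypothetical assumptions, not the YARN-2877 design itself.
{code}
// Hypothetical sketch: respond to every AM heartbeat locally, forward 1-in-N upstream.
public class HeartbeatForwarder {
  private static final int FORWARD_EVERY = 4;  // e.g. 50ms AM interval vs ~200ms to the RM
  private int count = 0;

  public synchronized void onAmHeartbeat(Runnable respondLocally, Runnable forwardToRm) {
    respondLocally.run();                      // low-latency local response for the AM
    if (++count % FORWARD_EVERY == 0) {
      forwardToRm.run();                       // aggregated view reaches the central RM
    }
  }
}
{code}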
[jira] [Commented] (YARN-4292) ResourceUtilization should be a part of NodeInfo REST API
[ https://issues.apache.org/jira/browse/YARN-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038743#comment-15038743 ] Hudson commented on YARN-4292: -- FAILURE: Integrated in Hadoop-trunk-Commit #8915 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8915/]) YARN-4292. ResourceUtilization should be a part of NodeInfo REST API. (wangda: rev a2c3bfc8c1349102a7f2bc4ea96b80b429ac227b) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ResourceUtilizationInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/NodeInfo.java * hadoop-yarn-project/CHANGES.txt > ResourceUtilization should be a part of NodeInfo REST API > - > > Key: YARN-4292 > URL: https://issues.apache.org/jira/browse/YARN-4292 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Wangda Tan >Assignee: Sunil G > Fix For: 2.8.0 > > Attachments: 0001-YARN-4292.patch, 0002-YARN-4292.patch, > 0003-YARN-4292.patch, 0004-YARN-4292.patch, 0005-YARN-4292.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4311) Removing nodes from include and exclude lists will not remove them from decommissioned nodes list
[ https://issues.apache.org/jira/browse/YARN-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuhu Shukla updated YARN-4311: -- Attachment: YARN-4311-v3.patch Fixed one test failure in TestYarnConfigurationFields by adding the new configs to yarn-default.xml. TestAMAuthorization and TestClientRMTokens are unrelated and fail as per YARN-4318 and YARN-4306. Corrected checkstyle issues. > Removing nodes from include and exclude lists will not remove them from > decommissioned nodes list > - > > Key: YARN-4311 > URL: https://issues.apache.org/jira/browse/YARN-4311 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.1 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: YARN-4311-v1.patch, YARN-4311-v2.patch, > YARN-4311-v3.patch > > > In order to fully forget about a node, removing the node from the include and > exclude lists is not sufficient. The RM lists it under Decomm-ed nodes. The > tricky part that [~jlowe] pointed out was the case when include lists are not > used; in that case, we don't want the nodes to fall off if they are not active. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-1974) add args for DistributedShell to specify a set of nodes on which the tasks run
[ https://issues.apache.org/jira/browse/YARN-1974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned YARN-1974: --- Assignee: Xuan Gong (was: Hong Zhiguo) > add args for DistributedShell to specify a set of nodes on which the tasks run > -- > > Key: YARN-1974 > URL: https://issues.apache.org/jira/browse/YARN-1974 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications/distributed-shell >Affects Versions: 2.7.0 >Reporter: Hong Zhiguo >Assignee: Xuan Gong >Priority: Minor > Attachments: YARN-1974.1.patch, YARN-1974.patch > > > It's very useful to execute a script on a specific set of machines for both > testing and maintenance purposes. > The args "--nodes" and "--relax_locality" are added to DistributedShell, > together with a unit test using miniCluster. > It's also tested on our real cluster with the Fair scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1974) add args for DistributedShell to specify a set of nodes on which the tasks run
[ https://issues.apache.org/jira/browse/YARN-1974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-1974: Attachment: YARN-1974.1.patch > add args for DistributedShell to specify a set of nodes on which the tasks run > -- > > Key: YARN-1974 > URL: https://issues.apache.org/jira/browse/YARN-1974 > Project: Hadoop YARN > Issue Type: Improvement > Components: applications/distributed-shell >Affects Versions: 2.7.0 >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo >Priority: Minor > Attachments: YARN-1974.1.patch, YARN-1974.patch > > > It's very useful to execute a script on a specific set of machines for both > testing and maintenance purposes. > The args "--nodes" and "--relax_locality" are added to DistributedShell, > together with a unit test using miniCluster. > It's also tested on our real cluster with the Fair scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4225) Add preemption status to yarn queue -status for capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038723#comment-15038723 ] Wangda Tan commented on YARN-4225: -- Thanks [~jlowe], I understand the issue now. I'm OK with both approaches - the existing one in the latest patch, or simply returning false if there's no such field in the proto. > Add preemption status to yarn queue -status for capacity scheduler > -- > > Key: YARN-4225 > URL: https://issues.apache.org/jira/browse/YARN-4225 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, yarn >Affects Versions: 2.7.1 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Minor > Attachments: YARN-4225.001.patch, YARN-4225.002.patch, > YARN-4225.003.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
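As an aside, the "return false if there's no such field in the proto" option usually takes the following shape in a PBImpl getter. This is only an illustrative fragment in the usual viaProto/builder style; the field name is hypothetical and not necessarily what the patch uses.
{code}
// Illustrative fragment: fall back to a safe default when an older server
// never populated the new optional field.
public boolean getPreemptionDisabled() {
  QueueInfoProtoOrBuilder p = viaProto ? proto : builder;
  if (!p.hasPreemptionDisabled()) {
    return false;   // older servers: report "preemption not disabled"
  }
  return p.getPreemptionDisabled();
}
{code}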
[jira] [Commented] (YARN-4304) AM max resource configuration per partition to be displayed/updated correctly in UI and in various partition related metrics
[ https://issues.apache.org/jira/browse/YARN-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038703#comment-15038703 ] Wangda Tan commented on YARN-4304: -- Hi [~sunilg], I took a look at the REST API implementation in the latest patch; some comments: By design, PartitionResourcesInfo/ResourcesInfo should be used by user/queue/app, so we need to make the fields and usage generic to these components. - amResourceLimit is meaningful to all components. App doesn't use that field for now, but we can keep it and set it to infinite. - userAMResourceLimit is not meaningful to queue/app, and it overlaps with user.resourcesInfo.amResourceLimit; I suggest removing it, and we can use the amResourceLimit of the first user of the queue to show on the UI. Another reason is that in the future we could have different users with different amResourceLimits. Also, ResourcesInfo is the RESTful mapping of ResourceUsage, so the necessary changes need to be added to ResourceUsage (maybe rename ResourceUsage to ResourcesInformation?) as well. The renaming could be done in a separate JIRA, but I suggest changing the ResourceUsage implementation in the same JIRA. If you agree with the above, ResourcesInfo's constructor shouldn't depend on LeafQueue and considerAMUsage; it should simply copy fields from ResourceUsage. > AM max resource configuration per partition to be displayed/updated correctly > in UI and in various partition related metrics > > > Key: YARN-4304 > URL: https://issues.apache.org/jira/browse/YARN-4304 > Project: Hadoop YARN > Issue Type: Sub-task > Components: webapp >Affects Versions: 2.7.1 >Reporter: Sunil G >Assignee: Sunil G > Attachments: 0001-YARN-4304.patch, 0002-YARN-4304.patch, > 0003-YARN-4304.patch, 0004-YARN-4304.patch, REST_and_UI.zip > > > As we are supporting per-partition level max AM resource percentage > configuration, UI and various metrics also need to display correct > configurations related to same. > For eg: Current UI still shows am-resource percentage per queue level. This > is to be updated correctly when label config is used. > - Display max-am-percentage per-partition in Scheduler UI (label also) and in > ClusterMetrics page > - Update queue/partition related metrics w.r.t per-partition > am-resource-percentage -- This message was sent by Atlassian JIRA (v6.3.4#6332)
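To make the "simply copy fields" suggestion concrete, here is a rough sketch of what a component-agnostic DAO could look like. The class name, fields, and units are illustrative assumptions, not the actual YARN-4304 patch.
{code}
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlRootElement;

// Hypothetical DAO shape: plain field copies, no LeafQueue or considerAMUsage logic.
@XmlRootElement
@XmlAccessorType(XmlAccessType.FIELD)
public class ResourcesInfoSketch {
  private long usedMemoryMB;
  private long amUsedMemoryMB;
  private long amLimitMemoryMB;

  public ResourcesInfoSketch() {
    // JAXB requires a no-arg constructor
  }

  public ResourcesInfoSketch(long used, long amUsed, long amLimit) {
    this.usedMemoryMB = used;        // copied straight from the resource-usage object
    this.amUsedMemoryMB = amUsed;
    this.amLimitMemoryMB = amLimit;  // could be set to "infinite" for apps, per the comment above
  }
}
{code}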
[jira] [Commented] (YARN-4408) NodeManager still reports negative running containers
[ https://issues.apache.org/jira/browse/YARN-4408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038649#comment-15038649 ] Robert Kanter commented on YARN-4408: - Thanks [~djp]! > NodeManager still reports negative running containers > - > > Key: YARN-4408 > URL: https://issues.apache.org/jira/browse/YARN-4408 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.4.0 >Reporter: Robert Kanter >Assignee: Robert Kanter > Fix For: 2.8.0 > > Attachments: YARN-4408.001.patch, YARN-4408.002.patch, > YARN-4408.003.patch > > > YARN-1697 fixed a problem where the NodeManager metrics could report a > negative number of running containers. However, it missed a rare case where > this can still happen. > YARN-1697 added a flag to indicate if the container was actually launched > ({{LOCALIZED}} to {{RUNNING}}) or not ({{LOCALIZED}} to {{KILLING}}), which > is then checked when transitioning from {{CONTAINER_CLEANEDUP_AFTER_KILL}} to > {{DONE}} and {{EXITED_WITH_FAILURE}} to {{DONE}} to only decrement the gauge > if we actually ran the container and incremented the gauge . However, this > flag is not checked while transitioning from {{EXITED_WITH_SUCCESS}} to > {{DONE}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
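For readers following along, a small self-contained sketch of the guard being described: only decrement the running-containers gauge on the DONE transition if the same flag that gated the increment was set. Class and method names are illustrative; the real logic lives in the NM's ContainerImpl state machine and its metrics class.
{code}
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: shows the "was launched" guard applied to every path into DONE.
public class RunningContainersGauge {
  private final AtomicInteger running = new AtomicInteger();
  private volatile boolean wasLaunched;   // set on LOCALIZED -> RUNNING

  public void onLaunched() {
    wasLaunched = true;
    running.incrementAndGet();            // gauge only goes up for real launches
  }

  public void onDone() {                  // EXITED_WITH_SUCCESS -> DONE included
    if (wasLaunched) {
      running.decrementAndGet();          // never decrement what was never incremented
    }
  }
}
{code}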
[jira] [Commented] (YARN-3840) Resource Manager web ui issue when sorting application by id (with application having id > 9999)
[ https://issues.apache.org/jira/browse/YARN-3840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038582#comment-15038582 ] Hudson commented on YARN-3840: -- FAILURE: Integrated in Hadoop-trunk-Commit #8914 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8914/]) YARN-3840. Resource Manager web ui issue when sorting application by id (jianhe: rev 9f77ccad735f4843ce2c38355de9f434838d4507) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AppAttemptPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/AllApplicationsPage.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/TasksPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/AllContainersPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/TestAHSWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/main/java/org/apache/hadoop/yarn/server/applicationhistoryservice/webapp/AppPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/webapp/view/JQueryUI.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/static/dt-sorting/natural.js * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/WebPageUtils.java * hadoop-yarn-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/TaskPage.java > Resource Manager web ui issue when sorting application by id (with > application having id > ) > > > Key: YARN-3840 > URL: https://issues.apache.org/jira/browse/YARN-3840 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.7.0 >Reporter: LINTE >Assignee: Varun Saxena > Fix For: 2.8.0, 2.7.3 > > Attachments: RMApps.png, RMApps_Sorted.png, YARN-3840-1.patch, > YARN-3840-2.patch, YARN-3840-3.patch, YARN-3840-4.patch, YARN-3840-5.patch, > YARN-3840-6.patch, YARN-3840.reopened.001.patch, yarn-3840-7.patch > > > On the WEBUI, the global main view page : > http://resourcemanager:8088/cluster/apps doesn't display applications over > . > With command line it works (# yarn application -list). > Regards, > Alexandre -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4238) createdTime and modifiedTime is not reported while publishing entities to ATSv2
[ https://issues.apache.org/jira/browse/YARN-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-4238: --- Attachment: YARN-4238-feature-YARN-2928.03.patch > createdTime and modifiedTime is not reported while publishing entities to > ATSv2 > --- > > Key: YARN-4238 > URL: https://issues.apache.org/jira/browse/YARN-4238 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-4238-YARN-2928.01.patch, > YARN-4238-feature-YARN-2928.02.patch, YARN-4238-feature-YARN-2928.03.patch > > > While publishing entities from RM and elsewhere we are not sending created > time. For instance, created time in TimelineServiceV2Publisher class and for > other entities in other such similar classes is not updated. We can easily > update created time when sending application created event. Likewise for > modification time on every write. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4238) createdTime and modifiedTime is not reported while publishing entities to ATSv2
[ https://issues.apache.org/jira/browse/YARN-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038566#comment-15038566 ] Varun Saxena commented on YARN-4238: [~sjlee0], sorry, I had missed your comment. Will fix the checkstyle issues and update the patch shortly. bq. I think it is reasonable to say that the clients are required to set creation time and modification time, or they will not be present in the data and things like sort will not work correctly on those records. What do you think? I agree. Clients sending the created time is a reasonable assumption to make. We can mention this explicitly as well when we document ATSv2. > createdTime and modifiedTime is not reported while publishing entities to > ATSv2 > --- > > Key: YARN-4238 > URL: https://issues.apache.org/jira/browse/YARN-4238 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: YARN-2928 >Reporter: Varun Saxena >Assignee: Varun Saxena > Labels: yarn-2928-1st-milestone > Attachments: YARN-4238-YARN-2928.01.patch, > YARN-4238-feature-YARN-2928.02.patch > > > While publishing entities from RM and elsewhere we are not sending created > time. For instance, created time in TimelineServiceV2Publisher class and for > other entities in other such similar classes is not updated. We can easily > update created time when sending application created event. Likewise for > modification time on every write. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
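A minimal sketch of what the client-side expectation amounts to, assuming the ATSv2 TimelineEntity setters available on the feature branch (exact signatures may differ there):
{code}
import org.apache.hadoop.yarn.api.records.timelineservice.TimelineEntity;

public class PublishExample {
  // Clients set the creation time explicitly when the entity is first written,
  // so that sorting and time-based queries work on the stored records.
  public static TimelineEntity newEntity(String type, String id) {
    TimelineEntity entity = new TimelineEntity();
    entity.setType(type);
    entity.setId(id);
    entity.setCreatedTime(System.currentTimeMillis());
    return entity;
  }
}
{code}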
[jira] [Commented] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038560#comment-15038560 ] Wangda Tan commented on YARN-4416: -- Thanks for reporting this issue, [~Naganarasimha]. I looked at the code; none of the methods used by org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue#toString need to be synchronized: - queueCapacity and resource-usage have their own read/write locks. - numContainers is volatile. - a read/write lock could be added to OrderingPolicy; read operations don't need to be synchronized, so getNumApplications doesn't need to be synchronized either. > Deadlock due to synchronised get Methods in AbstractCSQueue > --- > > Key: YARN-4416 > URL: https://issues.apache.org/jira/browse/YARN-4416 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, resourcemanager >Affects Versions: 2.7.1 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: deadlock.log > > > While debugging in Eclipse, I came across a scenario where I had to get to > know the name of the queue, but every time I tried to see the queue it was > getting hung. On seeing the stack I realized there was a deadlock, but on > analysis I found out that it was only due to *queue.toString()* during > debugging, as {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized. > Still, I feel {{AbstractCSQueue}}'s getter methods need not be synchronized > and are better handled through read and write locks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
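A minimal sketch of the read/write-lock alternative mentioned above, applied to a single getter; the class and field names are stand-ins, not the actual AbstractCSQueue change:
{code}
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CapacityHolder {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private float absoluteUsedCapacity;

  // Readers take the shared lock: concurrent reads never block one another and
  // hold no exclusive object monitor while other code walks the queue hierarchy.
  public float getAbsoluteUsedCapacity() {
    lock.readLock().lock();
    try {
      return absoluteUsedCapacity;
    } finally {
      lock.readLock().unlock();
    }
  }

  // Writers take the exclusive lock only for the duration of the mutation.
  public void setAbsoluteUsedCapacity(float value) {
    lock.writeLock().lock();
    try {
      this.absoluteUsedCapacity = value;
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}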
[jira] [Commented] (YARN-4002) make ResourceTrackerService.nodeHeartbeat more concurrent
[ https://issues.apache.org/jira/browse/YARN-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038392#comment-15038392 ] Brook Zhou commented on YARN-4002: -- If this is currently not being worked on, I will assign it to myself. > make ResourceTrackerService.nodeHeartbeat more concurrent > - > > Key: YARN-4002 > URL: https://issues.apache.org/jira/browse/YARN-4002 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo >Priority: Critical > Attachments: YARN-4002-v0.patch > > > We have multiple RPC threads to handle NodeHeartbeatRequest from NMs. By > design the method ResourceTrackerService.nodeHeartbeat should be concurrent > enough to scale for large clusters. > But we have a "BIG" lock in NodesListManager.isValidNode which I think is > unnecessary. > First, the fields "includes" and "excludes" of HostsFileReader are only > updated on "refresh nodes". All RPC threads handling node heartbeats are > only readers. So an RWLock could be used to allow concurrent access by RPC > threads. > Second, since the fields "includes" and "excludes" of HostsFileReader are > always updated by "reference assignment", which is atomic in Java, the > reader-side lock could just be skipped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
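To illustrate the two points in the description, a small self-contained sketch: an immutable snapshot published by a single volatile reference assignment, which lets the heartbeat RPC threads read without any lock. Class and method names are hypothetical, not the actual HostsFileReader/NodesListManager code.
{code}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class HostLists {
  // Volatile references to immutable snapshots: heartbeat threads just read them.
  private volatile Set<String> includes = Collections.emptySet();
  private volatile Set<String> excludes = Collections.emptySet();

  // Called only on "refresh nodes": build new sets, then publish each with a
  // single atomic reference assignment (the "reference assignment" point above).
  public void refresh(Set<String> newIncludes, Set<String> newExcludes) {
    this.includes = Collections.unmodifiableSet(new HashSet<>(newIncludes));
    this.excludes = Collections.unmodifiableSet(new HashSet<>(newExcludes));
  }

  // Lock-free read path for nodeHeartbeat(): no shared monitor, no contention.
  public boolean isValidNode(String host) {
    Set<String> inc = includes;   // capture the references once
    Set<String> exc = excludes;
    return (inc.isEmpty() || inc.contains(host)) && !exc.contains(host);
  }
}
{code}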
[jira] [Commented] (YARN-3946) Allow fetching exact reason as to why a submitted app is in ACCEPTED state in CS
[ https://issues.apache.org/jira/browse/YARN-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038381#comment-15038381 ] Naganarasimha G R commented on YARN-3946: - Hi [~wangda], Some of the test failures seem to be related to the patch. I would also merge {{checkAndUpdateAMContainerDiagnostics}} and {{updateAMContainerDiagnostics}} into a single method with an additional parameter rather than keeping a separate method. Will upload a new patch at the earliest. > Allow fetching exact reason as to why a submitted app is in ACCEPTED state in > CS > > > Key: YARN-3946 > URL: https://issues.apache.org/jira/browse/YARN-3946 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.6.0 >Reporter: Sumit Nigam >Assignee: Naganarasimha G R > Attachments: 3946WebImages.zip, YARN-3946.v1.001.patch, > YARN-3946.v1.002.patch, YARN-3946.v1.003.Images.zip, YARN-3946.v1.003.patch, > YARN-3946.v1.004.patch, YARN-3946.v1.005.patch > > > Currently there is no direct way to get the exact reason as to why a > submitted app is still in ACCEPTED state. It should be possible to know > through the RM REST API as to what aspect is not being met - say, queue limits > being reached, or core/memory requirements not being met, or the AM limit being > reached, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
[ https://issues.apache.org/jira/browse/YARN-4416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4416: Attachment: deadlock.log Attaching the stack trace for the deadlock. > Deadlock due to synchronised get Methods in AbstractCSQueue > --- > > Key: YARN-4416 > URL: https://issues.apache.org/jira/browse/YARN-4416 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, resourcemanager >Affects Versions: 2.7.1 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: deadlock.log > > > While debugging in Eclipse, I came across a scenario where I had to get to > know the name of the queue, but every time I tried to see the queue it was > getting hung. On seeing the stack I realized there was a deadlock, but on > analysis I found out that it was only due to *queue.toString()* during > debugging, as {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized. > Still, I feel {{AbstractCSQueue}}'s getter methods need not be synchronized > and are better handled through read and write locks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4416) Deadlock due to synchronised get Methods in AbstractCSQueue
Naganarasimha G R created YARN-4416: --- Summary: Deadlock due to synchronised get Methods in AbstractCSQueue Key: YARN-4416 URL: https://issues.apache.org/jira/browse/YARN-4416 Project: Hadoop YARN Issue Type: Bug Components: capacity scheduler, resourcemanager Affects Versions: 2.7.1 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Minor While debugging in Eclipse, I came across a scenario where I had to get to know the name of the queue, but every time I tried to see the queue it was getting hung. On seeing the stack I realized there was a deadlock, but on analysis I found out that it was only due to *queue.toString()* during debugging, as {{AbstractCSQueue.getAbsoluteUsedCapacity}} was synchronized. Still, I feel {{AbstractCSQueue}}'s getter methods need not be synchronized and are better handled through read and write locks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4415) Scheduler Web Ui shows max capacity for the queue is 100% but when we submit application doesnt get assigned
[ https://issues.apache.org/jira/browse/YARN-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038332#comment-15038332 ] Naganarasimha G R commented on YARN-4415: - As per an offline discussion with [~wangda], he mentioned that this was done with the intent that the default max capacity of a partition is set to zero to avoid having to configure the queue. IMHO I feel it's much easier if we assume the max capacity is 100% and calculate the absolute max based on the parent queue's max capacity, for the following reasons: # It will have the same behavior as the default partition, hence less confusion. # Maybe my understanding is wrong, but I feel it's easier to add new partitions without touching the CS.xml, as we can set the accessible node labels to * and assume 100% as the max capacity and 0% as the guaranteed capacity. We also need to update the documentation with the default values. > Scheduler Web Ui shows max capacity for the queue is 100% but when we submit > application doesnt get assigned > > > Key: YARN-4415 > URL: https://issues.apache.org/jira/browse/YARN-4415 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.7.2 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: App info with diagnostics info.png, screenshot-1.png > > > Steps to reproduce the issue : > Scenario 1: > # Configure a queue(default) with accessible node labels as * > # create a exclusive partition *xxx* and map a NM to it > # ensure no capacities are configured for default for label xxx > # start an RM app with queue as default and label as xxx > # application is stuck but scheduler ui shows 100% as max capacity for that > queue > Scenario 2: > # create a nonexclusive partition *sharedPartition* and map a NM to it > # ensure no capacities are configured for default queue > # start an RM app with queue as *default* and label as *sharedPartition* > # application is stuck but scheduler ui shows 100% as max capacity for that > queue for *sharedPartition* > For both issues cause is the same default max capacity and abs max capacity > is set to Zero % -- This message was sent by Atlassian JIRA (v6.3.4#6332)
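For context, this is roughly the kind of per-label queue configuration that currently has to be added to capacity-scheduler.xml before the queue can use the partition. The label name xxx is taken from the reproduction steps above and the values are only an example, not a recommended setting.
{code}
<property>
  <name>yarn.scheduler.capacity.root.default.accessible-node-labels</name>
  <value>*</value>
</property>
<!-- Without entries like these, the queue's capacity and maximum-capacity for
     the label default to 0%, which is what leaves the application stuck. -->
<property>
  <name>yarn.scheduler.capacity.root.default.accessible-node-labels.xxx.capacity</name>
  <value>100</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.accessible-node-labels.xxx.maximum-capacity</name>
  <value>100</value>
</property>
{code}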
[jira] [Updated] (YARN-4415) Scheduler Web Ui shows max capacity for the queue is 100% but when we submit application doesnt get assigned
[ https://issues.apache.org/jira/browse/YARN-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4415: Attachment: screenshot-1.png > Scheduler Web Ui shows max capacity for the queue is 100% but when we submit > application doesnt get assigned > > > Key: YARN-4415 > URL: https://issues.apache.org/jira/browse/YARN-4415 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.7.2 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: App info with diagnostics info.png, screenshot-1.png > > > Steps to reproduce the issue : > Scenario 1: > # Configure a queue(default) with accessible node labels as * > # create a exclusive partition *xxx* and map a NM to it > # ensure no capacities are configured for default for label xxx > # start an RM app with queue as default and label as xxx > # application is stuck but scheduler ui shows 100% as max capacity for that > queue > Scenario 2: > # create a nonexclusive partition *sharedPartition* and map a NM to it > # ensure no capacities are configured for default queue > # start an RM app with queue as *default* and label as *sharedPartition* > # application is stuck but scheduler ui shows 100% as max capacity for that > queue for *sharedPartition* > For both issues cause is the same default max capacity and abs max capacity > is set to Zero % -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4392) ApplicationCreatedEvent event time resets after RM restart/failover
[ https://issues.apache.org/jira/browse/YARN-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038294#comment-15038294 ] Jason Lowe commented on YARN-4392: -- +1 for the latest patch, if we go with re-sending of events upon recovery. I think re-sending of events is "safer" assuming the redundant events are handled properly. That way if we missed an event we will fill that gap upon recovery. There is the concern of extra load it generates on the RM and ATS during recovery. Note that we probably will miss ATS events upon recovery in some scenarios if we don't re-send since ATS event posting is async and state store updating are async. There's a race where we could update the state store and crash before the ATS event is sent. > ApplicationCreatedEvent event time resets after RM restart/failover > --- > > Key: YARN-4392 > URL: https://issues.apache.org/jira/browse/YARN-4392 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Xuan Gong >Assignee: Naganarasimha G R >Priority: Critical > Attachments: YARN-4392-2015-11-24.patch, YARN-4392.1.patch, > YARN-4392.2.patch > > > {code}2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - > Finished time 1437453994768 is ahead of started time 1440308399674 > 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished > time 1437454008244 is ahead of started time 1440308399676 > 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished > time 1437444305171 is ahead of started time 1440308399653 > 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished > time 1437444293115 is ahead of started time 1440308399647 > 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished > time 1437444379645 is ahead of started time 1440308399656 > 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished > time 1437444361234 is ahead of started time 1440308399655 > 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished > time 1437444342029 is ahead of started time 1440308399654 > 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished > time 1437444323447 is ahead of started time 1440308399654 > 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished > time 143730006 is ahead of started time 1440308399660 > 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished > time 143715698 is ahead of started time 1440308399659 > 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished > time 143719060 is ahead of started time 1440308399658 > 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished > time 1437444393931 is ahead of started time 1440308399657 > {code} . > From ATS logs, we would see a large amount of 'stale alerts' messages > periodically -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4386) refreshNodesGracefully() looks at active RMNode list for recommissioning decommissioned nodes
[ https://issues.apache.org/jira/browse/YARN-4386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038283#comment-15038283 ] Kuhu Shukla commented on YARN-4386: --- [~sunilg], [~djp], requesting comments on how to test this, since even during the transition the RMNode is removed from the active list first and then put in the inactive RMNode list. Unless two refreshNodes calls run in parallel, such that the first deactivateNodeTransition has not finished while the other refreshNodes is trying to do the same transition, only one of them would succeed, so this would not be a race (?). Let me know if that makes sense. > refreshNodesGracefully() looks at active RMNode list for recommissioning > decommissioned nodes > - > > Key: YARN-4386 > URL: https://issues.apache.org/jira/browse/YARN-4386 > Project: Hadoop YARN > Issue Type: Bug > Components: graceful >Affects Versions: 3.0.0 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla >Priority: Minor > Attachments: YARN-4386-v1.patch > > > In refreshNodesGracefully(), during recommissioning, the entryset from > getRMNodes() which has only active nodes (RUNNING, DECOMMISSIONING etc.) is > used for checking 'decommissioned' nodes which are present in > getInactiveRMNodes() map alone. > {code} > for (Entry entry:rmContext.getRMNodes().entrySet()) { > . > // Recommissioning the nodes > if (entry.getValue().getState() == NodeState.DECOMMISSIONING > || entry.getValue().getState() == NodeState.DECOMMISSIONED) { > this.rmContext.getDispatcher().getEventHandler() > .handle(new RMNodeEvent(nodeId, RMNodeEventType.RECOMMISSION)); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4405) Support node label store in non-appendable file system
[ https://issues.apache.org/jira/browse/YARN-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038282#comment-15038282 ] Wangda Tan commented on YARN-4405: -- Thanks [~sunilg]. Addressed #2/#3. For #1, I was using {code} fs.create(newTmpPath, true); {code} So the old tmp file will always be overwritten by the new one. Attaching the ver.4 patch. > Support node label store in non-appendable file system > -- > > Key: YARN-4405 > URL: https://issues.apache.org/jira/browse/YARN-4405 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4405.1.patch, YARN-4405.2.patch, YARN-4405.3.patch, > YARN-4405.4.patch > > > The existing node label file system store implementation uses append to write > edit logs. However, some file systems don't support append; we need to add an > implementation to support such non-appendable file systems as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
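For readers unfamiliar with the pattern being discussed, here is a minimal, hedged sketch of an overwrite-then-rename mirror writer using only standard FileSystem calls; the class, path layout, and method are illustrative, not the YARN-4405 patch itself.
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MirrorWriterSketch {
  // Write the full mirror to a tmp file with overwrite=true (so a tmp file left
  // behind by a crashed run is simply replaced), then move it into place.
  public static void writeMirror(FileSystem fs, Path mirrorPath, byte[] data)
      throws IOException {
    Path tmp = new Path(mirrorPath.getParent(), mirrorPath.getName() + ".tmp");
    try (FSDataOutputStream out = fs.create(tmp, true)) {
      out.write(data);
    }
    fs.delete(mirrorPath, false);   // drop the previous mirror if present
    fs.rename(tmp, mirrorPath);     // publish the new mirror; no append needed
  }
}
{code}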
[jira] [Updated] (YARN-4405) Support node label store in non-appendable file system
[ https://issues.apache.org/jira/browse/YARN-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-4405: - Attachment: YARN-4405.4.patch > Support node label store in non-appendable file system > -- > > Key: YARN-4405 > URL: https://issues.apache.org/jira/browse/YARN-4405 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, client, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-4405.1.patch, YARN-4405.2.patch, YARN-4405.3.patch, > YARN-4405.4.patch > > > The existing node label file system store implementation uses append to write > edit logs. However, some file systems don't support append; we need to add an > implementation to support such non-appendable file systems as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI
[ https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038269#comment-15038269 ] Kuhu Shukla commented on YARN-4413: --- The current patch for YARN-4386* > Nodes in the includes list should not be listed as decommissioned in the UI > --- > > Key: YARN-4413 > URL: https://issues.apache.org/jira/browse/YARN-4413 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Daniel Templeton >Assignee: Daniel Templeton > Attachments: YARN-4413.001.patch > > > If I decommission a node and then move it from the excludes list back to the > includes list, but I don't restart the node, the node will still be listed by > the web UI as decomissioned until either the NM or RM is restarted. Ideally, > removing the node from the excludes list and putting it back into the > includes list should cause the node to be reported as shutdown instead. > CC [~kshukla] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI
[ https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038265#comment-15038265 ] Kuhu Shukla commented on YARN-4413: --- YARN-4386 tracks the RECOMMISSION check. The current patch does not have a test since it's an invalid check. > Nodes in the includes list should not be listed as decommissioned in the UI > --- > > Key: YARN-4413 > URL: https://issues.apache.org/jira/browse/YARN-4413 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Daniel Templeton >Assignee: Daniel Templeton > Attachments: YARN-4413.001.patch > > > If I decommission a node and then move it from the excludes list back to the > includes list, but I don't restart the node, the node will still be listed by > the web UI as decomissioned until either the NM or RM is restarted. Ideally, > removing the node from the excludes list and putting it back into the > includes list should cause the node to be reported as shutdown instead. > CC [~kshukla] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI
[ https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated YARN-4413: --- Attachment: YARN-4413.001.patch Here's the approach I propose. I should note that I noticed that the graceful decommission code allows an illegal state transition: {code} if (entry.getValue().getState() == NodeState.DECOMMISSIONING || entry.getValue().getState() == NodeState.DECOMMISSIONED) { this.rmContext.getDispatcher().getEventHandler() .handle(new RMNodeEvent(nodeId, RMNodeEventType.RECOMMISSION)); } {code} DECOMMISSIONED -> RECOMMISSION is not allowed. This issue is coincidentally fixed by this patch. > Nodes in the includes list should not be listed as decommissioned in the UI > --- > > Key: YARN-4413 > URL: https://issues.apache.org/jira/browse/YARN-4413 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Daniel Templeton >Assignee: Daniel Templeton > Attachments: YARN-4413.001.patch > > > If I decommission a node and then move it from the excludes list back to the > includes list, but I don't restart the node, the node will still be listed by > the web UI as decomissioned until either the NM or RM is restarted. Ideally, > removing the node from the excludes list and putting it back into the > includes list should cause the node to be reported as shutdown instead. > CC [~kshukla] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
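For comparison with the fragment quoted above, a hedged sketch of the narrower guard (not the committed fix): only nodes that are still DECOMMISSIONING, and therefore still in the active map, take the RECOMMISSION event, while fully DECOMMISSIONED nodes are handled separately.
{code}
// Illustrative narrowing of the quoted condition; not the actual patch.
if (entry.getValue().getState() == NodeState.DECOMMISSIONING) {
  this.rmContext.getDispatcher().getEventHandler()
      .handle(new RMNodeEvent(nodeId, RMNodeEventType.RECOMMISSION));
}
{code}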
[jira] [Updated] (YARN-4415) Scheduler Web Ui shows max capacity for the queue is 100% but when we submit application doesnt get assigned
[ https://issues.apache.org/jira/browse/YARN-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-4415: Attachment: App info with diagnostics info.png > Scheduler Web Ui shows max capacity for the queue is 100% but when we submit > application doesnt get assigned > > > Key: YARN-4415 > URL: https://issues.apache.org/jira/browse/YARN-4415 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacity scheduler, resourcemanager >Affects Versions: 2.7.2 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R > Attachments: App info with diagnostics info.png > > > Steps to reproduce the issue : > Scenario 1: > # Configure a queue(default) with accessible node labels as * > # create a exclusive partition *xxx* and map a NM to it > # ensure no capacities are configured for default for label xxx > # start an RM app with queue as default and label as xxx > # application is stuck but scheduler ui shows 100% as max capacity for that > queue > Scenario 2: > # create a nonexclusive partition *sharedPartition* and map a NM to it > # ensure no capacities are configured for default queue > # start an RM app with queue as *default* and label as *sharedPartition* > # application is stuck but scheduler ui shows 100% as max capacity for that > queue for *sharedPartition* > For both issues cause is the same default max capacity and abs max capacity > is set to Zero % -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-4415) Scheduler Web Ui shows max capacity for the queue is 100% but when we submit application doesnt get assigned
Naganarasimha G R created YARN-4415: --- Summary: Scheduler Web Ui shows max capacity for the queue is 100% but when we submit application doesnt get assigned Key: YARN-4415 URL: https://issues.apache.org/jira/browse/YARN-4415 Project: Hadoop YARN Issue Type: Sub-task Components: capacity scheduler, resourcemanager Affects Versions: 2.7.2 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Steps to reproduce the issue : Scenario 1: # Configure a queue(default) with accessible node labels as * # create a exclusive partition *xxx* and map a NM to it # ensure no capacities are configured for default for label xxx # start an RM app with queue as default and label as xxx # application is stuck but scheduler ui shows 100% as max capacity for that queue Scenario 2: # create a nonexclusive partition *sharedPartition* and map a NM to it # ensure no capacities are configured for default queue # start an RM app with queue as *default* and label as *sharedPartition* # application is stuck but scheduler ui shows 100% as max capacity for that queue for *sharedPartition* For both issues cause is the same default max capacity and abs max capacity is set to Zero % -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4309) Add debug information to application logs when a container fails
[ https://issues.apache.org/jira/browse/YARN-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038231#comment-15038231 ] Sidharta Seethana commented on YARN-4309: - Could you please edit the comment mentioned above to make it a bit clearer? Maybe include a note, both in yarn-default.xml's description of the config flag and in the code, that symlinks could be followed outside the current directory? Also, there seem to be a few spurious empty lines introduced in DockerContainerExecutor. Apart from this, the latest patch seems good to me. > Add debug information to application logs when a container fails > > > Key: YARN-4309 > URL: https://issues.apache.org/jira/browse/YARN-4309 > Project: Hadoop YARN > Issue Type: Sub-task > Components: nodemanager >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Attachments: YARN-4309.001.patch, YARN-4309.002.patch, > YARN-4309.003.patch, YARN-4309.004.patch, YARN-4309.005.patch > > > Sometimes when a container fails, it can be pretty hard to figure out why it > failed. > My proposal is that if a container fails, we collect information about the > container local dir and dump it into the container log dir. Ideally, I'd like > to tar up the directory entirely, but I'm not sure of the security and space > implications of such an approach. At the very least, we can list all the files > in the container local dir, and dump the contents of launch_container.sh (into > the container log dir). > When log aggregation occurs, all this information will automatically get > collected and make debugging such failures much easier. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4225) Add preemption status to yarn queue -status for capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-4225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038195#comment-15038195 ] Jason Lowe commented on YARN-4225: -- With deployments of multiple clusters it's easy to get into situations where newer clients end up talking to older clusters. Clusters are rarely upgraded at the same time, and remote access of one cluster for a job's input is not rare at least in our setups. That's a perfect example of a newer client talking to an older server. It may not be officially supported, but it's not going to be a rare occurrence (for HDFS, at least). For this specific case I think the need for the feature would be rare, hence my hedging about whether we really need it. It's not that the new client completely breaks talking to the older server, it just would be capable of returning misleading information about the preemption status. I'm OK if we decide this scenario isn't worth supporting. (It's not hard to do so, just tedious and a bit messy with the API.) But in general there will be cases where breaking compatibility between a newer client and an older server is going to be problematic even if it isn't officially supported because of the multiple cluster scenarios. > Add preemption status to yarn queue -status for capacity scheduler > -- > > Key: YARN-4225 > URL: https://issues.apache.org/jira/browse/YARN-4225 > Project: Hadoop YARN > Issue Type: Bug > Components: capacity scheduler, yarn >Affects Versions: 2.7.1 >Reporter: Eric Payne >Assignee: Eric Payne >Priority: Minor > Attachments: YARN-4225.001.patch, YARN-4225.002.patch, > YARN-4225.003.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3816) [Aggregation] App-level aggregation and accumulation for YARN system metrics
[ https://issues.apache.org/jira/browse/YARN-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038169#comment-15038169 ] Hadoop QA commented on YARN-3816: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 4 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 11m 24s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 12s {color} | {color:green} feature-YARN-2928 passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 27s {color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 4s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 53s {color} | {color:green} feature-YARN-2928 passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 21s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in feature-YARN-2928 has 3 extant Findbugs warnings. {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 31s {color} | {color:red} hadoop-yarn-common in feature-YARN-2928 failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 10s {color} | {color:green} feature-YARN-2928 passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 54s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 10s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 28s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 28s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 30s {color} | {color:red} Patch generated 18 new checkstyle issues in hadoop-yarn-project/hadoop-yarn (total was 362, now 367). {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 2m 4s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 52s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 1s {color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 15s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 3m 34s {color} | {color:red} hadoop-yarn-project_hadoop-yarn_hadoop-yarn-api-jdk1.8.0_66 with JDK v1.8.0_66 generated 2 new issues (was 100, now 100). {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 33s {color} | {color:red} hadoop-yarn-common in the patch failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 4m 14s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 35s {color} | {color:green} hadoop-yarn-api in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 5s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 9m 20s {color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 51s {color}
[jira] [Commented] (YARN-4413) Nodes in the includes list should not be listed as decommissioned in the UI
[ https://issues.apache.org/jira/browse/YARN-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038135#comment-15038135 ] Daniel Templeton commented on YARN-4413: bq. But a restart will help here to clear the metrics. True, but it will also cause an outage, which comes with its own potential impact. bq. So I feel we could look both lists upon refresh and remove/add nodes based on the entries in both files and from memory. Agreed. I'll post a patch with my general approach shortly. > Nodes in the includes list should not be listed as decommissioned in the UI > --- > > Key: YARN-4413 > URL: https://issues.apache.org/jira/browse/YARN-4413 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Daniel Templeton >Assignee: Daniel Templeton > > If I decommission a node and then move it from the excludes list back to the > includes list, but I don't restart the node, the node will still be listed by > the web UI as decomissioned until either the NM or RM is restarted. Ideally, > removing the node from the excludes list and putting it back into the > includes list should cause the node to be reported as shutdown instead. > CC [~kshukla] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-4414) Nodemanager connection errors are retried at multiple levels
[ https://issues.apache.org/jira/browse/YARN-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038107#comment-15038107 ] Jason Lowe commented on YARN-4414: -- I noticed that the HA proxies for the namenode and resourcemanager explicitly disable the connection retries in the RPC layer by default, since the HA proxy itself will do the retries. I think the same should apply to nodemanager proxies, since we're seeing even connection timeouts retried too often in the RPC layer, given that a container allocation is worthless after 10 minutes by default. By disabling retries in the RPC layer, we can add ConnectTimeoutException back to the list of exceptions retried at the NM proxy layer and simply retry all appropriate exceptions there. > Nodemanager connection errors are retried at multiple levels > > > Key: YARN-4414 > URL: https://issues.apache.org/jira/browse/YARN-4414 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.7.1, 2.6.2 >Reporter: Jason Lowe > > This is related to YARN-3238. Ran into more scenarios where connection > errors are being retried at multiple levels, like NoRouteToHostException. > The fix for YARN-3238 was too specific, and I think we need a more general > solution to catch a wider array of connection errors that can occur to avoid > retrying them both at the RPC layer and at the NM proxy layer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
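A rough sketch of the shape such a change could take, assuming the standard ipc.client.connect.* keys and the RetryPolicies helpers; the class, numbers, and key choices here are illustrative, not the eventual YARN-4414 fix.
{code}
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;

public class NmProxyRetrySketch {
  // Conf for NM proxies that turns off connect retries inside the RPC layer,
  // so connection failures surface immediately and are retried in one place.
  public static Configuration nmProxyConf(Configuration base) {
    Configuration conf = new Configuration(base);
    conf.setInt("ipc.client.connect.max.retries", 0);
    conf.setInt("ipc.client.connect.max.retries.on.timeouts", 0);
    return conf;
  }

  // Proxy-level policy: retry (connect timeouts included) for a bounded time,
  // well under the 10-minute container allocation expiry mentioned above.
  public static RetryPolicy proxyRetryPolicy() {
    return RetryPolicies.retryUpToMaximumTimeWithFixedSleep(
        3 * 60 * 1000L,   // give up after ~3 minutes (milliseconds)
        10 * 1000L,       // sleep 10 seconds between attempts
        TimeUnit.MILLISECONDS);
  }
}
{code}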
[jira] [Created] (YARN-4414) Nodemanager connection errors are retried at multiple levels
Jason Lowe created YARN-4414: Summary: Nodemanager connection errors are retried at multiple levels Key: YARN-4414 URL: https://issues.apache.org/jira/browse/YARN-4414 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.2, 2.7.1 Reporter: Jason Lowe This is related to YARN-3238. Ran into more scenarios where connection errors are being retried at multiple levels, like NoRouteToHostException. The fix for YARN-3238 was too specific, and I think we need a more general solution to catch a wider array of connection errors that can occur to avoid retrying them both at the RPC layer and at the NM proxy layer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)