[jira] [Commented] (YARN-1402) Related Web UI, CLI changes on exposing client API to check log aggregation status
[ https://issues.apache.org/jira/browse/YARN-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495798#comment-14495798 ]

Hadoop QA commented on YARN-1402:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12725487/YARN-1402.2.patch
against trunk revision fddd552.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7344//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7344//console

This message is automatically generated.
> Related Web UI, CLI changes on exposing client API to check log aggregation status
> Key: YARN-1402
> URL: https://issues.apache.org/jira/browse/YARN-1402
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Xuan Gong
> Assignee: Xuan Gong
> Attachments: YARN-1402.1.patch, YARN-1402.2.patch

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1402) Related Web UI, CLI changes on exposing client API to check log aggregation status
[ https://issues.apache.org/jira/browse/YARN-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495809#comment-14495809 ]

Xuan Gong commented on YARN-1402:

The -1 on core tests is not related to this patch.
[jira] [Updated] (YARN-3404) View the queue name to YARN Application page
[ https://issues.apache.org/jira/browse/YARN-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryu Kobayashi updated YARN-3404:
Attachment: YARN-3404.4.patch

[~jianhe] I see. I have created a patch that uses {{WebAppUtils.getResolvedRMWebAppURLWithScheme}}.

> View the queue name to YARN Application page
> Key: YARN-3404
> URL: https://issues.apache.org/jira/browse/YARN-3404
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Ryu Kobayashi
> Assignee: Ryu Kobayashi
> Priority: Minor
> Attachments: YARN-3404.1.patch, YARN-3404.2.patch, YARN-3404.3.patch, YARN-3404.4.patch, screenshot.png
>
> We want to display the name of the queue that is used on the YARN Application page.
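For context, {{WebAppUtils.getResolvedRMWebAppURLWithScheme}} returns the RM web UI address prefixed with the scheme that matches the configured HTTP policy. The following is a minimal, hypothetical sketch of that behavior (the real helper lives in org.apache.hadoop.yarn.webapp.util.WebAppUtils and reads the policy and addresses from the Hadoop Configuration; the method and values below are simplified stand-ins):

```java
// Hypothetical simplification: the real WebAppUtils reads yarn.http.policy,
// yarn.resourcemanager.webapp.address, and
// yarn.resourcemanager.webapp.https.address from a Configuration object.
public class RmUrlSketch {
    static String resolvedRMWebAppURLWithScheme(boolean httpsEnabled,
                                                String httpAddr,
                                                String httpsAddr) {
        // Pick the address and scheme that match the configured HTTP policy.
        return httpsEnabled ? "https://" + httpsAddr : "http://" + httpAddr;
    }

    public static void main(String[] args) {
        // Placeholder host; default webapp ports are 8088 (HTTP) / 8090 (HTTPS).
        System.out.println(
            resolvedRMWebAppURLWithScheme(false, "rm-host:8088", "rm-host:8090"));
        System.out.println(
            resolvedRMWebAppURLWithScheme(true, "rm-host:8088", "rm-host:8090"));
    }
}
```

Linking through such a helper, rather than hard-coding "http://", keeps queue links on the application page working when the cluster runs the web UI over HTTPS.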
[jira] [Commented] (YARN-3404) View the queue name to YARN Application page
[ https://issues.apache.org/jira/browse/YARN-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495846#comment-14495846 ]

Hadoop QA commented on YARN-3404:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12725514/YARN-3404.4.patch
against trunk revision fddd552.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7345//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7345//console

This message is automatically generated.
[jira] [Commented] (YARN-3436) Fix URIs in documention of YARN web service REST APIs
[ https://issues.apache.org/jira/browse/YARN-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496014#comment-14496014 ]

Hudson commented on YARN-3436:

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #164 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/164/])
YARN-3436. Fix URIs in documantion of YARN web service REST APIs. Contributed by Bibin A Chundatt. (ozawa: rev 05007b45e58bd9052f503cfb8c17bcfd22a686e3)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/WebServicesIntro.md
* hadoop-yarn-project/CHANGES.txt

> Fix URIs in documention of YARN web service REST APIs
> Key: YARN-3436
> URL: https://issues.apache.org/jira/browse/YARN-3436
> Project: Hadoop YARN
> Issue Type: Bug
> Components: documentation, resourcemanager
> Reporter: Bibin A Chundatt
> Assignee: Bibin A Chundatt
> Priority: Minor
> Fix For: 2.8.0
> Attachments: YARN-3436.001.patch
>
> /docs/current/hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html
> {quote}
> Response Examples
> JSON response with single resource
> HTTP Request: GET http://rmhost.domain:8088/ws/v1/cluster/{color:red}app{color}/application_1324057493980_0001
> Response Status Line: HTTP/1.1 200 OK
> {quote}
> The URL should be ws/v1/cluster/{color:red}apps{color}.
> Two examples on the same page are wrong.
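The fix amounts to using the plural {{apps}} collection in the documented example URIs. A small sketch of building the corrected request URL (rmhost.domain:8088 is the placeholder host from the documentation, not a real endpoint):

```java
// Build the corrected RM REST request URI from the documentation example.
// The host and application id are the placeholders used in the docs.
public class RestUriExample {
    public static void main(String[] args) {
        String rm = "http://rmhost.domain:8088";
        String appId = "application_1324057493980_0001";
        // Correct resource path uses the plural "apps" collection,
        // not the singular "app" the docs previously showed.
        String url = rm + "/ws/v1/cluster/apps/" + appId;
        System.out.println(url);
        // Expected: http://rmhost.domain:8088/ws/v1/cluster/apps/application_1324057493980_0001
    }
}
```

Against a live cluster the resulting URL would be fetched with any HTTP client (for example, curl) and should return the single-application JSON document shown in the documentation.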
[jira] [Commented] (YARN-3361) CapacityScheduler side changes to support non-exclusive node labels
[ https://issues.apache.org/jira/browse/YARN-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496016#comment-14496016 ]

Hudson commented on YARN-3361:

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #164 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/164/])
YARN-3361. CapacityScheduler side changes to support non-exclusive node labels. Contributed by Wangda Tan (jianhe: rev 0fefda645bca935b87b6bb8ca63e6f18340d59f5)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestNodeLabelContainerAllocation.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/BuilderUtils.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceUsage.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/Application.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/SchedulingMode.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
[jira] [Commented] (YARN-3266) RMContext inactiveNodes should have NodeId as map key
[ https://issues.apache.org/jira/browse/YARN-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496017#comment-14496017 ]

Hudson commented on YARN-3266:

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #164 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/164/])
YARN-3266. RMContext#inactiveNodes should have NodeId as map key. Contributed by Chengbing Liu (jianhe: rev b46ee1e7a31007985b88072d9af3d97c33a261a7)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java
* hadoop-yarn-project/CHANGES.txt
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java

> RMContext inactiveNodes should have NodeId as map key
> Key: YARN-3266
> URL: https://issues.apache.org/jira/browse/YARN-3266
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.6.0
> Reporter: Chengbing Liu
> Assignee: Chengbing Liu
> Fix For: 2.8.0
> Attachments: YARN-3266.01.patch, YARN-3266.02.patch, YARN-3266.03.patch
>
> Under the default NM port configuration, which is 0, we have observed in the current version that the "lost nodes" count is greater than the length of the lost-node list. This will happen when we consecutively restart the same NM twice:
> * NM started at port 10001
> * NM restarted at port 10002
> * NM restarted at port 10003
> * NM:10001 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost nodes = 1; {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, {{inactiveNodes}} has 1 element
> * NM:10002 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost nodes = 2; {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, {{inactiveNodes}} still has 1 element
> Since we allow multiple NodeManagers on one host (as discussed in YARN-1888), {{inactiveNodes}} should be of type {{ConcurrentMap<NodeId, RMNode>}}. If this will break the current API, then the key string should include the NM's port as well.
> Thoughts?
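The counting mismatch described in the issue can be reproduced with a minimal standalone sketch (hypothetical simplified types, not the actual RM classes): keying the inactive-node map by host alone makes consecutive restarts of the same NM on new ephemeral ports overwrite each other, while keying by the full host:port NodeId keeps the map in step with the lost-node metric.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class InactiveNodesDemo {
    // Simplified stand-in for org.apache.hadoop.yarn.api.records.NodeId.
    record NodeId(String host, int port) {}

    public static void main(String[] args) {
        Map<String, NodeId> byHost = new ConcurrentHashMap<>();   // old: host as key
        Map<NodeId, NodeId> byNodeId = new ConcurrentHashMap<>(); // fixed: NodeId as key

        int lostNodeMetric = 0;
        // Two instances of the same NM (restarted on a new ephemeral port) time out.
        for (int port : new int[] {10001, 10002}) {
            NodeId id = new NodeId("nm-host", port);
            lostNodeMetric++;            // ClusterMetrics#incrNumLostNMs()
            byHost.put(id.host(), id);   // second put overwrites the first entry
            byNodeId.put(id, id);        // distinct keys, both entries retained
        }

        System.out.println(lostNodeMetric);  // 2
        System.out.println(byHost.size());   // 1 -- disagrees with the metric
        System.out.println(byNodeId.size()); // 2 -- consistent with the metric
    }
}
```

This is why the patch changes the map key: once the key includes the port, each timed-out NM instance gets its own entry and the "lost nodes" metric matches the length of the inactive-node list.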
[jira] [Commented] (YARN-3361) CapacityScheduler side changes to support non-exclusive node labels
[ https://issues.apache.org/jira/browse/YARN-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496024#comment-14496024 ]

Hudson commented on YARN-3361:

FAILURE: Integrated in Hadoop-Yarn-trunk #898 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/898/])
YARN-3361. CapacityScheduler side changes to support non-exclusive node labels. Contributed by Wangda Tan (jianhe: rev 0fefda645bca935b87b6bb8ca63e6f18340d59f5)
[jira] [Commented] (YARN-3436) Fix URIs in documention of YARN web service REST APIs
[ https://issues.apache.org/jira/browse/YARN-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496023#comment-14496023 ]

Hudson commented on YARN-3436:

FAILURE: Integrated in Hadoop-Yarn-trunk #898 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/898/])
YARN-3436. Fix URIs in documantion of YARN web service REST APIs. Contributed by Bibin A Chundatt. (ozawa: rev 05007b45e58bd9052f503cfb8c17bcfd22a686e3)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/WebServicesIntro.md
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-3266) RMContext inactiveNodes should have NodeId as map key
[ https://issues.apache.org/jira/browse/YARN-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496025#comment-14496025 ]

Hudson commented on YARN-3266:

FAILURE: Integrated in Hadoop-Yarn-trunk #898 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/898/])
YARN-3266. RMContext#inactiveNodes should have NodeId as map key. Contributed by Chengbing Liu (jianhe: rev b46ee1e7a31007985b88072d9af3d97c33a261a7)
[jira] [Commented] (YARN-3361) CapacityScheduler side changes to support non-exclusive node labels
[ https://issues.apache.org/jira/browse/YARN-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496069#comment-14496069 ]

Hudson commented on YARN-3361:

FAILURE: Integrated in Hadoop-Hdfs-trunk #2096 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2096/])
YARN-3361. CapacityScheduler side changes to support non-exclusive node labels. Contributed by Wangda Tan (jianhe: rev 0fefda645bca935b87b6bb8ca63e6f18340d59f5)
[jira] [Commented] (YARN-3436) Fix URIs in documention of YARN web service REST APIs
[ https://issues.apache.org/jira/browse/YARN-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496068#comment-14496068 ]

Hudson commented on YARN-3436:

FAILURE: Integrated in Hadoop-Hdfs-trunk #2096 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2096/])
YARN-3436. Fix URIs in documantion of YARN web service REST APIs. Contributed by Bibin A Chundatt. (ozawa: rev 05007b45e58bd9052f503cfb8c17bcfd22a686e3)
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/WebServicesIntro.md
* hadoop-yarn-project/CHANGES.txt
[jira] [Commented] (YARN-3266) RMContext inactiveNodes should have NodeId as map key
[ https://issues.apache.org/jira/browse/YARN-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496070#comment-14496070 ] Hudson commented on YARN-3266: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2096 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2096/]) YARN-3266. RMContext#inactiveNodes should have NodeId as map key. Contributed by Chengbing Liu (jianhe: rev b46ee1e7a31007985b88072d9af3d97c33a261a7) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java > RMContext inactiveNodes should have NodeId as map key > - > > Key: YARN-3266 > URL: 
https://issues.apache.org/jira/browse/YARN-3266 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Chengbing Liu >Assignee: Chengbing Liu > Fix For: 2.8.0 > > Attachments: YARN-3266.01.patch, YARN-3266.02.patch, > YARN-3266.03.patch > > > Under the default NM port configuration, which is 0, we have observed in the > current version, "lost nodes" count is greater than the length of the lost > node list. This will happen when we consecutively restart the same NM twice: > * NM started at port 10001 > * NM restarted at port 10002 > * NM restarted at port 10003 > * NM:10001 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=1; > {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, > {{inactiveNodes}} has 1 element > * NM:10002 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=2; > {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, > {{inactiveNodes}} still has 1 element > Since we allow multiple NodeManagers on one host (as discussed in YARN-1888), > {{inactiveNodes}} should be of type {{ConcurrentMap}}. If > this will break the current API, then the key string should include the NM's > port as well. > Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
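The undercounting described above can be reproduced in a few lines of plain Java. This is an illustrative sketch only: `NodeId` here is a simplified stand-in for YARN's `org.apache.hadoop.yarn.api.records.NodeId`, not the real class.

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the YARN-3266 miscount; NodeId is a simplified stand-in for
// YARN's NodeId (host + port), not the real API.
public class InactiveNodesSketch {
    record NodeId(String host, int port) {}

    // Old behavior: keyed by host, so a second lost NM instance on the same
    // host overwrites the first entry and the map undercounts lost nodes.
    static int lostCountKeyedByHost(NodeId... lost) {
        ConcurrentHashMap<String, NodeId> m = new ConcurrentHashMap<>();
        for (NodeId n : lost) m.put(n.host(), n);
        return m.size();
    }

    // Proposed fix: keyed by the full NodeId, so every lost NM instance stays
    // in the map and matches the ClusterMetrics lost-node counter.
    static int lostCountKeyedByNodeId(NodeId... lost) {
        ConcurrentHashMap<NodeId, Boolean> m = new ConcurrentHashMap<>();
        for (NodeId n : lost) m.put(n, Boolean.TRUE);
        return m.size();
    }

    public static void main(String[] args) {
        NodeId first = new NodeId("nm-host", 10001);
        NodeId second = new NodeId("nm-host", 10002);
        System.out.println(lostCountKeyedByHost(first, second));   // 1
        System.out.println(lostCountKeyedByNodeId(first, second)); // 2
    }
}
```

With two NMs lost on the same host, the host-keyed map reports one inactive node while the metrics counter reports two, which is exactly the mismatch in the bullet list above.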
[jira] [Commented] (YARN-3361) CapacityScheduler side changes to support non-exclusive node labels
[ https://issues.apache.org/jira/browse/YARN-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496076#comment-14496076 ] Hudson commented on YARN-3361: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #155 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/155/]) YARN-3361. CapacityScheduler side changes to support non-exclusive node labels. Contributed by Wangda Tan (jianhe: rev 0fefda645bca935b87b6bb8ca63e6f18340d59f5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/SchedulingMode.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/BuilderUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/Application.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceUsage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestNodeLabelContainerAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java > CapacityScheduler side change
[jira] [Commented] (YARN-3436) Fix URIs in documentation of YARN web service REST APIs
[ https://issues.apache.org/jira/browse/YARN-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496075#comment-14496075 ] Hudson commented on YARN-3436: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #155 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/155/]) YARN-3436. Fix URIs in documantion of YARN web service REST APIs. Contributed by Bibin A Chundatt. (ozawa: rev 05007b45e58bd9052f503cfb8c17bcfd22a686e3) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/WebServicesIntro.md * hadoop-yarn-project/CHANGES.txt > Fix URIs in documention of YARN web service REST APIs > - > > Key: YARN-3436 > URL: https://issues.apache.org/jira/browse/YARN-3436 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation, resourcemanager >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Fix For: 2.8.0 > > Attachments: YARN-3436.001.patch > > > /docs/current/hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html > {quote} > Response Examples > JSON response with single resource > HTTP Request: GET > http://rmhost.domain:8088/ws/v1/cluster/{color:red}app{color}/application_1324057493980_0001 > Response Status Line: HTTP/1.1 200 OK > {quote} > Url should be ws/v1/cluster/{color:red}apps{color} . > 2 examples on same page are wrong -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3266) RMContext inactiveNodes should have NodeId as map key
[ https://issues.apache.org/jira/browse/YARN-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496077#comment-14496077 ] Hudson commented on YARN-3266: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #155 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/155/]) YARN-3266. RMContext#inactiveNodes should have NodeId as map key. Contributed by Chengbing Liu (jianhe: rev b46ee1e7a31007985b88072d9af3d97c33a261a7) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java > RMContext inactiveNodes should have NodeId as map key > - > > Key: YARN-3266 > URL: 
https://issues.apache.org/jira/browse/YARN-3266 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Chengbing Liu >Assignee: Chengbing Liu > Fix For: 2.8.0 > > Attachments: YARN-3266.01.patch, YARN-3266.02.patch, > YARN-3266.03.patch > > > Under the default NM port configuration, which is 0, we have observed in the > current version, "lost nodes" count is greater than the length of the lost > node list. This will happen when we consecutively restart the same NM twice: > * NM started at port 10001 > * NM restarted at port 10002 > * NM restarted at port 10003 > * NM:10001 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=1; > {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, > {{inactiveNodes}} has 1 element > * NM:10002 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=2; > {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, > {{inactiveNodes}} still has 1 element > Since we allow multiple NodeManagers on one host (as discussed in YARN-1888), > {{inactiveNodes}} should be of type {{ConcurrentMap}}. If > this will break the current API, then the key string should include the NM's > port as well. > Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3476) Nodemanager can fail to delete local logs if log aggregation fails
[ https://issues.apache.org/jira/browse/YARN-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496099#comment-14496099 ] Hadoop QA commented on YARN-3476: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12724974/0001-YARN-3476.patch against trunk revision fddd552. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7346//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7346//console This message is automatically generated. 
> Nodemanager can fail to delete local logs if log aggregation fails > -- > > Key: YARN-3476 > URL: https://issues.apache.org/jira/browse/YARN-3476 > Project: Hadoop YARN > Issue Type: Bug > Components: log-aggregation, nodemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Rohith > Attachments: 0001-YARN-3476.patch > > > If log aggregation encounters an error trying to upload the file then the > underlying TFile can throw an illegalstateexception which will bubble up > through the top of the thread and prevent the application logs from being > deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
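The fix direction implied by the report can be sketched generically: whatever the upload throws must not escape past the cleanup step. `Uploader` and `Deleter` below are illustrative stand-ins, not the NodeManager's actual log-aggregation API.

```java
// Sketch for YARN-3476: an upload failure (e.g. an IllegalStateException
// thrown by the underlying TFile writer) must not prevent local log deletion.
public class AggregateThenDelete {
    interface Uploader { void upload(String appId) throws Exception; }
    interface Deleter { void delete(String appId); }

    static void aggregate(String appId, Uploader up, Deleter del) {
        try {
            up.upload(appId);
        } catch (Exception e) {
            // Report the failed upload, but do not let it bubble to the top
            // of the thread and skip cleanup.
            System.err.println("aggregation failed for " + appId + ": " + e);
        } finally {
            del.delete(appId); // runs whether or not the upload succeeded
        }
    }
}
```

The `finally` block guarantees deletion runs even when the writer throws an unchecked exception, which is the failure mode the issue describes.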
[jira] [Commented] (YARN-2268) Disallow formatting the RMStateStore when there is an RM running
[ https://issues.apache.org/jira/browse/YARN-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496103#comment-14496103 ] Rohith commented on YARN-2268: -- I propose the following way to disallow formatting the state store while an RM is running. For both HA (Active and Standby) and non-HA, it is possible to get the RM state using the REST API getClusterInfo('ws/v1/cluster/info'). This can be used to identify the RM state, and it is independent of any state store implementation. In HA, the ACTIVE state is checked against all the RM-Ids sequentially. If no ACTIVE RM is found, the store is formatted; otherwise an exception *ActiveResourceManagerRunningException* is thrown. Cons: formatting the state store when HA is enabled is *best effort*; the RM state can change after one of the RMs has been checked. Kindly share your thoughts on this approach. > Disallow formatting the RMStateStore when there is an RM running > > > Key: YARN-2268 > URL: https://issues.apache.org/jira/browse/YARN-2268 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Karthik Kambatla >Assignee: Rohith > > YARN-2131 adds a way to format the RMStateStore. However, it can be a problem > if we format the store while an RM is actively using it. It would be nice to > fail the format if there is an RM running and using this store. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
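The proposed check can be sketched as follows. `fetchState` is a stand-in for the real `ws/v1/cluster/info` REST call, and the exception name follows the proposal in the comment; none of this is an existing YARN API.

```java
import java.util.List;
import java.util.function.Function;

// Sketch of the proposed YARN-2268 guard: ask each RM for its state (via the
// ws/v1/cluster/info endpoint in a real implementation) and refuse to format
// while any RM reports ACTIVE.
public class FormatGuard {
    static class ActiveResourceManagerRunningException extends RuntimeException {
        ActiveResourceManagerRunningException(String msg) { super(msg); }
    }

    static void formatIfSafe(List<String> rmIds,
                             Function<String, String> fetchState,
                             Runnable format) {
        // Check every RM-Id sequentially, as proposed.
        for (String rmId : rmIds) {
            if ("ACTIVE".equals(fetchState.apply(rmId))) {
                throw new ActiveResourceManagerRunningException(
                    rmId + " is active; refusing to format the state store");
            }
        }
        // Best effort only: an RM checked earlier could become active before
        // this point, which is the race noted in the comment above.
        format.run();
    }
}
```

The sequential loop makes the limitation visible: the guard can only observe each RM once, so a state change between the check and the format is inherently possible.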
[jira] [Updated] (YARN-3477) TimelineClientImpl swallows root cause of retry failures
[ https://issues.apache.org/jira/browse/YARN-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-3477: - Target Version/s: 2.7.1 Affects Version/s: (was: 3.0.0) 2.7.0 > TimelineClientImpl swallows root cause of retry failures > > > Key: YARN-3477 > URL: https://issues.apache.org/jira/browse/YARN-3477 > Project: Hadoop YARN > Issue Type: Bug > Components: timelineserver >Affects Versions: 2.7.0 >Reporter: Steve Loughran >Assignee: Steve Loughran > > If timeline client fails more than the retry count, the original exception is > not thrown. Instead some runtime exception is raised saying "retries run out" > # the failing exception should be rethrown, ideally via > NetUtils.wrapException to include URL of the failing endpoing > # Otherwise, the raised RTE should (a) state that URL and (b) set the > original fault as the inner cause -- This message was sent by Atlassian JIRA (v6.3.4#6332)
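The requested behavior can be sketched generically. `Op` and the message format are illustrative; the real fix would route through `NetUtils.wrapException` in the timeline client rather than this hand-rolled wrapper.

```java
// Sketch of the YARN-3477 request: when retries run out, the raised exception
// should name the failing endpoint and carry the last failure as its cause,
// rather than a bare "retries run out" RuntimeException.
public class RetryRootCause {
    interface Op<T> { T run() throws Exception; }

    static <T> T withRetries(Op<T> op, int maxRetries, String url) {
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return op.run();
            } catch (Exception e) {
                last = e; // remember the root cause instead of discarding it
            }
        }
        // (a) state the failing URL, (b) set the original fault as the cause
        throw new RuntimeException("Retries exhausted connecting to " + url, last);
    }
}
```

A caller catching the `RuntimeException` can then call `getCause()` to see the actual network fault, which is exactly what the current client swallows.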
[jira] [Created] (YARN-3489) RMServerUtils.validateResourceRequests should only obtain queue info once
Jason Lowe created YARN-3489: Summary: RMServerUtils.validateResourceRequests should only obtain queue info once Key: YARN-3489 URL: https://issues.apache.org/jira/browse/YARN-3489 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.6.0 Reporter: Jason Lowe Since the label support was added we now get the queue info for each request being validated in SchedulerUtils.validateResourceRequest. If validateResourceRequests needs to validate a lot of requests at a time (e.g.: large cluster with lots of varied locality in the requests) then it will get the queue info for each request. Since we build the queue info this generates a lot of unnecessary garbage, as the queue isn't changing between requests. We should grab the queue info once and pass it down rather than building it again for each request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
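The proposed change amounts to hoisting the lookup out of the per-request loop. `QueueInfo` and `ResourceRequest` below are simplified stand-ins for the scheduler's classes, with a counter standing in for the allocation cost of rebuilding the queue info.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of YARN-3489: fetch the queue info once and pass it down, instead of
// rebuilding it for every ResourceRequest being validated.
public class QueueInfoOnce {
    record QueueInfo(String queue) {}
    record ResourceRequest(String resourceName) {}

    static final AtomicInteger lookups = new AtomicInteger();

    // Models the expensive per-call construction done today.
    static QueueInfo getQueueInfo(String queue) {
        lookups.incrementAndGet();
        return new QueueInfo(queue);
    }

    // Current shape: one lookup per request, lots of short-lived garbage.
    static void validatePerRequest(List<ResourceRequest> reqs, String queue) {
        for (ResourceRequest r : reqs) validate(r, getQueueInfo(queue));
    }

    // Proposed shape: fetch once, pass it down to each validation.
    static void validateOnce(List<ResourceRequest> reqs, String queue) {
        QueueInfo info = getQueueInfo(queue);
        for (ResourceRequest r : reqs) validate(r, info);
    }

    static void validate(ResourceRequest r, QueueInfo info) {
        // label/ACL checks would go here; omitted in this sketch
    }
}
```

On a large cluster with many locality levels per ask, the first shape does one lookup per request while the second does exactly one per validation call, since the queue does not change between requests.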
[jira] [Assigned] (YARN-3489) RMServerUtils.validateResourceRequests should only obtain queue info once
[ https://issues.apache.org/jira/browse/YARN-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-3489: -- Assignee: Varun Saxena > RMServerUtils.validateResourceRequests should only obtain queue info once > - > > Key: YARN-3489 > URL: https://issues.apache.org/jira/browse/YARN-3489 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Jason Lowe >Assignee: Varun Saxena > > Since the label support was added we now get the queue info for each request > being validated in SchedulerUtils.validateResourceRequest. If > validateResourceRequests needs to validate a lot of requests at a time (e.g.: > large cluster with lots of varied locality in the requests) then it will get > the queue info for each request. Since we build the queue info this > generates a lot of unnecessary garbage, as the queue isn't changing between > requests. We should grab the queue info once and pass it down rather than > building it again for each request. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3471) Fix timeline client retry
[ https://issues.apache.org/jira/browse/YARN-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated YARN-3471: - Affects Version/s: 2.8.0 > Fix timeline client retry > - > > Key: YARN-3471 > URL: https://issues.apache.org/jira/browse/YARN-3471 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Affects Versions: 2.8.0 >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-3471.1.patch, YARN-3471.2.patch > > > I found that the client retry has some problems: > 1. The new put methods will retry on all exception, but they should only do > it upon ConnectException. > 2. We can reuse TimelineClientConnectionRetry to simplify the retry logic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
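Point 1 of the issue can be sketched as a retry loop that is selective about what it retries. `Op` is an illustrative interface, not the timeline client's real API.

```java
import java.net.ConnectException;

// Sketch of YARN-3471 point 1: retry only on ConnectException; any other
// exception propagates immediately instead of burning retries.
public class SelectiveRetry {
    interface Op<T> { T run() throws Exception; }

    static <T> T call(Op<T> op, int maxRetries) throws Exception {
        int attempt = 0;
        while (true) {
            try {
                return op.run();
            } catch (ConnectException e) {
                if (++attempt > maxRetries) throw e; // retries exhausted
                // otherwise: the server may not be reachable yet; try again
            }
            // exceptions other than ConnectException are not caught here,
            // so they propagate to the caller on the first attempt
        }
    }
}
```

A failed connection is the one transient condition worth retrying; an application-level error would only return the same failure on every attempt, so retrying it just delays the report.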
[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496239#comment-14496239 ] Thomas Graves commented on YARN-3434: - So I had considered putting it in ResourceLimits, but ResourceLimits seems to me to be a queue-level thing, not a user-level one. For instance, ParentQueue passes it into LeafQueue, and ParentQueue cares nothing about user limits. If you stored it there you would need to either track which user it was for or track it for all users. ResourceLimits gets updated when nodes are added and removed; we don't need to compute a particular user limit when that happens, so the value would either go out of date or we would have to update it on those events as well, which is a fairly large change and not really needed. The user limit calculations are lower down and are recomputed regularly per user, per application, per current request, so putting the result into the global object, given how it is calculated and used, didn't make sense to me. All you would be using it for is passing it down to assignContainer, and by then it would be out of date. If someone else started reading that value assuming it was up to date, it would be wrong (unless of course we started updating it as described above), and it would only cover a single user, not all users, unless we changed to recalculate it for every user whenever something changed. That seems a bit excessive. You are correct that needToUnreserve could go away. I started out on 2.6, which didn't have our changes, and I could have removed it when I added amountNeededUnreserve. If we were to store it in the global ResourceLimits then yes, the entire LimitsInfo could go away, including shouldContinue, since you would fall back to the boolean return from each function. But again, based on my comments above, I'm not sure ResourceLimits is the correct place for this. I just noticed that we are already keeping the userLimit in the User class; that would be another option.
But again, I think we need to make clear what this value is: the check is done per application, per user, based on the currently requested Resource. The stored value wouldn't necessarily apply to all of the user's applications, since the resource request size could differ. Thoughts, or is there something I'm missing about ResourceLimits? > Interaction between reservations and userlimit can result in significant ULF > violation > -- > > Key: YARN-3434 > URL: https://issues.apache.org/jira/browse/YARN-3434 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.6.0 >Reporter: Thomas Graves >Assignee: Thomas Graves > Attachments: YARN-3434.patch > > > ULF was set to 1.0 > User was able to consume 1.4X queue capacity. > It looks like when this application launched, it reserved about 1000 > containers, each 8G, within about 5 seconds. I think this allowed the > logic in assignToUser() to allow the userlimit to be surpassed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3361) CapacityScheduler side changes to support non-exclusive node labels
[ https://issues.apache.org/jira/browse/YARN-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496305#comment-14496305 ] Hudson commented on YARN-3361: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #165 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/165/]) YARN-3361. CapacityScheduler side changes to support non-exclusive node labels. Contributed by Wangda Tan (jianhe: rev 0fefda645bca935b87b6bb8ca63e6f18340d59f5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestNodeLabelContainerAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/BuilderUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/Application.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceUsage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/SchedulingMode.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java > CapacityScheduler s
[jira] [Commented] (YARN-3436) Fix URIs in documentation of YARN web service REST APIs
[ https://issues.apache.org/jira/browse/YARN-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496304#comment-14496304 ] Hudson commented on YARN-3436: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #165 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/165/]) YARN-3436. Fix URIs in documantion of YARN web service REST APIs. Contributed by Bibin A Chundatt. (ozawa: rev 05007b45e58bd9052f503cfb8c17bcfd22a686e3) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/WebServicesIntro.md * hadoop-yarn-project/CHANGES.txt > Fix URIs in documention of YARN web service REST APIs > - > > Key: YARN-3436 > URL: https://issues.apache.org/jira/browse/YARN-3436 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation, resourcemanager >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Fix For: 2.8.0 > > Attachments: YARN-3436.001.patch > > > /docs/current/hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html > {quote} > Response Examples > JSON response with single resource > HTTP Request: GET > http://rmhost.domain:8088/ws/v1/cluster/{color:red}app{color}/application_1324057493980_0001 > Response Status Line: HTTP/1.1 200 OK > {quote} > Url should be ws/v1/cluster/{color:red}apps{color} . > 2 examples on same page are wrong -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3266) RMContext inactiveNodes should have NodeId as map key
[ https://issues.apache.org/jira/browse/YARN-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496306#comment-14496306 ] Hudson commented on YARN-3266: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #165 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/165/]) YARN-3266. RMContext#inactiveNodes should have NodeId as map key. Contributed by Chengbing Liu (jianhe: rev b46ee1e7a31007985b88072d9af3d97c33a261a7) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/CHANGES.txt > RMContext inactiveNodes should have NodeId as map key > - > > Key: YARN-3266 > URL: 
https://issues.apache.org/jira/browse/YARN-3266 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Chengbing Liu >Assignee: Chengbing Liu > Fix For: 2.8.0 > > Attachments: YARN-3266.01.patch, YARN-3266.02.patch, > YARN-3266.03.patch > > > Under the default NM port configuration, which is 0, we have observed in the > current version, "lost nodes" count is greater than the length of the lost > node list. This will happen when we consecutively restart the same NM twice: > * NM started at port 10001 > * NM restarted at port 10002 > * NM restarted at port 10003 > * NM:10001 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=1; > {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, > {{inactiveNodes}} has 1 element > * NM:10002 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=2; > {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, > {{inactiveNodes}} still has 1 element > Since we allow multiple NodeManagers on one host (as discussed in YARN-1888), > {{inactiveNodes}} should be of type {{ConcurrentMap}}. If > this will break the current API, then the key string should include the NM's > port as well. > Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
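The keying bug described above can be demonstrated with plain maps. The sketch below uses a hypothetical NodeId record standing in for YARN's actual NodeId class; it illustrates the failure mode, not the patch itself:

```java
import java.util.HashMap;
import java.util.Map;

public class InactiveNodesDemo {
    // Hypothetical stand-in for YARN's NodeId: host plus port.
    record NodeId(String host, int port) {}

    // Keyed by host only: a second lost NM on the same host overwrites
    // the first entry, so the map undercounts lost nodes.
    static int lostByHostKey() {
        Map<String, NodeId> inactive = new HashMap<>();
        inactive.put("nm-host", new NodeId("nm-host", 10001));
        inactive.put("nm-host", new NodeId("nm-host", 10002));
        return inactive.size();
    }

    // Keyed by the full NodeId: both lost NMs survive, matching the
    // ClusterMetrics lost-node counter.
    static int lostByNodeIdKey() {
        Map<NodeId, NodeId> inactive = new HashMap<>();
        NodeId first = new NodeId("nm-host", 10001);
        NodeId second = new NodeId("nm-host", 10002);
        inactive.put(first, first);
        inactive.put(second, second);
        return inactive.size();
    }
}
```

Keying by the full NodeId (host plus port) keeps the inactive-node map in step with the lost-node counter, which is the essence of the fix.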
[jira] [Commented] (YARN-3266) RMContext inactiveNodes should have NodeId as map key
[ https://issues.apache.org/jira/browse/YARN-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496385#comment-14496385 ] Hudson commented on YARN-3266: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2114 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2114/]) YARN-3266. RMContext#inactiveNodes should have NodeId as map key. Contributed by Chengbing Liu (jianhe: rev b46ee1e7a31007985b88072d9af3d97c33a261a7) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java > RMContext inactiveNodes should have NodeId as map key > - > > Key: YARN-3266 > URL: 
https://issues.apache.org/jira/browse/YARN-3266 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Chengbing Liu >Assignee: Chengbing Liu > Fix For: 2.8.0 > > Attachments: YARN-3266.01.patch, YARN-3266.02.patch, > YARN-3266.03.patch > > > Under the default NM port configuration, which is 0, we have observed in the > current version, "lost nodes" count is greater than the length of the lost > node list. This will happen when we consecutively restart the same NM twice: > * NM started at port 10001 > * NM restarted at port 10002 > * NM restarted at port 10003 > * NM:10001 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=1; > {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, > {{inactiveNodes}} has 1 element > * NM:10002 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=2; > {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, > {{inactiveNodes}} still has 1 element > Since we allow multiple NodeManagers on one host (as discussed in YARN-1888), > {{inactiveNodes}} should be of type {{ConcurrentMap}}. If > this will break the current API, then the key string should include the NM's > port as well. > Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3436) Fix URIs in documentation of YARN web service REST APIs
[ https://issues.apache.org/jira/browse/YARN-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496383#comment-14496383 ] Hudson commented on YARN-3436: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2114 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2114/]) YARN-3436. Fix URIs in documentation of YARN web service REST APIs. Contributed by Bibin A Chundatt. (ozawa: rev 05007b45e58bd9052f503cfb8c17bcfd22a686e3) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/WebServicesIntro.md * hadoop-yarn-project/CHANGES.txt > Fix URIs in documentation of YARN web service REST APIs > - > > Key: YARN-3436 > URL: https://issues.apache.org/jira/browse/YARN-3436 > Project: Hadoop YARN > Issue Type: Bug > Components: documentation, resourcemanager >Reporter: Bibin A Chundatt >Assignee: Bibin A Chundatt >Priority: Minor > Fix For: 2.8.0 > > Attachments: YARN-3436.001.patch > > > /docs/current/hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html > {quote} > Response Examples > JSON response with single resource > HTTP Request: GET > http://rmhost.domain:8088/ws/v1/cluster/{color:red}app{color}/application_1324057493980_0001 > Response Status Line: HTTP/1.1 200 OK > {quote} > The URL should be ws/v1/cluster/{color:red}apps{color}. > Two examples on the same page are wrong -- This message was sent by Atlassian JIRA (v6.3.4#6332)

[jira] [Commented] (YARN-3361) CapacityScheduler side changes to support non-exclusive node labels
[ https://issues.apache.org/jira/browse/YARN-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496384#comment-14496384 ] Hudson commented on YARN-3361: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2114 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2114/]) YARN-3361. CapacityScheduler side changes to support non-exclusive node labels. Contributed by Wangda Tan (jianhe: rev 0fefda645bca935b87b6bb8ca63e6f18340d59f5) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestNodeLabelContainerAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/BuilderUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/SchedulingMode.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/Application.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceUsage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java * hadoop-yarn-project/CHANGES.txt > CapacityScheduler side change
[jira] [Updated] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities
[ https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated YARN-3448: -- Attachment: YARN-3448.8.patch > Add Rolling Time To Lives Level DB Plugin Capabilities > -- > > Key: YARN-3448 > URL: https://issues.apache.org/jira/browse/YARN-3448 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: YARN-3448.1.patch, YARN-3448.2.patch, YARN-3448.3.patch, > YARN-3448.4.patch, YARN-3448.5.patch, YARN-3448.7.patch, YARN-3448.8.patch > > > For large applications, the majority of the time in LeveldbTimelineStore is > spent deleting old entities one record at a time. An exclusive write lock is held > during the entire deletion phase, which in practice can be hours. If we are to > relax some of the consistency constraints, other performance enhancing > techniques can be employed to maximize the throughput and minimize locking > time. > Split the 5 sections of the leveldb database (domain, owner, start time, > entity, index) into 5 separate databases. This allows each database to > maximize the read cache effectiveness based on the unique usage patterns of > each database. With 5 separate databases each lookup is much faster. This can > also help with I/O by putting the entity and index databases on separate disks. > Rolling DBs for entity and index DBs. 99.9% of the data is in these two > sections, a 4:1 ratio (index to entity), at least for Tez. We replace DB record > removal with file system removal if we create a rolling set of databases that > age out and can be efficiently removed. To do this we must place a constraint > to always place an entity's events into its correct rolling DB instance > based on start time. This allows us to stitch the data back together while > reading and to do artificial paging. > Relax the synchronous write constraints.
If we are willing to accept losing > some records that were not flushed by the operating system during a crash, we > can use async writes, which can be much faster. > Prefer sequential writes. Sequential writes can be several times faster than > random writes. Spend some small effort arranging the writes in such a way > that will trend towards sequential write performance over random write > performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
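The rolling-DB idea in the description above reduces to bucketing each entity's start time into a fixed roll period, so that an expired period can be dropped by deleting a whole database rather than individual records. A minimal sketch; the roll period and naming scheme are illustrative, not the patch's actual values:

```java
public class RollingBucket {
    // Illustrative roll period: one hour per rolling DB instance.
    static final long ROLL_PERIOD_MS = 60L * 60L * 1000L;

    // All entities whose start time falls inside the same period map to
    // the same rolling DB, so an aged-out period is removed by deleting
    // one directory instead of deleting records one at a time.
    static long bucketFor(long startTimeMs) {
        return startTimeMs - (startTimeMs % ROLL_PERIOD_MS);
    }

    static String dbNameFor(long startTimeMs) {
        return "entity-db." + bucketFor(startTimeMs);
    }
}
```

Because the bucket is a pure function of start time, readers can recompute which rolling instance holds an entity and stitch results back together across instances.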
[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers
[ https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496490#comment-14496490 ] Zhijie Shen commented on YARN-3051: --- Hence, regardless of the implementation details, we logically use: 1. to identify entities that are generated on the same cluster. 2. to identify entities globally across clusters. In terms of compatibility, {{getTimelineEntity(entity type, entity id)}} can assume the cluster ID is either the default one or configured in yarn-site.xml. Does that sound good? > [Storage abstraction] Create backing storage read interface for ATS readers > --- > > Key: YARN-3051 > URL: https://issues.apache.org/jira/browse/YARN-3051 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: YARN-3051_temp.patch > > > Per design in YARN-2928, create backing storage read interface that can be > implemented by multiple backing storage implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers
[ https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496503#comment-14496503 ] Sangjin Lee commented on YARN-3051: --- Yep. That's perfect. > [Storage abstraction] Create backing storage read interface for ATS readers > --- > > Key: YARN-3051 > URL: https://issues.apache.org/jira/browse/YARN-3051 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: YARN-3051_temp.patch > > > Per design in YARN-2928, create backing storage read interface that can be > implemented by multiple backing storage implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers
[ https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496509#comment-14496509 ] Varun Saxena commented on YARN-3051: As per the patch I am currently working on, if clusterid does not come in the query, it is taken from the config, so that's consistent. Although I was assuming appid would be part of the PK. > [Storage abstraction] Create backing storage read interface for ATS readers > --- > > Key: YARN-3051 > URL: https://issues.apache.org/jira/browse/YARN-3051 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: YARN-3051_temp.patch > > > Per design in YARN-2928, create backing storage read interface that can be > implemented by multiple backing storage implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
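The fallback Varun describes, taking the cluster id from configuration when the query omits it, can be sketched as below. The config key name and default value are hypothetical, and a plain map stands in for Hadoop's Configuration class:

```java
import java.util.Map;

public class ClusterIdFallback {
    // Hypothetical config key; the real property name may differ.
    static final String CLUSTER_ID_KEY = "yarn.timeline-service.cluster-id";

    // If the query carries no cluster id, fall back to the value
    // configured in yarn-site.xml (modeled here as a plain map).
    static String resolveClusterId(String fromQuery, Map<String, String> conf) {
        if (fromQuery != null && !fromQuery.isEmpty()) {
            return fromQuery;
        }
        return conf.getOrDefault(CLUSTER_ID_KEY, "default-cluster");
    }
}
```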
[jira] [Commented] (YARN-3390) Reuse TimelineCollectorManager for RM
[ https://issues.apache.org/jira/browse/YARN-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496511#comment-14496511 ] Sangjin Lee commented on YARN-3390: --- {quote} For putIfAbsent and remove, I don't use the template method pattern, but let the subclass override the superclass method and invoke it inside the overriding implementation, because I'm not sure whether we will need pre- or post-processing, or whether we only invoke the processing when adding a new collector. If we're sure about the template, I'm okay with the template pattern too. {quote} I'm fine with either approach. The main reason I thought of that is that I wanted to be clear that the base implementation of putIfAbsent() and remove() is mandatory (i.e. not optional). Since we control all of it (base and subclasses), it might not be such a big deal either way. > Reuse TimelineCollectorManager for RM > - > > Key: YARN-3390 > URL: https://issues.apache.org/jira/browse/YARN-3390 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-3390.1.patch > > > RMTimelineCollector should have the context info of each app whose entity > has been put -- This message was sent by Atlassian JIRA (v6.3.4#6332)
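The override-and-invoke-super style discussed in the quote can be sketched as follows; the class names are stand-ins, not the actual TimelineCollectorManager API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CollectorManagerSketch {
    static class BaseManager {
        final Map<String, Object> collectors = new ConcurrentHashMap<>();

        // Mandatory base behavior: register a collector only if absent.
        Object putIfAbsent(String appId, Object collector) {
            return collectors.putIfAbsent(appId, collector);
        }
    }

    // Override-and-invoke-super style: the subclass decides where, and
    // whether, to add extra work around the mandatory base logic.
    static class RMManager extends BaseManager {
        int postAddCount = 0;

        @Override
        Object putIfAbsent(String appId, Object collector) {
            Object prev = super.putIfAbsent(appId, collector);
            if (prev == null) {
                // Extra work runs only when a new collector was added.
                postAddCount++;
            }
            return prev;
        }
    }
}
```

A template-method variant would instead make putIfAbsent final in the base class and expose a protected postAdd() hook, guaranteeing the base logic always runs; the override style trades that guarantee for flexibility.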
[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework and FifoOrderingPolicy
[ https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496519#comment-14496519 ] Hudson commented on YARN-3318: -- FAILURE: Integrated in Hadoop-trunk-Commit #7588 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7588/]) YARN-3318. Create Initial OrderingPolicy Framework and FifoOrderingPolicy. (Craig Welch via wangda) (wangda: rev 5004e753322084e42dfda4be1d2db66677f86a1e) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/OrderingPolicy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceUsage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/MockSchedulableEntity.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/SchedulableEntity.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/AbstractComparatorOrderingPolicy.java * hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/FifoOrderingPolicy.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/TestFifoOrderingPolicy.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/FifoComparator.java > Create Initial OrderingPolicy Framework and FifoOrderingPolicy > -- > > Key: YARN-3318 > URL: https://issues.apache.org/jira/browse/YARN-3318 > Project: Hadoop YARN > Issue Type: Sub-task > Components: scheduler >Reporter: Craig Welch >Assignee: Craig Welch > Fix For: 2.8.0 > > Attachments: YARN-3318.13.patch, YARN-3318.14.patch, > YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, > YARN-3318.36.patch, YARN-3318.39.patch, YARN-3318.45.patch, > YARN-3318.47.patch, YARN-3318.48.patch, YARN-3318.52.patch, > YARN-3318.53.patch, YARN-3318.56.patch, YARN-3318.57.patch, > YARN-3318.58.patch, YARN-3318.59.patch, YARN-3318.60.patch, YARN-3318.61.patch > > > Create the initial framework required for using OrderingPolicies and an > initial FifoOrderingPolicy -- This message was sent by Atlassian JIRA (v6.3.4#6332)
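A FIFO ordering policy of the kind this framework introduces reduces to a comparator over arrival order. A minimal sketch, with a hypothetical Entity record standing in for SchedulableEntity (this is not the committed FifoComparator):

```java
import java.util.Comparator;
import java.util.TreeSet;

public class FifoSketch {
    // Minimal stand-in for a SchedulableEntity, ordered by arrival id.
    record Entity(long serialId) {}

    // FIFO policy: the lower serial id (earlier arrival) sorts first.
    static final Comparator<Entity> FIFO_COMPARATOR =
        Comparator.comparingLong(Entity::serialId);

    // The policy keeps entities in a sorted structure so the scheduler
    // always assigns to the earliest-arrived entity first.
    static Entity firstToSchedule(TreeSet<Entity> entities) {
        return entities.first();
    }
}
```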
[jira] [Commented] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities
[ https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496528#comment-14496528 ] Hadoop QA commented on YARN-3448: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12725620/YARN-3448.8.patch against trunk revision fddd552. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 10 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7347//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7347//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-applicationhistoryservice.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7347//console This message is automatically generated. 
> Add Rolling Time To Lives Level DB Plugin Capabilities > -- > > Key: YARN-3448 > URL: https://issues.apache.org/jira/browse/YARN-3448 > Project: Hadoop YARN > Issue Type: Sub-task >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles > Attachments: YARN-3448.1.patch, YARN-3448.2.patch, YARN-3448.3.patch, > YARN-3448.4.patch, YARN-3448.5.patch, YARN-3448.7.patch, YARN-3448.8.patch > > > For large applications, the majority of the time in LeveldbTimelineStore is > spent deleting old entities one record at a time. An exclusive write lock is held > during the entire deletion phase, which in practice can be hours. If we are to > relax some of the consistency constraints, other performance enhancing > techniques can be employed to maximize the throughput and minimize locking > time. > Split the 5 sections of the leveldb database (domain, owner, start time, > entity, index) into 5 separate databases. This allows each database to > maximize the read cache effectiveness based on the unique usage patterns of > each database. With 5 separate databases each lookup is much faster. This can > also help with I/O by putting the entity and index databases on separate disks. > Rolling DBs for entity and index DBs. 99.9% of the data is in these two > sections, a 4:1 ratio (index to entity), at least for Tez. We replace DB record > removal with file system removal if we create a rolling set of databases that > age out and can be efficiently removed. To do this we must place a constraint > to always place an entity's events into its correct rolling DB instance > based on start time. This allows us to stitch the data back together while > reading and to do artificial paging. > Relax the synchronous write constraints. If we are willing to accept losing > some records that were not flushed by the operating system during a crash, we > can use async writes, which can be much faster. > Prefer sequential writes. Sequential writes can be several times faster than
> random writes. Spend some small effort arranging the writes in such a way > that will trend towards sequential write performance over random write > performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers
[ https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-3051: --- Attachment: YARN-3051.wip.patch > [Storage abstraction] Create backing storage read interface for ATS readers > --- > > Key: YARN-3051 > URL: https://issues.apache.org/jira/browse/YARN-3051 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: YARN-3051.wip.patch, YARN-3051_temp.patch > > > Per design in YARN-2928, create backing storage read interface that can be > implemented by multiple backing storage implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers
[ https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496534#comment-14496534 ] Varun Saxena commented on YARN-3051: Updated a WIP patch. Will update the javadoc after everyone is on the same page about the approach and API. Working on unit tests. > [Storage abstraction] Create backing storage read interface for ATS readers > --- > > Key: YARN-3051 > URL: https://issues.apache.org/jira/browse/YARN-3051 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: YARN-3051.wip.patch, YARN-3051_temp.patch > > > Per design in YARN-2928, create backing storage read interface that can be > implemented by multiple backing storage implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage
[ https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496538#comment-14496538 ] Junping Du commented on YARN-3411: -- Thanks [~vrushalic] for delivering the proposal and POC patch; excellent job! Some quick comments from a walk-through of the proposal: bq. Entity Table - primary key components - putting the UserID first helps to distribute writes across the regions in the hbase cluster. Pros: avoids single region hotspotting. Cons: connections would be open to several region servers during writes from the per node ATS. Looks like we are trying to get rid of region server hotspotting issues. I agree that this design could help. However, it is still possible that a specific user submits many more applications than anyone else; in that case, the region hotspot issue will still appear, won't it? I think the more general way to solve this problem is to salt the keys with a prefix. Thoughts? bq. Entity Table - column families - config needs to be stored as key value, not as a blob, to enable efficient key based querying based on config param name. Storing it in a separate column family helps to avoid scanning over config while reading metrics and vice versa. +1. This leverages the strength of a columnar database. We should avoid storing any default value for a key. However, this sounds challenging if TimelineClient only has a configuration object. bq. Entity Table - metrics are written with an hbase cell timestamp set to the top of the minute or the top of the 5 minute interval or whatever is decided. This helps in timeseries storage and retrieval in case of querying at the entity level. Can we also let TimelineCollector do some aggregation of metrics over a similar time interval, rather than sending every metric to HBase/Phoenix as it is received? This may help to ease some pressure on the backend. bq.
Flow by application id table: I still think we should figure out some way to store application attempt info. The typical use case here is: for some reason (like a bug or a hardware capability issue), some flow's/application's AM could consistently fail more times than other flows'/applications'. Keeping this info can help us track such issues, can't it? bq. flow summary daily table (aggregation table managed by Phoenix) - could be triggered via coprocessor with each put in the flow table, or by a cron run once per day to aggregate for yesterday (with catch-up functionality in case of backlog etc.) Doing this on each put in the flow table sounds a little expensive, especially when put activity is very frequent; maybe we should use some batch mode here? In addition, I think we can leverage the per node TimelineCollector to do some first level aggregation, which can help to relieve the workload on the backend. > [Storage implementation] explore the native HBase write schema for storage > -- > > Key: YARN-3411 > URL: https://issues.apache.org/jira/browse/YARN-3411 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Vrushali C >Priority: Critical > Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411.poc.txt > > > There is work that's in progress to implement the storage based on a Phoenix > schema (YARN-3134). > In parallel, we would like to explore an implementation based on a native > HBase schema for the write path. Such a schema does not exclude using > Phoenix, especially for reads and offline queries. > Once we have basic implementations of both options, we could evaluate them in > terms of performance, scalability, usability, etc. and make a call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
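The key salting suggested in the comment above spreads a single hot user's rows across regions by prefixing a hash-derived bucket. A sketch; the bucket count and key layout are illustrative, not the proposed schema:

```java
public class SaltedKey {
    // Illustrative bucket count; typically tuned to the region count.
    static final int NUM_BUCKETS = 16;

    // Deterministic salt: the same user always maps to the same bucket,
    // so writes for that user still land on one region per bucket, but
    // different users (and hashes) spread across all buckets.
    static int saltFor(String userId) {
        return Math.floorMod(userId.hashCode(), NUM_BUCKETS);
    }

    static String rowKey(String userId, String appId) {
        return saltFor(userId) + "!" + userId + "!" + appId;
    }
}
```

The cost is on the read side: a scan for one user must fan out over all NUM_BUCKETS prefixes, which is the usual trade-off with salted HBase keys.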
[jira] [Updated] (YARN-2605) [RM HA] Rest api endpoints doing redirect incorrectly
[ https://issues.apache.org/jira/browse/YARN-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong updated YARN-2605: Issue Type: Sub-task (was: Bug) Parent: YARN-149 > [RM HA] Rest api endpoints doing redirect incorrectly > - > > Key: YARN-2605 > URL: https://issues.apache.org/jira/browse/YARN-2605 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.4.0 >Reporter: bc Wong >Assignee: Anubhav Dhoot > Labels: newbie > > The standby RM's webui tries to do a redirect via meta-refresh. That is fine > for pages designed to be viewed by web browsers. But the API endpoints > shouldn't do that. Most programmatic HTTP clients do not do meta-refresh. I'd > suggest HTTP 303, or returning a well-defined error message (json or xml) > stating the standby status with a link to the active RM. > The standby RM is returning this today: > {noformat} > $ curl -i http://bcsec-1.ent.cloudera.com:8088/ws/v1/cluster/metrics > HTTP/1.1 200 OK > Cache-Control: no-cache > Expires: Thu, 25 Sep 2014 18:34:53 GMT > Date: Thu, 25 Sep 2014 18:34:53 GMT > Pragma: no-cache > Expires: Thu, 25 Sep 2014 18:34:53 GMT > Date: Thu, 25 Sep 2014 18:34:53 GMT > Pragma: no-cache > Content-Type: text/plain; charset=UTF-8 > Refresh: 3; url=http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics > Content-Length: 117 > Server: Jetty(6.1.26) > This is standby RM. Redirecting to the current active RM: > http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
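The suggested fix, an HTTP 303 with a Location header instead of a meta-refresh body, can be sketched as plain data (no servlet API; the Response record, method names, and JSON body are illustrative):

```java
public class StandbyRedirect {
    record Response(int status, String location, String body) {}

    // Instead of 200 plus "Refresh: 3; url=..." (which programmatic HTTP
    // clients ignore), the standby answers 303 See Other so clients can
    // follow the Location header to the active RM.
    static Response standbyResponse(String activeRmUrl, String path) {
        return new Response(303, activeRmUrl + path,
            "{\"standby\":true,\"activeRM\":\"" + activeRmUrl + "\"}");
    }
}
```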
[jira] [Updated] (YARN-2696) Queue sorting in CapacityScheduler should consider node label
[ https://issues.apache.org/jira/browse/YARN-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2696: - Attachment: YARN-2696.2.patch Attached ver.2 patch, which fixes the findbugs warning and test failures (TestRMDelegationTokens is not related). I've thought about Jian's comment: bq. We can merge PartitionedQueueComparator and nonPartitionedQueueComparator into a single QueueComparator. After thinking about it, I think we cannot: NonPartitionedQueueComparator is stateless while PartitionedQueueComparator is stateful; callers can modify "partitionToLookAt" for the partitioned one, but NonPartitionedQueueComparator should stay stateless and always sort by the default partition. > Queue sorting in CapacityScheduler should consider node label > - > > Key: YARN-2696 > URL: https://issues.apache.org/jira/browse/YARN-2696 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2696.1.patch, YARN-2696.2.patch > > > In the past, when trying to allocate containers under a parent queue in > CapacityScheduler. The parent queue will choose child queues by the used > resource from smallest to largest. > Now we support node label in CapacityScheduler, we should also consider used > resource in child queues by node labels when allocating resource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
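The stateless/stateful distinction argued above can be sketched as follows: the partitioned comparator carries a mutable "partitionToLookAt" field that callers set before each sort, which is why it cannot be merged with a comparator that always sorts by the default partition. The class shapes below are an illustrative stand-in, not the actual CapacityScheduler code:

```java
import java.util.Comparator;
import java.util.Map;

// Sketch of why the two comparators cannot be merged: one is stateless,
// the other holds mutable state set by callers before each sort.
public class QueueComparators {
    // Stand-in for a queue: name plus per-partition used resource.
    public static class Queue {
        final String name;
        final Map<String, Integer> usedByPartition;
        public Queue(String name, Map<String, Integer> usedByPartition) {
            this.name = name;
            this.usedByPartition = usedByPartition;
        }
        int used(String partition) {
            return usedByPartition.getOrDefault(partition, 0);
        }
    }

    // Stateless: always compares by the default (no-label) partition.
    public static final Comparator<Queue> NON_PARTITIONED =
        Comparator.comparingInt((Queue q) -> q.used(""));

    // Stateful: callers set partitionToLookAt before sorting.
    public static class PartitionedQueueComparator implements Comparator<Queue> {
        private String partitionToLookAt = "";
        public void setPartitionToLookAt(String partition) {
            this.partitionToLookAt = partition;
        }
        @Override
        public int compare(Queue a, Queue b) {
            return Integer.compare(a.used(partitionToLookAt),
                                   b.used(partitionToLookAt));
        }
    }
}
```

The same two queues can sort in opposite orders depending on which partition the stateful comparator is pointed at, which a single merged comparator could only reproduce by becoming stateful itself.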
[jira] [Created] (YARN-3490) Add an application decorator to ClientRMService
Jian Fang created YARN-3490: --- Summary: Add an application decorator to ClientRMService Key: YARN-3490 URL: https://issues.apache.org/jira/browse/YARN-3490 Project: Hadoop YARN Issue Type: New Feature Components: resourcemanager Reporter: Jian Fang Per the discussion on MAPREDUCE-6304, Hadoop cloud service providers want to hook in some logic to control the allocation of an application on the resource manager side, because it is sometimes impractical to control the client side of a Hadoop cluster in the cloud. Hadoop service providers and Hadoop users usually have different privileges, control, and access on a Hadoop cluster in the cloud. One good example is that application masters should not be allocated to spot instances on Amazon EC2. To achieve that, an application decorator could be provided to orchestrate the ApplicationSubmissionContext by specifying the AM label expression, for example. Hadoop could provide a dummy decorator that does nothing by default, but it should allow users to replace this decorator with their own decorators to meet their specific needs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496617#comment-14496617 ] Wangda Tan commented on YARN-3434: -- [~tgraves], I think your concerns may not be a problem: ResourceLimits will be replaced (instead of updated) when a node heartbeats. The ResourceLimits object itself exists to decouple Parent and Child (e.g. ParentQueue to children, LeafQueue to apps); the Child doesn't need to understand how the Parent computes limits, it only needs to respect them. For example, an app doesn't need to understand how the queue computes queue capacity/user-limit/continuous-reservation-looking; it only needs to know the "limit" after considering all factors, so it can decide to allocate/release-before-allocate/cannot-continue. The usage of ResourceLimits I have in mind for the user-limit case is: - ParentQueue computes/sets limits - LeafQueue stores limits (for why store, see 1.) - LeafQueue recomputes/sets the user-limit when trying to allocate for each app/priority - LeafQueue checks the user-limit as well as the limits when trying to allocate/reserve a container - The user-limit saved in ResourceLimits is only used in the normal allocation/reservation path; for a reserved allocation, we reset the user-limit to unlimited. 1. Why store limits in LeafQueue instead of passing them down? This is required by headroom computation: an app's headroom is affected by changes to the queue's parent as well as its siblings; we cannot update every app's headroom when those change, but we need to recompute headroom when an app heartbeats, so we have to store the latest ResourceLimits in LeafQueue. See YARN-2008 for more information. I hope the above clarifies my suggestion. Please let me know your thoughts. 
> Interaction between reservations and userlimit can result in significant ULF > violation > -- > > Key: YARN-3434 > URL: https://issues.apache.org/jira/browse/YARN-3434 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.6.0 >Reporter: Thomas Graves >Assignee: Thomas Graves > Attachments: YARN-3434.patch > > > ULF was set to 1.0 > User was able to consume 1.4X queue capacity. > It looks like when this application launched, it reserved about 1000 > containers, each 8G each, within about 5 seconds. I think this allowed the > logic in assignToUser() to allow the userlimit to be surpassed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
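The parent-to-child limits flow sketched in the comment above can be illustrated as follows. All class shapes and field names here are hypothetical stand-ins for the CapacityScheduler internals being discussed, not the actual code: the parent replaces the limits on each heartbeat, and the leaf stores the latest copy so headroom can be recomputed later without re-walking the parents.

```java
// Illustrative sketch of the ParentQueue -> LeafQueue limits flow:
// the parent replaces the stored limits on each heartbeat, and the leaf
// recomputes headroom on demand from its stored copy.
public class LimitsFlow {
    public static class ResourceLimits {
        final int limitMb; // the only thing the child needs to respect
        public ResourceLimits(int limitMb) { this.limitMb = limitMb; }
    }

    public static class LeafQueue {
        private ResourceLimits latest; // stored, not re-passed per call

        // Parent calls this per heartbeat (a replace, not an update).
        public void setResourceLimits(ResourceLimits limits) {
            this.latest = limits;
        }

        // Recomputed per app/priority from the stored limits.
        public int computeHeadroom(int usedMb) {
            return Math.max(0, latest.limitMb - usedMb);
        }
    }
}
```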
[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496644#comment-14496644 ] Wangda Tan commented on YARN-3434: -- bq. All you would be using it for is passing it down to assignContainer and then it would be out of date. If someone else started looking at that value assuming it was up to date then it would be wrong (unless of course we started updating it as stated above). But it would only be for a single user, not all users unless again we changed to calculate for every user whenever something changed. That seems a bit excessive. To clarify, ResourceLimits is the bridge between parent and child: the parent tells the child "hey, this is the limit you can use", and LeafQueue does the same for apps. ParentQueue doesn't compute or pass down the user-limit to LeafQueue at all; LeafQueue does that and makes sure it gets updated for every allocation. > Interaction between reservations and userlimit can result in significant ULF > violation > -- > > Key: YARN-3434 > URL: https://issues.apache.org/jira/browse/YARN-3434 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.6.0 >Reporter: Thomas Graves >Assignee: Thomas Graves > Attachments: YARN-3434.patch > > > ULF was set to 1.0 > User was able to consume 1.4X queue capacity. > It looks like when this application launched, it reserved about 1000 > containers, each 8G each, within about 5 seconds. I think this allowed the > logic in assignToUser() to allow the userlimit to be surpassed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2306) leak of reservation metrics (fair scheduler)
[ https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496648#comment-14496648 ] Jian Fang commented on YARN-2306: - Could someone please tell me which JIRA has fixed this bug in trunk? I am working on the hadoop 2.6.0 branch and need to see if I need to fix this issue or not. Thanks in advance. > leak of reservation metrics (fair scheduler) > > > Key: YARN-2306 > URL: https://issues.apache.org/jira/browse/YARN-2306 > Project: Hadoop YARN > Issue Type: Bug > Components: fairscheduler >Reporter: Hong Zhiguo >Assignee: Hong Zhiguo >Priority: Minor > Attachments: YARN-2306-2.patch, YARN-2306.patch > > > This only applies to fair scheduler. Capacity scheduler is OK. > When appAttempt or node is removed, the metrics for > reservation(reservedContainers, reservedMB, reservedVCores) is not reduced > back. > These are important metrics for administrators. The wrong metrics may > confuse them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3491) Improve the public resource localization to do both FSDownload submission to the thread pool and completed localization handling in one thread (PublicLocalizer).
zhihai xu created YARN-3491: --- Summary: Improve the public resource localization to do both FSDownload submission to the thread pool and completed localization handling in one thread (PublicLocalizer). Key: YARN-3491 URL: https://issues.apache.org/jira/browse/YARN-3491 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.7.0 Reporter: zhihai xu Assignee: zhihai xu Priority: Critical Improve the public resource localization to do both FSDownload submission to the thread pool and completed localization handling in one thread (PublicLocalizer). Currently FSDownload submission to the thread pool is done in PublicLocalizer#addResource, which runs in the Dispatcher thread, and completed localization handling is done in PublicLocalizer#run, which runs in the PublicLocalizer thread. Because FSDownload submission to the thread pool in the following code is time-consuming, the thread pool can't be fully utilized. Instead of doing public resource localization in parallel (multithreading), public resource localization is serialized most of the time. {code} synchronized (pending) { pending.put(queue.submit(new FSDownload(lfs, null, conf, publicDirDestPath, resource, request.getContext().getStatCache())), request); } {code} There are also two more benefits to this change: 1. The Dispatcher thread won't be blocked by the above FSDownload submission. The Dispatcher thread handles most of the time-critical events in the Node Manager. 2. No synchronization is needed on the HashMap (pending), because pending will only be accessed in the PublicLocalizer thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
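The restructuring proposed above, with both FSDownload submission and completion handling in a single thread, could be sketched with a request queue plus an ExecutorCompletionService. This is a simplified stand-in, not the actual NodeManager code: the download body is a placeholder for FSDownload, and the names mirror the description.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.*;

// Sketch: one thread both submits downloads to the pool and consumes
// completions, so `pending` needs no synchronization and the Dispatcher
// thread never blocks on submission.
public class SingleThreadLocalizer {
    private final BlockingQueue<String> requests = new LinkedBlockingQueue<>();
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    private final CompletionService<String> completion =
        new ExecutorCompletionService<>(pool);
    // Accessed only by the localizer thread, so a plain HashMap is safe.
    private final Map<Future<String>, String> pending = new HashMap<>();

    // Called from the Dispatcher thread: just enqueue, never block.
    public void addResource(String resource) {
        requests.add(resource);
    }

    // Body of one iteration of the single localizer thread: drain new
    // requests, submit them, then handle any completed downloads.
    public int runOnce() {
        String req;
        while ((req = requests.poll()) != null) {
            final String r = req;
            pending.put(completion.submit(() -> "localized:" + r), r);
        }
        int done = 0;
        try {
            Future<String> f;
            while ((f = completion.poll(100, TimeUnit.MILLISECONDS)) != null) {
                pending.remove(f);
                f.get(); // would transition the resource to LOCALIZED here
                done++;
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
        return done;
    }

    public void shutdown() { pool.shutdown(); }
}
```

Submission here is just a queue append from the Dispatcher's point of view, while the pool stays busy because all worker threads can be fed before any completion handling begins.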
[jira] [Commented] (YARN-3491) Improve the public resource localization to do both FSDownload submission to the thread pool and completed localization handling in one thread (PublicLocalizer).
[ https://issues.apache.org/jira/browse/YARN-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496663#comment-14496663 ] Jason Lowe commented on YARN-3491: -- Could you elaborate a bit on why the submit is time consuming? Unless I'm mistaken, the FSDownload constructor is very cheap and queueing should be simply tacking an entry on a queue. > Improve the public resource localization to do both FSDownload submission to > the thread pool and completed localization handling in one thread > (PublicLocalizer). > - > > Key: YARN-3491 > URL: https://issues.apache.org/jira/browse/YARN-3491 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > > Improve the public resource localization to do both FSDownload submission to > the thread pool and completed localization handling in one thread > (PublicLocalizer). > Currently FSDownload submission to the thread pool is done in > PublicLocalizer#addResource which is running in Dispatcher thread and > completed localization handling is done in PublicLocalizer#run which is > running in PublicLocalizer thread. > Because FSDownload submission to the thread pool at the following code is > time consuming, the thread pool can't be fully utilized. Instead of doing > public resource localization in parallel(multithreading), public resource > localization is serialized most of the time. > {code} > synchronized (pending) { > pending.put(queue.submit(new FSDownload(lfs, null, conf, > publicDirDestPath, resource, > request.getContext().getStatCache())), > request); > } > {code} > Also there are two more benefits with this change: > 1. The Dispatcher thread won't be blocked by above FSDownload submission. > Dispatcher thread handles most of time critical events at Node manager. > 2. don't need synchronization on HashMap (pending). 
> Because pending will be only accessed in PublicLocalizer thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3326) ReST support for getLabelsToNodes
[ https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496693#comment-14496693 ] Tsuyoshi Ozawa commented on YARN-3326: -- +1, committing this shortly. Hey [~Naganarasimha], could you open a new JIRA to update the documentation for this feature? > ReST support for getLabelsToNodes > -- > > Key: YARN-3326 > URL: https://issues.apache.org/jira/browse/YARN-3326 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-3326.20150310-1.patch, YARN-3326.20150407-1.patch, > YARN-3326.20150408-1.patch > > > REST to support to retrieve LabelsToNodes Mapping -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3491) Improve the public resource localization to do both FSDownload submission to the thread pool and completed localization handling in one thread (PublicLocalizer).
[ https://issues.apache.org/jira/browse/YARN-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496702#comment-14496702 ] zhihai xu commented on YARN-3491: - I saw the serialization for public resource localization in the following logs: The following log shows two private localization requests and many public localization requests from container_e30_1426628374875_110892_01_000475 {code} 2015-04-07 22:49:56,750 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_e30_1426628374875_110892_01_000475 transitioned from NEW to LOCALIZING 2015-04-07 22:49:56,751 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://nameservice1/user/databot/.staging/job_1426628374875_110892/job.xml transitioned from INIT to DOWNLOADING 2015-04-07 22:49:56,751 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://nameservice1/user/databot/.staging/job_1426628374875_110892/job.jar transitioned from INIT to DOWNLOADING 2015-04-07 22:49:56,751 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://nameservice1/tmp/temp182237/tmp-1316042064/reflections.jar transitioned from INIT to DOWNLOADING 2015-04-07 22:49:56,751 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://nameservice1/tmp/temp182237/tmp-327542609/service-media-sdk.jar transitioned from INIT to DOWNLOADING 2015-04-07 22:49:56,751 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://nameservice1/tmp/temp182237/tmp1631960573/service-local-search-sdk.jar transitioned from INIT to DOWNLOADING 2015-04-07 22:49:56,751 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource 
hdfs://nameservice1/tmp/temp182237/tmp-1521315530/ace-geo.jar transitioned from INIT to DOWNLOADING 2015-04-07 22:49:56,751 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://nameservice1/tmp/temp182237/tmp1347512155/cortex-server.jar transitioned from INIT to DOWNLOADING {code} The following log shows how the public resource localizations are processed. {code} 2015-04-07 22:49:56,758 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_e30_1426628374875_110892_01_000475 2015-04-07 22:49:56,758 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://nameservice1/tmp/temp182237/tmp-1316042064/reflections.jar, 1428446867531, FILE, null } 2015-04-07 22:49:56,882 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://nameservice1/tmp/temp182237/tmp-327542609/service-media-sdk.jar, 1428446864128, FILE, null } 2015-04-07 22:49:56,902 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://nameservice1/tmp/temp182237/tmp-1316042064/reflections.jar(->/data2/yarn/nm/filecache/4877652/reflections.jar) transitioned from DOWNLOADING to LOCALIZED 2015-04-07 22:49:57,127 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://nameservice1/tmp/temp182237/tmp1631960573/service-local-search-sdk.jar, 1428446858408, FILE, null } 2015-04-07 22:49:57,145 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://nameservice1/tmp/temp182237/tmp-327542609/service-media-sdk.jar(->/data11/yarn/nm/filecache/4877653/service-media-sdk.jar) transitioned from DOWNLOADING to LOCALIZED 2015-04-07 22:49:57,251 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://nameservice1/tmp/temp182237/tmp-1521315530/ace-geo.jar, 1428446862857, FILE, null } 2015-04-07 22:49:57,270 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://nameservice1/tmp/temp182237/tmp1631960573/service-local-search-sdk.jar(->/data1/yarn/nm/filecache/4877654/service-local-search-sdk.jar) transitioned from DOWNLOADING to LOCALIZED 2015-04-07 22:49:57,383 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://nameservice1/tmp/temp182237/tmp1347512155/cortex-server.jar, 1428446857069, FILE, null } {code} Based on the log, you can see the thread pool is not fully utilized; only one thread is used. The default thread
[jira] [Updated] (YARN-3005) [JDK7] Use switch statement for String instead of if-else statement in RegistrySecurity.java
[ https://issues.apache.org/jira/browse/YARN-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-3005: Assignee: Kengo Seki > [JDK7] Use switch statement for String instead of if-else statement in > RegistrySecurity.java > > > Key: YARN-3005 > URL: https://issues.apache.org/jira/browse/YARN-3005 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.7.0 >Reporter: Akira AJISAKA >Assignee: Kengo Seki >Priority: Trivial > Labels: newbie > Fix For: 2.7.0 > > Attachments: YARN-3005.001.patch, YARN-3005.002.patch > > > Since we have moved to JDK7, we can refactor the below if-else statement for > String. > {code} > // TODO JDK7 SWITCH > if (REGISTRY_CLIENT_AUTH_KERBEROS.equals(auth)) { > access = AccessPolicy.sasl; > } else if (REGISTRY_CLIENT_AUTH_DIGEST.equals(auth)) { > access = AccessPolicy.digest; > } else if (REGISTRY_CLIENT_AUTH_ANONYMOUS.equals(auth)) { > access = AccessPolicy.anon; > } else { > throw new ServiceStateException(E_UNKNOWN_AUTHENTICATION_MECHANISM > + "\"" + auth + "\""); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
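The quoted if-else chain rewritten as a JDK7 string switch would look roughly like this. The constants and the AccessPolicy enum are stubbed here for illustration; the real ones live in RegistrySecurity, and the actual exception type is ServiceStateException rather than the stand-in used below.

```java
// Stubbed sketch of the JDK7 string-switch refactoring proposed above.
public class RegistryAuthSwitch {
    public enum AccessPolicy { sasl, digest, anon }

    // Stand-ins for the real RegistrySecurity constants.
    static final String REGISTRY_CLIENT_AUTH_KERBEROS = "kerberos";
    static final String REGISTRY_CLIENT_AUTH_DIGEST = "digest";
    static final String REGISTRY_CLIENT_AUTH_ANONYMOUS = "anonymous";

    public static AccessPolicy resolve(String auth) {
        // Case labels must be compile-time String constants, which the
        // static final fields above satisfy.
        switch (auth) {
            case REGISTRY_CLIENT_AUTH_KERBEROS:
                return AccessPolicy.sasl;
            case REGISTRY_CLIENT_AUTH_DIGEST:
                return AccessPolicy.digest;
            case REGISTRY_CLIENT_AUTH_ANONYMOUS:
                return AccessPolicy.anon;
            default:
                throw new IllegalStateException(
                    "Unknown authentication mechanism \"" + auth + "\"");
        }
    }
}
```

Note that switching on a null `auth` throws NullPointerException, the same behavior as the original chain's `.equals(auth)` calls would avoid only because the constant is the receiver, so a null check may still be worthwhile.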
[jira] [Updated] (YARN-3326) Support RESTful API for getLabelsToNodes
[ https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated YARN-3326: - Summary: Support RESTful API for getLabelsToNodes (was: ReST support for getLabelsToNodes ) > Support RESTful API for getLabelsToNodes > - > > Key: YARN-3326 > URL: https://issues.apache.org/jira/browse/YARN-3326 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Attachments: YARN-3326.20150310-1.patch, YARN-3326.20150407-1.patch, > YARN-3326.20150408-1.patch > > > REST to support to retrieve LabelsToNodes Mapping -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3326) Support RESTful API for getLabelsToNodes
[ https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496712#comment-14496712 ] Tsuyoshi Ozawa commented on YARN-3326: -- Committed this to trunk and branch-2. Thanks [~Naganarasimha] for your contribution and thanks [~vvasudev] for your review! > Support RESTful API for getLabelsToNodes > - > > Key: YARN-3326 > URL: https://issues.apache.org/jira/browse/YARN-3326 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Fix For: 2.8.0 > > Attachments: YARN-3326.20150310-1.patch, YARN-3326.20150407-1.patch, > YARN-3326.20150408-1.patch > > > REST to support to retrieve LabelsToNodes Mapping -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3005) [JDK7] Use switch statement for String instead of if-else statement in RegistrySecurity.java
[ https://issues.apache.org/jira/browse/YARN-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496708#comment-14496708 ] Akira AJISAKA commented on YARN-3005: - Assigned [~sekikn]. Thanks. > [JDK7] Use switch statement for String instead of if-else statement in > RegistrySecurity.java > > > Key: YARN-3005 > URL: https://issues.apache.org/jira/browse/YARN-3005 > Project: Hadoop YARN > Issue Type: Improvement >Affects Versions: 2.7.0 >Reporter: Akira AJISAKA >Assignee: Kengo Seki >Priority: Trivial > Labels: newbie > Fix For: 2.7.0 > > Attachments: YARN-3005.001.patch, YARN-3005.002.patch > > > Since we have moved to JDK7, we can refactor the below if-else statement for > String. > {code} > // TODO JDK7 SWITCH > if (REGISTRY_CLIENT_AUTH_KERBEROS.equals(auth)) { > access = AccessPolicy.sasl; > } else if (REGISTRY_CLIENT_AUTH_DIGEST.equals(auth)) { > access = AccessPolicy.digest; > } else if (REGISTRY_CLIENT_AUTH_ANONYMOUS.equals(auth)) { > access = AccessPolicy.anon; > } else { > throw new ServiceStateException(E_UNKNOWN_AUTHENTICATION_MECHANISM > + "\"" + auth + "\""); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3394) WebApplication proxy documentation is incomplete
[ https://issues.apache.org/jira/browse/YARN-3394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496725#comment-14496725 ] Tsuyoshi Ozawa commented on YARN-3394: -- Thanks Naganarasimha for your contribution and thanks Jian for your commit! > WebApplication proxy documentation is incomplete > - > > Key: YARN-3394 > URL: https://issues.apache.org/jira/browse/YARN-3394 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Bibin A Chundatt >Assignee: Naganarasimha G R >Priority: Minor > Fix For: 2.8.0 > > Attachments: WebApplicationProxy.html, YARN-3394.20150324-1.patch > > > Webproxy documentation is incomplete > hadoop-yarn/hadoop-yarn-site/WebApplicationProxy.html > 1.Configuration of service start/stop as separate server > 2.Steps to start as daemon service > 3.Secure mode for Web proxy -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2696) Queue sorting in CapacityScheduler should consider node label
[ https://issues.apache.org/jira/browse/YARN-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496732#comment-14496732 ] Hadoop QA commented on YARN-2696: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12725637/YARN-2696.2.patch against trunk revision 9e8309a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7348//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7348//console This message is automatically generated. > Queue sorting in CapacityScheduler should consider node label > - > > Key: YARN-2696 > URL: https://issues.apache.org/jira/browse/YARN-2696 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2696.1.patch, YARN-2696.2.patch > > > In the past, when trying to allocate containers under a parent queue in > CapacityScheduler. 
The parent queue will choose child queues by the used > resource from smallest to largest. > Now we support node label in CapacityScheduler, we should also consider used > resource in child queues by node labels when allocating resource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3326) Support RESTful API for getLabelsToNodes
[ https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496731#comment-14496731 ] Hudson commented on YARN-3326: -- SUCCESS: Integrated in Hadoop-trunk-Commit #7590 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7590/]) YARN-3326. Support RESTful API for getLabelsToNodes. Contributed by Naganarasimha G R. (ozawa: rev e48cedc663b8a26fd62140c8e2907f9b4edd9785) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/LabelsToNodesInfo.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodeLabels.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodeIDsInfo.java > Support RESTful API for getLabelsToNodes > - > > Key: YARN-3326 > URL: https://issues.apache.org/jira/browse/YARN-3326 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Fix For: 2.8.0 > > Attachments: YARN-3326.20150310-1.patch, YARN-3326.20150407-1.patch, > YARN-3326.20150408-1.patch > > > REST to support to retrieve LabelsToNodes Mapping -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3326) Support RESTful API for getLabelsToNodes
[ https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496733#comment-14496733 ] Naganarasimha G R commented on YARN-3326: - Thanks [~ozawa] for the review. I will check the scope of YARN-2801, and if it doesn't cover this feature, I will raise a new JIRA. > Support RESTful API for getLabelsToNodes > - > > Key: YARN-3326 > URL: https://issues.apache.org/jira/browse/YARN-3326 > Project: Hadoop YARN > Issue Type: Sub-task > Components: resourcemanager >Affects Versions: 2.6.0 >Reporter: Naganarasimha G R >Assignee: Naganarasimha G R >Priority: Minor > Fix For: 2.8.0 > > Attachments: YARN-3326.20150310-1.patch, YARN-3326.20150407-1.patch, > YARN-3326.20150408-1.patch > > > REST to support to retrieve LabelsToNodes Mapping -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496735#comment-14496735 ] Thomas Graves commented on YARN-3434: - I am not saying the child needs to know how the parent calculates the resource limit. I am saying the user limit, and whether it needs to unreserve to make another reservation, has nothing to do with the parent queue (i.e. it doesn't apply to the parent queue). Remember, I don't need to store the user limit; I need to store whether it needs to unreserve and, if so, how much. When a node heartbeats, it goes through the regular assignments and updates the leafQueue clusterResources based on what the parent passes in. When a node is removed or added, it updates the resource limits (none of these apply to the calculation of whether it needs to unreserve or not). Basically it comes down to: is this information useful outside of the small window between when it is calculated and when it is needed in assignContainer()? My thought is no. And you said it yourself in the last bullet above. Although we have been referring to the userLimit, and perhaps that is the problem: I don't need to store the userLimit, I need to store whether it needs to unreserve and if so how much. Therefore it fits better as a local transient variable rather than a globally stored one. If you store just the userLimit then you need to recalculate things, which I'm trying to avoid. I understand why we are storing the current information in ResourceLimits, because it has to do with headroom and parent limits and is recalculated at various points, but the current implementation in canAssignToUser doesn't use headroom at all, and whether we need to unreserve or not on the last call to assignContainers doesn't affect the headroom calculation. Again, basically all we would be doing is placing an extra global variable(s) in the ResourceLimits class just to pass it on down a couple of functions. 
That, to me, is a parameter. Now, if we had multiple things needing this or updating it, then to me it would fit better in ResourceLimits. > Interaction between reservations and userlimit can result in significant ULF > violation > -- > > Key: YARN-3434 > URL: https://issues.apache.org/jira/browse/YARN-3434 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.6.0 >Reporter: Thomas Graves >Assignee: Thomas Graves > Attachments: YARN-3434.patch > > > ULF was set to 1.0 > User was able to consume 1.4X queue capacity. > It looks like when this application launched, it reserved about 1000 > containers, each 8G each, within about 5 seconds. I think this allowed the > logic in assignToUser() to allow the userlimit to be surpassed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3462) Patches applied for YARN-2424 are inconsistent between trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-3462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496757#comment-14496757 ] Naganarasimha G R commented on YARN-3462: - Thanks for reviewing and committing, [~qwertymaniac] & [~sidharta-s] > Patches applied for YARN-2424 are inconsistent between trunk and branch-2 > - > > Key: YARN-3462 > URL: https://issues.apache.org/jira/browse/YARN-3462 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Sidharta Seethana >Assignee: Naganarasimha G R > Fix For: 2.7.1 > > Attachments: YARN-3462.20150508-1.patch > > > It looks like the changes for YARN-2424 are not the same for trunk (commit > 7e75226e68715c3eca9d346c8eaf2f265aa70d23) and branch-2 (commit > 5d965f2f3cf97a87603720948aacd4f7877d73c4) . Branch-2 has a missing warning > and documentation is a bit different as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3492) AM fails to come up because RM and NM can't connect to each other
Karthik Kambatla created YARN-3492: -- Summary: AM fails to come up because RM and NM can't connect to each other Key: YARN-3492 URL: https://issues.apache.org/jira/browse/YARN-3492 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Environment: pseudo-distributed cluster on a mac Reporter: Karthik Kambatla Priority: Blocker Stood up a pseudo-distributed cluster with 2.7.0 RC0. Submitted a pi job. The container gets allocated, but doesn't get launched. The NM can't talk to the RM. Logs to follow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3492) AM fails to come up because RM and NM can't connect to each other
[ https://issues.apache.org/jira/browse/YARN-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3492: --- Attachment: yarn-kasha-resourcemanager-kasha-mbp.local.log yarn-kasha-nodemanager-kasha-mbp.local.log > AM fails to come up because RM and NM can't connect to each other > - > > Key: YARN-3492 > URL: https://issues.apache.org/jira/browse/YARN-3492 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 > Environment: pseudo-distributed cluster on a mac >Reporter: Karthik Kambatla >Priority: Blocker > Attachments: yarn-kasha-nodemanager-kasha-mbp.local.log, > yarn-kasha-resourcemanager-kasha-mbp.local.log > > > Stood up a pseudo-distributed cluster with 2.7.0 RC0. Submitted a pi job. The > container gets allocated, but doesn't get launched. The NM can't talk to the > RM. Logs to follow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2696) Queue sorting in CapacityScheduler should consider node label
[ https://issues.apache.org/jira/browse/YARN-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496881#comment-14496881 ] Jian He commented on YARN-2696: --- A few minor comments: - add a comment explaining why the no_label max resource is treated separately. {code} if (nodePartition == null || nodePartition.equals(RMNodeLabelsManager.NO_LABEL)) {code} - getChildrenAllocationIterator -> sortAndGetChildrenAllocationIterator > Queue sorting in CapacityScheduler should consider node label > - > > Key: YARN-2696 > URL: https://issues.apache.org/jira/browse/YARN-2696 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2696.1.patch, YARN-2696.2.patch > > > In the past, when trying to allocate containers under a parent queue in > CapacityScheduler. The parent queue will choose child queues by the used > resource from smallest to largest. > Now we support node label in CapacityScheduler, we should also consider used > resource in child queues by node labels when allocating resource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3354) Container should contains node-labels asked by original ResourceRequests
[ https://issues.apache.org/jira/browse/YARN-3354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496888#comment-14496888 ] Jian He commented on YARN-3354: --- +1 > Container should contains node-labels asked by original ResourceRequests > > > Key: YARN-3354 > URL: https://issues.apache.org/jira/browse/YARN-3354 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, capacityscheduler, nodemanager, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-3354.1.patch, YARN-3354.2.patch > > > We proposed non-exclusive node labels in YARN-3214, makes non-labeled > resource requests can be allocated on labeled nodes which has idle resources. > To make preemption work, we need know an allocated container's original node > label: when labeled resource requests comes back, we need kill non-labeled > containers running on labeled nodes. > This requires add node-labels in Container, and also, NM need store this > information and send back to RM when RM restart to recover original container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2696) Queue sorting in CapacityScheduler should consider node label
[ https://issues.apache.org/jira/browse/YARN-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496892#comment-14496892 ] Jian He commented on YARN-2696: --- - Does this overlap with below {{Resources.equals(queueGuranteedResource, Resources.none()) ? 0}} check ? {code} // make queueGuranteed >= minimum_allocation to avoid divided by 0. queueGuranteedResource = Resources.max(rc, totalPartitionResource, queueGuranteedResource, minimumAllocation); {code} > Queue sorting in CapacityScheduler should consider node label > - > > Key: YARN-2696 > URL: https://issues.apache.org/jira/browse/YARN-2696 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2696.1.patch, YARN-2696.2.patch > > > In the past, when trying to allocate containers under a parent queue in > CapacityScheduler. The parent queue will choose child queues by the used > resource from smallest to largest. > Now we support node label in CapacityScheduler, we should also consider used > resource in child queues by node labels when allocating resource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496910#comment-14496910 ] Wangda Tan commented on YARN-3434: -- [~tgraves], Makes sense to me, especially the {{local transient variable rather then a globally stored one}} point. So I think after the change, the flow to use/update ResourceLimits will be: {code} In LeafQueue: Both: updateClusterResource | |--> resource-limit assignContainers | update&store (only for compute headroom) Only: assignContainers | V check queue limit | V check user limit | V set how-much-should-unreserve to ResourceLimits and pass down {code} Is that also what you have in mind? > Interaction between reservations and userlimit can result in significant ULF > violation > -- > > Key: YARN-3434 > URL: https://issues.apache.org/jira/browse/YARN-3434 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.6.0 >Reporter: Thomas Graves >Assignee: Thomas Graves > Attachments: YARN-3434.patch > > > ULF was set to 1.0 > User was able to consume 1.4X queue capacity. > It looks like when this application launched, it reserved about 1000 > containers, each 8G each, within about 5 seconds. I think this allowed the > logic in assignToUser() to allow the userlimit to be surpassed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3492) AM fails to come up because RM and NM can't connect to each other
[ https://issues.apache.org/jira/browse/YARN-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496918#comment-14496918 ] Tsuyoshi Ozawa commented on YARN-3492: -- [~kasha], could you attach yarn-site.xml and mapred-site.xml for investigation? > AM fails to come up because RM and NM can't connect to each other > - > > Key: YARN-3492 > URL: https://issues.apache.org/jira/browse/YARN-3492 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 > Environment: pseudo-distributed cluster on a mac >Reporter: Karthik Kambatla >Priority: Blocker > Attachments: yarn-kasha-nodemanager-kasha-mbp.local.log, > yarn-kasha-resourcemanager-kasha-mbp.local.log > > > Stood up a pseudo-distributed cluster with 2.7.0 RC0. Submitted a pi job. The > container gets allocated, but doesn't get launched. The NM can't talk to the > RM. Logs to follow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3404) View the queue name to YARN Application page
[ https://issues.apache.org/jira/browse/YARN-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496920#comment-14496920 ] Jian He commented on YARN-3404: --- +1 > View the queue name to YARN Application page > > > Key: YARN-3404 > URL: https://issues.apache.org/jira/browse/YARN-3404 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ryu Kobayashi >Assignee: Ryu Kobayashi >Priority: Minor > Attachments: YARN-3404.1.patch, YARN-3404.2.patch, YARN-3404.3.patch, > YARN-3404.4.patch, screenshot.png > > > It want to display the name of the queue that is used to YARN Application > page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3492) AM fails to come up because RM and NM can't connect to each other
[ https://issues.apache.org/jira/browse/YARN-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-3492: --- Attachment: yarn-site.xml mapred-site.xml > AM fails to come up because RM and NM can't connect to each other > - > > Key: YARN-3492 > URL: https://issues.apache.org/jira/browse/YARN-3492 > Project: Hadoop YARN > Issue Type: Bug >Affects Versions: 2.7.0 > Environment: pseudo-distributed cluster on a mac >Reporter: Karthik Kambatla >Priority: Blocker > Attachments: mapred-site.xml, > yarn-kasha-nodemanager-kasha-mbp.local.log, > yarn-kasha-resourcemanager-kasha-mbp.local.log, yarn-site.xml > > > Stood up a pseudo-distributed cluster with 2.7.0 RC0. Submitted a pi job. The > container gets allocated, but doesn't get launched. The NM can't talk to the > RM. Logs to follow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2696) Queue sorting in CapacityScheduler should consider node label
[ https://issues.apache.org/jira/browse/YARN-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2696: - Attachment: YARN-2696.3.patch Addressed all comments from [~jianhe] and fixed test failure in TestFifoScheduler, uploaded ver.3 patch. > Queue sorting in CapacityScheduler should consider node label > - > > Key: YARN-2696 > URL: https://issues.apache.org/jira/browse/YARN-2696 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-2696.1.patch, YARN-2696.2.patch, YARN-2696.3.patch > > > In the past, when trying to allocate containers under a parent queue in > CapacityScheduler. The parent queue will choose child queues by the used > resource from smallest to largest. > Now we support node label in CapacityScheduler, we should also consider used > resource in child queues by node labels when allocating resource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3354) Container should contains node-labels asked by original ResourceRequests
[ https://issues.apache.org/jira/browse/YARN-3354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496981#comment-14496981 ] Wangda Tan commented on YARN-3354: -- Test failure is not related to the patch. > Container should contains node-labels asked by original ResourceRequests > > > Key: YARN-3354 > URL: https://issues.apache.org/jira/browse/YARN-3354 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, capacityscheduler, nodemanager, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Attachments: YARN-3354.1.patch, YARN-3354.2.patch > > > We proposed non-exclusive node labels in YARN-3214, makes non-labeled > resource requests can be allocated on labeled nodes which has idle resources. > To make preemption work, we need know an allocated container's original node > label: when labeled resource requests comes back, we need kill non-labeled > containers running on labeled nodes. > This requires add node-labels in Container, and also, NM need store this > information and send back to RM when RM restart to recover original container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3404) View the queue name to YARN Application page
[ https://issues.apache.org/jira/browse/YARN-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496990#comment-14496990 ] Hudson commented on YARN-3404: -- FAILURE: Integrated in Hadoop-trunk-Commit #7594 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7594/]) YARN-3404. Display queue name on application page. Contributed by Ryu Kobayashi (jianhe: rev b2e6cf607f1712d103520ca6b3ff21ecc07cd265) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java > View the queue name to YARN Application page > > > Key: YARN-3404 > URL: https://issues.apache.org/jira/browse/YARN-3404 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: Ryu Kobayashi >Assignee: Ryu Kobayashi >Priority: Minor > Fix For: 2.8.0 > > Attachments: YARN-3404.1.patch, YARN-3404.2.patch, YARN-3404.3.patch, > YARN-3404.4.patch, screenshot.png > > > It want to display the name of the queue that is used to YARN Application > page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3354) Container should contains node-labels asked by original ResourceRequests
[ https://issues.apache.org/jira/browse/YARN-3354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497013#comment-14497013 ] Hudson commented on YARN-3354: -- FAILURE: Integrated in Hadoop-trunk-Commit #7595 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7595/]) YARN-3354. Add node label expression in ContainerTokenIdentifier to support RM recovery. Contributed by Wangda Tan (jianhe: rev 1b89a3e173f8e905074ed6714a7be5c003c0e2c4) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/RMContainerTokenSecretManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManager.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainer.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NMContainerStatus.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NMContainerStatusPBImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestWorkPreservingRMRestartForNodeLabel.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/ContainerTokenIdentifier.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/proto/server/yarn_security_token.proto * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java > Container should contains node-labels asked by original ResourceRequests > > > Key: YARN-3354 > URL: https://issues.apache.org/jira/browse/YARN-3354 > Project: Hadoop YARN > Issue Type: Sub-task > Components: api, capacityscheduler, nodemanager, resourcemanager >Reporter: Wangda Tan >Assignee: Wangda Tan > Fix For: 2.8.0 > > Attachments: YARN-3354.1.patch, YARN-3354.2.patch > > > We proposed non-exclusive node labels in YARN-3214, makes non-labeled > resource requests can be allocated on labeled nodes which has idle resources. 
> To make preemption work, we need know an allocated container's original node > label: when labeled resource requests comes back, we need kill non-labeled > containers running on labeled nodes. > This requires add node-labels in Container, and also, NM need store this > information and send back to RM when RM restart to recover original container. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497055#comment-14497055 ] Thomas Graves commented on YARN-3434: - I agree with the Both section. I'm not sure I completely follow the Only section. Are you suggesting we change the patch to modify ResourceLimits and pass it down rather than using the LimitsInfo class? If so that won't work, at least not without adding the shouldContinue flag to it. Unless you mean keep the LimitsInfo class for use locally in assignContainers and then pass ResourceLimits down to assignContainer with the value of amountNeededUnreserve as the limit. That wouldn't really change much except the object we pass down through the functions. > Interaction between reservations and userlimit can result in significant ULF > violation > -- > > Key: YARN-3434 > URL: https://issues.apache.org/jira/browse/YARN-3434 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.6.0 >Reporter: Thomas Graves >Assignee: Thomas Graves > Attachments: YARN-3434.patch > > > ULF was set to 1.0 > User was able to consume 1.4X queue capacity. > It looks like when this application launched, it reserved about 1000 > containers, each 8G each, within about 5 seconds. I think this allowed the > logic in assignToUser() to allow the userlimit to be surpassed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497065#comment-14497065 ] Wangda Tan commented on YARN-3434: -- bq. Are you suggesting we change the patch to modify ResourceLimits and pass down rather then using the LimitsInfo class? Yes, that's my suggestion. bq. at least not without adding the shouldContinue flag to it Kind of. What I'm thinking is we can add "amountNeededUnreserve" to ResourceLimits. canAssignToThisQueue/User will return a boolean meaning shouldContinue, and set "amountNeededUnreserve" (instead of limit; we don't need to change limit). That's very similar to your original logic and we don't need the extra LimitsInfo. After we get the updated ResourceLimits and pass it down, the problem should be resolved. Did I miss anything? > Interaction between reservations and userlimit can result in significant ULF > violation > -- > > Key: YARN-3434 > URL: https://issues.apache.org/jira/browse/YARN-3434 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.6.0 >Reporter: Thomas Graves >Assignee: Thomas Graves > Attachments: YARN-3434.patch > > > ULF was set to 1.0 > User was able to consume 1.4X queue capacity. > It looks like when this application launched, it reserved about 1000 > containers, each 8G each, within about 5 seconds. I think this allowed the > logic in assignToUser() to allow the userlimit to be surpassed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497076#comment-14497076 ] Thomas Graves commented on YARN-3434: - So you are saying add amountNeededUnreserve to ResourceLimits and then set the global currentResourceLimits.amountNeededUnreserve inside of canAssignToUser? This is what I was not in favor of above, and there would be no need to pass it down as a parameter. Or were you saying create a ResourceLimits instance and pass it as a parameter to canAssignToUser and canAssignToThisQueue and modify that instance? That instance would then be passed down through to assignContainer()? I don't see how else you set the ResourceLimits. > Interaction between reservations and userlimit can result in significant ULF > violation > -- > > Key: YARN-3434 > URL: https://issues.apache.org/jira/browse/YARN-3434 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.6.0 >Reporter: Thomas Graves >Assignee: Thomas Graves > Attachments: YARN-3434.patch > > > ULF was set to 1.0 > User was able to consume 1.4X queue capacity. > It looks like when this application launched, it reserved about 1000 > containers, each 8G each, within about 5 seconds. I think this allowed the > logic in assignToUser() to allow the userlimit to be surpassed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3491) Improve the public resource localization to do both FSDownload submission to the thread pool and completed localization handling in one thread (PublicLocalizer).
[ https://issues.apache.org/jira/browse/YARN-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497085#comment-14497085 ] zhihai xu commented on YARN-3491: - Hi [~jlowe], thanks for the comment. Queueing itself is fast, but handing a FSDownload to a new worker thread takes longer. Only when all threads in the thread pool are in use is submission very fast, because it just adds an entry to the queue via LinkedBlockingQueue#offer. Based on the following code in ThreadPoolExecutor#execute (corePoolSize is the thread pool size, which is 4 in this case), workQueue.offer(command) is fast but addWorker is slow, and the task is only queued when all threads in the thread pool are running. {code} public void execute(Runnable command) { if (command == null) throw new NullPointerException(); /* * Proceed in 3 steps: * * 1. If fewer than corePoolSize threads are running, try to * start a new thread with the given command as its first * task. The call to addWorker atomically checks runState and * workerCount, and so prevents false alarms that would add * threads when it shouldn't, by returning false. * * 2. If a task can be successfully queued, then we still need * to double-check whether we should have added a thread * (because existing ones died since last checking) or that * the pool shut down since entry into this method. So we * recheck state and if necessary roll back the enqueuing if * stopped, or start a new thread if there are none. * * 3. If we cannot queue task, then we try to add a new * thread. If it fails, we know we are shut down or saturated * and so reject the task. */ int c = ctl.get(); if (workerCountOf(c) < corePoolSize) { if (addWorker(command, true)) return; c = ctl.get(); } if (isRunning(c) && workQueue.offer(command)) { int recheck = ctl.get(); if (! 
isRunning(recheck) && remove(command)) reject(command); else if (workerCountOf(recheck) == 0) addWorker(null, false); } else if (!addWorker(command, false)) reject(command); } {code} The issue is: if the time to run one FSDownload (resource localization) is close to the time to run the submit (adding a FSDownload to a worker thread), this oscillation will happen and there will be only one worker thread running. The Dispatcher thread will then be blocked for a longer time. The logs above show this situation: LocalizerRunner#addResource, used by the private localizer, takes less than one millisecond to process one REQUEST_RESOURCE_LOCALIZATION event, but PublicLocalizer#addResource, used by the public localizer, takes 124 milliseconds to process one REQUEST_RESOURCE_LOCALIZATION event. > Improve the public resource localization to do both FSDownload submission to > the thread pool and completed localization handling in one thread > (PublicLocalizer). > - > > Key: YARN-3491 > URL: https://issues.apache.org/jira/browse/YARN-3491 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > > Improve the public resource localization to do both FSDownload submission to > the thread pool and completed localization handling in one thread > (PublicLocalizer). > Currently FSDownload submission to the thread pool is done in > PublicLocalizer#addResource which is running in Dispatcher thread and > completed localization handling is done in PublicLocalizer#run which is > running in PublicLocalizer thread. > Because FSDownload submission to the thread pool at the following code is > time consuming, the thread pool can't be fully utilized. Instead of doing > public resource localization in parallel(multithreading), public resource > localization is serialized most of the time. 
> {code} > synchronized (pending) { > pending.put(queue.submit(new FSDownload(lfs, null, conf, > publicDirDestPath, resource, > request.getContext().getStatCache())), > request); > } > {code} > Also there are two more benefits with this change: > 1. The Dispatcher thread won't be blocked by above FSDownload submission. > Dispatcher thread handles most of time critical events at Node manager. > 2. don't need synchronization on HashMap (pending). > Because pending will be only accessed in PublicLocalizer thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
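The ThreadPoolExecutor behavior discussed in this thread can be observed with a small standalone demo. This is an illustrative sketch, not YARN code: the class name, pool size, and latches are all made up for the example. It shows that once all corePoolSize threads are busy, execute() queues the task via workQueue.offer() instead of starting another thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolDemo {
    // Returns how many tasks sit in the work queue once all core threads are
    // busy: execute() falls through to workQueue.offer() instead of the
    // slower addWorker() path.
    public static int queuedWhenSaturated() throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
        CountDownLatch started = new CountDownLatch(2);
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < 2; i++) {           // occupy both core threads
            pool.execute(() -> {
                started.countDown();
                try { release.await(); } catch (InterruptedException ignored) { }
            });
        }
        started.await();                        // both workers are now busy
        pool.execute(() -> { });                // this one is queued, not given a new thread
        int queued = pool.getQueue().size();
        release.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return queued;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("queued=" + queuedWhenSaturated());
    }
}
```

Running main prints `queued=1`, matching the execute() logic quoted above: the third submission hits the fast offer() path only because both core threads are already running.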
[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation
[ https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497087#comment-14497087 ] Wangda Tan commented on YARN-3434: -- bq. Or were you saying create a ResourceLimit and pass it as parameter to canAssignToUser and canAssignToThisQueue and modify that instance. That instance would then be passed down though to assignContainer()? I prefer the latter, which is in line with your previous comment "local transient variable rather than a globally stored one". Is this also what you prefer? > Interaction between reservations and userlimit can result in significant ULF > violation > -- > > Key: YARN-3434 > URL: https://issues.apache.org/jira/browse/YARN-3434 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 2.6.0 >Reporter: Thomas Graves >Assignee: Thomas Graves > Attachments: YARN-3434.patch > > > ULF was set to 1.0 > User was able to consume 1.4X queue capacity. > It looks like when this application launched, it reserved about 1000 > containers, each 8G each, within about 5 seconds. I think this allowed the > logic in assignToUser() to allow the userlimit to be surpassed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
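For illustration only, a minimal sketch of the flow discussed in this thread: canAssignToUser() returns a shouldContinue boolean and records an amountNeededUnreserve on a ResourceLimits instance that would be passed down to assignContainer(). Every name, field, and the arithmetic here is a hypothetical simplification; this is not the actual YARN-3434 patch or the real CapacityScheduler classes.

```java
// Hypothetical sketch only; names mirror the discussion, NOT the real YARN classes.
public class ResourceLimitsSketch {
    public static final class ResourceLimits {
        public long limitMB;
        public long amountNeededUnreserveMB; // proposed extra field from the discussion
        public ResourceLimits(long limitMB) { this.limitMB = limitMB; }
    }

    // Returns shouldContinue. When the user is over limit but holds enough
    // reservations, record how much must be unreserved and keep going, so a
    // downstream assignContainer() can unreserve before allocating.
    public static boolean canAssignToUser(long userUsedMB, long userLimitMB,
                                          long userReservedMB, ResourceLimits limits) {
        limits.amountNeededUnreserveMB = 0;
        if (userUsedMB <= userLimitMB) {
            return true;                           // under limit, nothing to unreserve
        }
        long over = userUsedMB - userLimitMB;
        if (userReservedMB >= over) {
            limits.amountNeededUnreserveMB = over; // set here, read in assignContainer()
            return true;
        }
        return false;                              // over limit even after unreserving
    }

    public static void main(String[] args) {
        // User holds 10240 MB against an 8192 MB limit, with 4096 MB reserved:
        // continue, but 2048 MB must be unreserved first.
        ResourceLimits limits = new ResourceLimits(8192);
        boolean ok = canAssignToUser(10240, 8192, 4096, limits);
        System.out.println(ok + " needUnreserve=" + limits.amountNeededUnreserveMB);
    }
}
```

The value lives only on the ResourceLimits instance passed through the call chain, so it stays a per-allocation transient rather than queue-global state, which is the distinction the thread is debating.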
[jira] [Created] (YARN-3493) RM fails to come up with error "Failed to load/recover state" when mem settings are changed
Sumana Sathish created YARN-3493:

Summary: RM fails to come up with error "Failed to load/recover state" when mem settings are changed
Key: YARN-3493
URL: https://issues.apache.org/jira/browse/YARN-3493
Project: Hadoop YARN
Issue Type: Bug
Components: yarn
Affects Versions: 2.7.0
Reporter: Sumana Sathish
Priority: Critical
Fix For: 2.7.0

RM fails to come up for the following case:
1. Change yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb to 4000 in yarn-site.xml
2. Start a randomtextwriter job with mapreduce.map.memory.mb=4000 in the background and wait for the job to reach the running state
3. Restore yarn-site.xml so that yarn.scheduler.maximum-allocation-mb is 2048, before the above job completes
4. Restart the RM
5. The RM fails to come up with the error below

{code:title=RM error for mem settings changed}
- RM app submission failed in validating AM resource request for application application_1429094976272_0008
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested memory < 0, or requested memory > max configured, requestedMemory=3072, maxMemory=2048
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
	at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
	at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
	at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
	at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208)
2015-04-15 13:19:18,623 ERROR resourcemanager.ResourceManager (ResourceManager.java:serviceStart(579)) - Failed to load/recover state
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested memory < 0, or requested memory > max configured, requestedMemory=3072, maxMemory=2048
	(same stack trace as above, truncated in the original)
{code}
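The failure comes from the RM re-validating each recovered application's AM resource request against the current yarn.scheduler.maximum-allocation-mb. A minimal sketch of that check, and of the recovery-time behavior under discussion (all names here are illustrative stand-ins, not actual YARN code):

```java
// Illustrative sketch only: mimics a SchedulerUtils-style validation and shows why
// recovery aborts when the max allocation is lowered below a running app's request.
public class RecoveryValidationSketch {

    // Simplified stand-in for the real validateResourceRequest check.
    static void validate(int requestedMemory, int maxMemory) {
        if (requestedMemory < 0 || requestedMemory > maxMemory) {
            throw new IllegalArgumentException(
                "Invalid resource request, requested memory < 0, or requested memory"
                + " > max configured, requestedMemory=" + requestedMemory
                + ", maxMemory=" + maxMemory);
        }
    }

    // Returns true if the app survives recovery. With ignoreOnRecovery=false the
    // exception propagates and the whole RM fails to start (the reported bug);
    // with ignoreOnRecovery=true only this app is rejected and startup continues.
    static boolean recoverApplication(int requestedMemory, int maxMemory,
                                      boolean ignoreOnRecovery) {
        try {
            validate(requestedMemory, maxMemory);
            return true;
        } catch (IllegalArgumentException e) {
            if (!ignoreOnRecovery) {
                throw e; // aborts RM startup
            }
            return false; // skip just this application
        }
    }

    public static void main(String[] args) {
        // App asked for 3072 MB while max was 4000; max was later lowered to 2048.
        System.out.println(recoverApplication(3072, 2048, true)); // false: app skipped, RM survives
    }
}
```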
[jira] [Updated] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Craig Welch updated YARN-3463:
Attachment: YARN-3463.64.patch

Rebased to current trunk.

> Integrate OrderingPolicy Framework with CapacityScheduler
>
> Key: YARN-3463
> URL: https://issues.apache.org/jira/browse/YARN-3463
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacityscheduler
> Reporter: Craig Welch
> Assignee: Craig Welch
> Attachments: YARN-3463.50.patch, YARN-3463.61.patch, YARN-3463.64.patch
>
> Integrate the OrderingPolicy Framework with the CapacityScheduler

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3493) RM fails to come up with error "Failed to load/recover state" when mem settings are changed
[ https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sumana Sathish updated YARN-3493:
Attachment: yarn-yarn-resourcemanager.log.zip
[jira] [Commented] (YARN-3493) RM fails to come up with error "Failed to load/recover state" when mem settings are changed
[ https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497122#comment-14497122 ]

Karthik Kambatla commented on YARN-3493:
[~jianhe] - YARN-2010 should have fixed this, right?
[jira] [Assigned] (YARN-3493) RM fails to come up with error "Failed to load/recover state" when mem settings are changed
[ https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He reassigned YARN-3493:
Assignee: Jian He
[jira] [Updated] (YARN-3493) RM fails to come up with error "Failed to load/recover state" when mem settings are changed
[ https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated YARN-3493:
Fix Version/s: (was: 2.7.0)
[jira] [Commented] (YARN-3493) RM fails to come up with error "Failed to load/recover state" when mem settings are changed
[ https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497129#comment-14497129 ]

Jian He commented on YARN-3493:
[~kasha], I think this happened on a different code path.
[jira] [Commented] (YARN-2696) Queue sorting in CapacityScheduler should consider node label
[ https://issues.apache.org/jira/browse/YARN-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497139#comment-14497139 ]

Hadoop QA commented on YARN-2696:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12725687/YARN-2696.3.patch
against trunk revision b2e6cf6.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7349//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7349//console

This message is automatically generated.

> Queue sorting in CapacityScheduler should consider node label
>
> Key: YARN-2696
> URL: https://issues.apache.org/jira/browse/YARN-2696
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacityscheduler, resourcemanager
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Attachments: YARN-2696.1.patch, YARN-2696.2.patch, YARN-2696.3.patch
>
> In the past, when trying to allocate containers under a parent queue in CapacityScheduler, the parent queue chose child queues ordered by used resource, from smallest to largest.
> Now that CapacityScheduler supports node labels, we should also consider the per-label used resource in child queues when allocating resources.
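The quoted proposal, ordering sibling queues by how much they have used of the specific label being allocated rather than by total used resource, can be sketched as a toy model (queue and label names are illustrative, not the actual CapacityScheduler API):

```java
import java.util.*;

// Toy sketch of label-aware queue sorting: siblings are ordered by their usage
// under the label currently being allocated, smallest first.
public class LabelAwareQueueSort {

    // usedByQueueAndLabel maps queue name -> (label -> used resource under that label).
    static List<String> order(Map<String, Map<String, Integer>> usedByQueueAndLabel,
                              String label) {
        List<String> queues = new ArrayList<>(usedByQueueAndLabel.keySet());
        queues.sort(Comparator.comparingInt(
                (String q) -> usedByQueueAndLabel.get(q).getOrDefault(label, 0)));
        return queues;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> used = new HashMap<>();
        used.put("a", Map.of("", 10, "gpu", 0)); // busy overall, idle on "gpu" nodes
        used.put("b", Map.of("", 2,  "gpu", 8)); // light overall, heavy on "gpu" nodes
        System.out.println(order(used, ""));    // [b, a]: default partition favors b
        System.out.println(order(used, "gpu")); // [a, b]: "gpu" allocation favors a
    }
}
```

Sorting by the label-specific usage flips the allocation order between the two labels, which is exactly the behavior total-usage sorting cannot express.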
[jira] [Updated] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Craig Welch updated YARN-3463:
Attachment: YARN-3463.65.patch

Suppress orderingPolicy from appearing in web service responses; it is still shown on the web UI.
[jira] [Assigned] (YARN-2498) Respect labels in preemption policy of capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan reassigned YARN-2498:
Assignee: Wangda Tan (was: Mayank Bansal)

> Respect labels in preemption policy of capacity scheduler
>
> Key: YARN-2498
> URL: https://issues.apache.org/jira/browse/YARN-2498
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Attachments: YARN-2498.patch, YARN-2498.patch, YARN-2498.patch, yarn-2498-implementation-notes.pdf
>
> There are 3 stages in ProportionalCapacityPreemptionPolicy:
> # Recursively calculate {{ideal_assigned}} for each queue. This depends on the available resource, the resource used/pending in each queue, and the guaranteed capacity of each queue.
> # Mark to-be-preempted containers: for each over-satisfied queue, mark some containers to be preempted.
> # Notify the scheduler about the to-be-preempted containers.
> We need to respect labels in the cluster for both #1 and #2:
> For #1, when there is resource available in the cluster, we shouldn't assign it to a queue (by increasing {{ideal_assigned}}) if the queue cannot access such labels.
> For #2, when we decide whether a container needs to be preempted, we need to make sure the resource this container holds is *possibly* usable by a queue which is under-satisfied and has pending resource.
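Stage #1 with labels can be sketched as a toy calculation: spare resource under a label is handed only to queues that can access that label. All names and the even-split rule here are illustrative assumptions, not the actual ProportionalCapacityPreemptionPolicy code:

```java
import java.util.*;

// Toy sketch of label-aware ideal_assigned (stage #1): resource available under a
// label is distributed only among queues whose accessible-labels set contains it.
public class LabelAwareIdealAssigned {

    static Map<String, Integer> idealAssigned(int availableUnderLabel, String label,
            Map<String, Set<String>> accessibleLabels,
            Map<String, Integer> guaranteed) {
        List<String> eligible = new ArrayList<>();
        for (String q : guaranteed.keySet()) {
            if (accessibleLabels.getOrDefault(q, Set.of()).contains(label)) {
                eligible.add(q);
            }
        }
        Map<String, Integer> ideal = new HashMap<>();
        for (String q : guaranteed.keySet()) {
            // Naive even split of the labeled headroom among eligible queues only;
            // ineligible queues never see resource they cannot access.
            int extra = (!eligible.isEmpty() && eligible.contains(q))
                    ? availableUnderLabel / eligible.size() : 0;
            ideal.put(q, guaranteed.get(q) + extra);
        }
        return ideal;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> access = new HashMap<>();
        access.put("a", Set.of("gpu")); // queue a may use "gpu" nodes
        access.put("b", Set.of());      // queue b may not
        Map<String, Integer> guaranteed = Map.of("a", 4, "b", 4);
        // 8 units free under label "gpu": all of it raises a's ideal, none of b's.
        System.out.println(idealAssigned(8, "gpu", access, guaranteed));
    }
}
```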
[jira] [Commented] (YARN-2498) Respect labels in preemption policy of capacity scheduler
[ https://issues.apache.org/jira/browse/YARN-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497261#comment-14497261 ]

Wangda Tan commented on YARN-2498:
Discussed with [~mayank_bansal]; taking over and working on this, will post a patch and implementation notes soon.
[jira] [Commented] (YARN-3491) Improve the public resource localization to do both FSDownload submission to the thread pool and completed localization handling in one thread (PublicLocalizer).
[ https://issues.apache.org/jira/browse/YARN-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497304#comment-14497304 ] Sangjin Lee commented on YARN-3491: --- I have the same question as [~jlowe]. The actual call {code} synchronized (pending) { pending.put(queue.submit(new FSDownload(lfs, null, conf, publicDirDestPath, resource, request.getContext().getStatCache())), request); } {code} should be completely non-blocking and there is nothing that's expensive about it with the possible exception of the synchronization. Could you describe the root cause of the slowness you're seeing in some more detail? > Improve the public resource localization to do both FSDownload submission to > the thread pool and completed localization handling in one thread > (PublicLocalizer). > - > > Key: YARN-3491 > URL: https://issues.apache.org/jira/browse/YARN-3491 > Project: Hadoop YARN > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.7.0 >Reporter: zhihai xu >Assignee: zhihai xu >Priority: Critical > > Improve the public resource localization to do both FSDownload submission to > the thread pool and completed localization handling in one thread > (PublicLocalizer). > Currently FSDownload submission to the thread pool is done in > PublicLocalizer#addResource which is running in Dispatcher thread and > completed localization handling is done in PublicLocalizer#run which is > running in PublicLocalizer thread. > Because FSDownload submission to the thread pool at the following code is > time consuming, the thread pool can't be fully utilized. Instead of doing > public resource localization in parallel(multithreading), public resource > localization is serialized most of the time. 
> {code} > synchronized (pending) { > pending.put(queue.submit(new FSDownload(lfs, null, conf, > publicDirDestPath, resource, > request.getContext().getStatCache())), > request); > } > {code} > Also, there are two more benefits with this change: > 1. The Dispatcher thread won't be blocked by the above FSDownload submission. > The Dispatcher thread handles most time-critical events at the Node Manager. > 2. No synchronization is needed on the HashMap (pending), > because pending will only be accessed in the PublicLocalizer thread. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
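The single-thread submission-and-completion pattern the issue proposes can be sketched with a `java.util.concurrent.ExecutorCompletionService`. The class below is an illustrative stand-in, not the real PublicLocalizer: the names addResource and pending follow the issue text, while the string-based "request" objects and the queue handoff are assumptions:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

// Hedged sketch: the Dispatcher thread only enqueues lightweight request
// objects (non-blocking); the localizer thread does BOTH the thread-pool
// submission and the completed-download handling, so "pending" is touched by
// one thread only and needs no synchronization.
public class SingleThreadLocalizerSketch {
  private final BlockingQueue<String> requests = new LinkedBlockingQueue<>();
  private final ExecutorService pool = Executors.newFixedThreadPool(4);
  private final CompletionService<String> completion =
      new ExecutorCompletionService<>(pool);
  // Accessed only from the localizer thread: plain HashMap is sufficient.
  private final Map<Future<String>, String> pending = new HashMap<>();

  /** Called by the Dispatcher thread; never blocks on submission. */
  public void addResource(String request) {
    requests.add(request);
  }

  /** Localizer-thread body: submit all queued requests, then reap completions. */
  public List<String> drainAll() throws Exception {
    String req;
    while ((req = requests.poll()) != null) {
      final String r = req;
      // Stand-in for queue.submit(new FSDownload(...)).
      pending.put(completion.submit(() -> "localized:" + r), r);
    }
    List<String> results = new ArrayList<>();
    while (!pending.isEmpty()) {
      Future<String> done = completion.take(); // completed-download handling
      pending.remove(done);
      results.add(done.get());
    }
    pool.shutdown();
    return results;
  }
}
```

The design point is that ExecutorCompletionService already provides the "which download finished?" queue, so no shared map between threads is required at all.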
[jira] [Updated] (YARN-3493) RM fails to come up with error "Failed to load/recover state" when mem settings are changed
[ https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3493: -- Attachment: YARN-3493.1.patch Upload a patch to ignore this exception on recovery > RM fails to come up with error "Failed to load/recover state" when mem > settings are changed > > > Key: YARN-3493 > URL: https://issues.apache.org/jira/browse/YARN-3493 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.0 >Reporter: Sumana Sathish >Assignee: Jian He >Priority: Critical > Attachments: YARN-3493.1.patch, yarn-yarn-resourcemanager.log.zip > > > RM fails to come up for the following case: > 1. Change yarn.nodemanager.resource.memory-mb and > yarn.scheduler.maximum-allocation-mb to 4000 in yarn-site.xml > 2. Start a randomtextwriter job with mapreduce.map.memory.mb=4000 in > background and wait for the job to reach running state > 3. Restore yarn-site.xml to have yarn.scheduler.maximum-allocation-mb to 2048 > before the above job completes > 4. Restart RM > 5. 
RM fails to come up with the below error > {code:title= RM error for Mem settings changed} > - RM app submission failed in validating AM resource request for application > application_1429094976272_0008 > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request, requested memory < 0, or requested memory > max configured, > requestedMemory=3072, maxMemory=2048 > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031) > at > 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208) > 2015-04-15 13:19:18,623 ERROR resourcemanager.ResourceManager > (ResourceManager.java:serviceStart(579)) - Failed to load/recover state > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request, requested memory < 0, or requested memory > max configured, > requestedMemory=3072, maxMemory=2048 > at > org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317) > at > org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574) > at > org.apache.hadoop.service.AbstractService.start(AbstractService.java:193) > at > org.apache.hadoop.yar
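Jian He's comment above proposes ignoring this exception on recovery. A minimal sketch of that direction follows, with ValidationException standing in for InvalidResourceRequestException; the method names and all numbers are illustrative, not the actual patch:

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch: during state-store recovery, a stored AM resource request
// that exceeds the NEW yarn.scheduler.maximum-allocation-mb should not abort
// RM startup, unlike a fresh submission where validation must fail fast.
public class RecoverySketch {
  static class ValidationException extends RuntimeException {
    ValidationException(String m) { super(m); }
  }

  /** Same check as the log: requested memory < 0 or > max configured. */
  static void validate(int requestedMemory, int maxMemory) {
    if (requestedMemory < 0 || requestedMemory > maxMemory) {
      throw new ValidationException("requestedMemory=" + requestedMemory
          + ", maxMemory=" + maxMemory);
    }
  }

  /** Recover apps; tolerate (rather than fail on) now-invalid AM requests. */
  static List<Integer> recover(int[] storedAmMemories, int maxMemory) {
    List<Integer> recovered = new ArrayList<>();
    for (int mem : storedAmMemories) {
      try {
        validate(mem, maxMemory);
        recovered.add(mem);
      } catch (ValidationException e) {
        // On the recovery path, swallow the exception so the RM can come up;
        // the offending app can be handled (e.g. failed) afterwards.
      }
    }
    return recovered;
  }
}
```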
[jira] [Commented] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497311#comment-14497311 ] Hadoop QA commented on YARN-3463: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12725702/YARN-3463.64.patch against trunk revision 1b89a3e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1149 javac compiler warnings (more than the trunk's current 1147 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7350//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7350//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7350//console This message is automatically generated. 
> Integrate OrderingPolicy Framework with CapacityScheduler > - > > Key: YARN-3463 > URL: https://issues.apache.org/jira/browse/YARN-3463 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Craig Welch >Assignee: Craig Welch > Attachments: YARN-3463.50.patch, YARN-3463.61.patch, > YARN-3463.64.patch, YARN-3463.65.patch > > > Integrate the OrderingPolicy Framework with the CapacityScheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers
[ https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497315#comment-14497315 ] Vrushali C commented on YARN-3051: -- Hi [~varun_saxena] As per the discussion in the call today, here is the query document about flow (and user and queue) based queries that I had mentioned (put up on jira YARN-3050): https://issues.apache.org/jira/secure/attachment/12695071/Flow%20based%20queries.docx Also, some points that I think may be helpful: - the reader API is not going to be limited to one or two API calls - different queries will need different core read APIs. For instance, flow-based queries may not need the application id or entity id info, but rather would need the flow id. For example: for a given user, return the flows that were run during this time frame. This query requires only the cluster and user info; neither entity, application, nor flow name is needed for the reader API to serve this query. This query cannot be boiled down to an entity-level query. - So the reader API should allow for entity-level, application-level, flow-level, user-level, queue-level, and cluster-level queries. > [Storage abstraction] Create backing storage read interface for ATS readers > --- > > Key: YARN-3051 > URL: https://issues.apache.org/jira/browse/YARN-3051 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: YARN-3051.wip.patch, YARN-3051_temp.patch > > > Per design in YARN-2928, create backing storage read interface that can be > implemented by multiple backing storage implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
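The point that different query scopes need different context can be sketched as a mapping from scope to required fields. The scope and field names below are illustrative assumptions for discussion, not the ATS v2 API:

```java
import java.util.List;

// Hedged sketch: a flow-level query needs cluster/user/flow context but no
// app or entity id, while an entity-level query needs the full chain, which
// is why a single entity-level read call cannot serve every query.
public class ReaderScopeSketch {
  enum Scope { CLUSTER, QUEUE, USER, FLOW, APPLICATION, ENTITY }

  /** Which context fields a query at the given scope must supply. */
  static List<String> requiredContext(Scope s) {
    switch (s) {
      case CLUSTER:     return List.of("clusterId");
      case QUEUE:       return List.of("clusterId", "queue");
      case USER:        return List.of("clusterId", "user");
      case FLOW:        return List.of("clusterId", "user", "flowId");
      case APPLICATION: return List.of("clusterId", "appId");
      default:          return List.of("clusterId", "appId",
                                       "entityType", "entityId");
    }
  }
}
```

The "flows run by a user in a time frame" example in the comment corresponds to the USER scope here: only cluster and user context, no app or entity ids.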
[jira] [Commented] (YARN-3493) RM fails to come up with error "Failed to load/recover state" when mem settings are changed
[ https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497317#comment-14497317 ] Jian He commented on YARN-3493: --- cancel the patch, uploading a newer version. > RM fails to come up with error "Failed to load/recover state" when mem > settings are changed > > > Key: YARN-3493 > URL: https://issues.apache.org/jira/browse/YARN-3493 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.0 >Reporter: Sumana Sathish >Assignee: Jian He >Priority: Critical > Attachments: YARN-3493.1.patch, yarn-yarn-resourcemanager.log.zip > > > RM fails to come up for the following case: > 1. Change yarn.nodemanager.resource.memory-mb and > yarn.scheduler.maximum-allocation-mb to 4000 in yarn-site.xml > 2. Start a randomtextwriter job with mapreduce.map.memory.mb=4000 in > background and wait for the job to reach running state > 3. Restore yarn-site.xml to have yarn.scheduler.maximum-allocation-mb to 2048 > before the above job completes > 4. Restart RM > 5. 
RM fails to come up with the below error > {code:title= RM error for Mem settings changed} > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request, requested memory < 0, or requested memory > max configured, > requestedMemory=3072, maxMemory=2048 > {code} > (The full stack trace is quoted verbatim in the first YARN-3493 message above.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers
[ https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497320#comment-14497320 ] Sangjin Lee commented on YARN-3051: --- We chatted offline about the issue of what context is required for the reader API and the uniqueness requirement. I'm not sure if there is a complete agreement on this yet, but at least this is a proposal from us ([~vrushalic], [~jrottinghuis], and me). - for reader calls that ask for sub-application entities, the application id must be specified - uniqueness is similarly defined; (entity type, entity id) uniquely identifies an entity within the scope of a YARN application We feel that this is the most natural way of supporting writes/reads. One scenario to consider is reducing impact on current users of ATS, as v.2 would require app id which v.1 did not require. For that, we would need to update the user library to have a compatibility layer (e.g. tez, etc.). Thoughts? > [Storage abstraction] Create backing storage read interface for ATS readers > --- > > Key: YARN-3051 > URL: https://issues.apache.org/jira/browse/YARN-3051 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Sangjin Lee >Assignee: Varun Saxena > Attachments: YARN-3051.wip.patch, YARN-3051_temp.patch > > > Per design in YARN-2928, create backing storage read interface that can be > implemented by multiple backing storage implementations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
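The proposed uniqueness rule, that (entity type, entity id) identifies an entity only within the scope of a YARN application, implies that a fully-qualified key must carry the application id. The sketch below illustrates this; the class and field names are assumptions:

```java
import java.util.Objects;

// Hedged sketch of the proposed uniqueness rule: two entities with the same
// (entityType, entityId) under DIFFERENT applications are distinct, so appId
// is part of the key's identity.
public class EntityKeySketch {
  final String appId;
  final String entityType;
  final String entityId;

  EntityKeySketch(String appId, String entityType, String entityId) {
    this.appId = appId;
    this.entityType = entityType;
    this.entityId = entityId;
  }

  @Override public boolean equals(Object o) {
    if (!(o instanceof EntityKeySketch)) return false;
    EntityKeySketch k = (EntityKeySketch) o;
    return appId.equals(k.appId)
        && entityType.equals(k.entityType)
        && entityId.equals(k.entityId);
  }

  @Override public int hashCode() {
    return Objects.hash(appId, entityType, entityId);
  }
}
```

This is also where the compatibility concern bites: v.1 callers that never supplied an app id cannot build such a key without a compatibility layer filling it in.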
[jira] [Commented] (YARN-3390) Reuse TimelineCollectorManager for RM
[ https://issues.apache.org/jira/browse/YARN-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497341#comment-14497341 ] Sangjin Lee commented on YARN-3390: --- I took a pass at the patch, and it looks good for the most part. I would ask you to reconcile the TimelineCollectorManager changes with what I have over on YARN-3437. Again, I have a slight preference for the hook/template methods for the aforementioned reason, but it's not a strong preference one way or another. However, I'm not sure why there is a change for RMContainerAllocator.java. It doesn't look like an intended change? > Reuse TimelineCollectorManager for RM > - > > Key: YARN-3390 > URL: https://issues.apache.org/jira/browse/YARN-3390 > Project: Hadoop YARN > Issue Type: Sub-task > Components: timelineserver >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Attachments: YARN-3390.1.patch > > > RMTimelineCollector should have the context info of each app whose entity > has been put -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497354#comment-14497354 ] Hadoop QA commented on YARN-3463: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12725714/YARN-3463.65.patch against trunk revision 1b89a3e. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1149 javac compiler warnings (more than the trunk's current 1147 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart Test results: https://builds.apache.org/job/PreCommit-YARN-Build/7351//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-YARN-Build/7351//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7351//console This message is automatically generated. 
> Integrate OrderingPolicy Framework with CapacityScheduler > - > > Key: YARN-3463 > URL: https://issues.apache.org/jira/browse/YARN-3463 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Craig Welch >Assignee: Craig Welch > Attachments: YARN-3463.50.patch, YARN-3463.61.patch, > YARN-3463.64.patch, YARN-3463.65.patch > > > Integrate the OrderingPolicy Framework with the CapacityScheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-3493) RM fails to come up with error "Failed to load/recover state" when mem settings are changed
[ https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-3493: -- Attachment: YARN-3493.2.patch uploaded a new patch > RM fails to come up with error "Failed to load/recover state" when mem > settings are changed > > > Key: YARN-3493 > URL: https://issues.apache.org/jira/browse/YARN-3493 > Project: Hadoop YARN > Issue Type: Bug > Components: yarn >Affects Versions: 2.7.0 >Reporter: Sumana Sathish >Assignee: Jian He >Priority: Critical > Attachments: YARN-3493.1.patch, YARN-3493.2.patch, > yarn-yarn-resourcemanager.log.zip > > > RM fails to come up for the following case: > 1. Change yarn.nodemanager.resource.memory-mb and > yarn.scheduler.maximum-allocation-mb to 4000 in yarn-site.xml > 2. Start a randomtextwriter job with mapreduce.map.memory.mb=4000 in > background and wait for the job to reach running state > 3. Restore yarn-site.xml to have yarn.scheduler.maximum-allocation-mb to 2048 > before the above job completes > 4. Restart RM > 5. 
RM fails to come up with the below error > {code:title= RM error for Mem settings changed} > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request, requested memory < 0, or requested memory > max configured, > requestedMemory=3072, maxMemory=2048 > {code} > (The full stack trace is quoted verbatim in the first YARN-3493 message above.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3494) Expose AM resource limit and user limit in QueueMetrics
Jian He created YARN-3494: - Summary: Expose AM resource limit and user limit in QueueMetrics Key: YARN-3494 URL: https://issues.apache.org/jira/browse/YARN-3494 Project: Hadoop YARN Issue Type: Bug Reporter: Jian He Now that we have the AM resource limit and user limit shown on the web UI, it would be useful to expose them in QueueMetrics as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
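A minimal sketch of what exposing these two values as queue metrics might look like. Plain long fields are used here for illustration; a real patch would presumably use Hadoop's metrics2 gauges inside QueueMetrics, and all names below are assumptions:

```java
// Hedged sketch: the scheduler updates these gauges whenever it recomputes
// the per-queue AM resource limit and the per-user AM resource limit, so
// monitoring systems can read them alongside the existing queue metrics.
public class QueueMetricsSketch {
  private long amResourceLimitMB;
  private long userAmResourceLimitMB;

  public void setAMResourceLimitMB(long mb)     { amResourceLimitMB = mb; }
  public void setUserAMResourceLimitMB(long mb) { userAmResourceLimitMB = mb; }

  public long getAMResourceLimitMB()     { return amResourceLimitMB; }
  public long getUserAMResourceLimitMB() { return userAmResourceLimitMB; }
}
```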
[jira] [Updated] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler
[ https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Craig Welch updated YARN-3463: -- Attachment: YARN-3463.66.patch Fixed build warnings; the tests all pass on my box. > Integrate OrderingPolicy Framework with CapacityScheduler > - > > Key: YARN-3463 > URL: https://issues.apache.org/jira/browse/YARN-3463 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Craig Welch >Assignee: Craig Welch > Attachments: YARN-3463.50.patch, YARN-3463.61.patch, > YARN-3463.64.patch, YARN-3463.65.patch, YARN-3463.66.patch > > > Integrate the OrderingPolicy Framework with the CapacityScheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-3494) Expose AM resource limit and user limit in QueueMetrics
[ https://issues.apache.org/jira/browse/YARN-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith reassigned YARN-3494: Assignee: Rohith > Expose AM resource limit and user limit in QueueMetrics > > > Key: YARN-3494 > URL: https://issues.apache.org/jira/browse/YARN-3494 > Project: Hadoop YARN > Issue Type: Bug >Reporter: Jian He >Assignee: Rohith > > Now we have the AM resource limit and user limit shown on the web UI, it > would be useful to expose them in the QueueMetrics as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)