[jira] [Created] (YARN-2767) RM web services - add test case to ensure the http static user can kill or submit apps in secure mode
Varun Vasudev created YARN-2767: --- Summary: RM web services - add test case to ensure the http static user can kill or submit apps in secure mode Key: YARN-2767 URL: https://issues.apache.org/jira/browse/YARN-2767 Project: Hadoop YARN Issue Type: Test Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev We should add a test to ensure that the http static user used to access the RM web interface can't submit or kill apps if the cluster is running in secure mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2767) RM web services - add test case to ensure the http static user can kill or submit apps in secure mode
[ https://issues.apache.org/jira/browse/YARN-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2767: Attachment: apache-yarn-2767.0.patch Uploaded patch with new test case. RM web services - add test case to ensure the http static user can kill or submit apps in secure mode - Key: YARN-2767 URL: https://issues.apache.org/jira/browse/YARN-2767 Project: Hadoop YARN Issue Type: Test Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2767.0.patch We should add a test to ensure that the http static user used to access the RM web interface can't submit or kill apps if the cluster is running in secure mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2761) potential race condition in SchedulingPolicy
[ https://issues.apache.org/jira/browse/YARN-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188126#comment-14188126 ] Hadoop QA commented on YARN-2761: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677844/YARN-2761.patch against trunk revision ec63a3f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5622//console This message is automatically generated. potential race condition in SchedulingPolicy Key: YARN-2761 URL: https://issues.apache.org/jira/browse/YARN-2761 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-2761.patch Reported by findbugs. In SchedulingPolicy.getInstance, ConcurrentHashMap.get and ConcurrentHashMap.put are called. These two operations together should be atomic, but calling them separately on a ConcurrentHashMap doesn't guarantee this. {code}
public static SchedulingPolicy getInstance(Class<? extends SchedulingPolicy> clazz) {
  SchedulingPolicy policy = instances.get(clazz);
  if (policy == null) {
    policy = ReflectionUtils.newInstance(clazz, null);
    instances.put(clazz, policy);
  }
  return policy;
}
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
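The check-then-insert gap above can be closed without an explicit lock by letting the map itself perform the atomic step. A minimal stand-alone sketch of that idea (not the actual YARN-2761 patch; PolicyCache and the nested stand-in types here are hypothetical, and computeIfAbsent requires Java 8):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PolicyCache {
    // Stand-ins for YARN's SchedulingPolicy hierarchy, for illustration only.
    public interface SchedulingPolicy {}
    public static class FairSharePolicy implements SchedulingPolicy {}

    private static final ConcurrentMap<Class<? extends SchedulingPolicy>, SchedulingPolicy>
        instances = new ConcurrentHashMap<>();

    // computeIfAbsent makes the check-and-insert atomic, so two threads racing
    // on the same class always observe the same cached instance.
    public static SchedulingPolicy getInstance(Class<? extends SchedulingPolicy> clazz) {
        return instances.computeIfAbsent(clazz, c -> {
            try {
                return c.getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("cannot instantiate " + c, e);
            }
        });
    }

    public static void main(String[] args) {
        SchedulingPolicy a = getInstance(FairSharePolicy.class);
        SchedulingPolicy b = getInstance(FairSharePolicy.class);
        if (a != b) throw new AssertionError("expected the same cached instance");
        System.out.println("singleton preserved under concurrent access semantics");
    }
}
```

With the original get/put pair, two racing threads could each see null and create separate instances; whether that is harmful depends on whether the policies hold state, which is presumably why the issue is filed as Minor.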
[jira] [Commented] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI
[ https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188125#comment-14188125 ] Hadoop QA commented on YARN-2698: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677821/YARN-2698-20141028-3.patch against trunk revision 3c5f5af. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages. See https://builds.apache.org/job/PreCommit-YARN-Build/5619//artifact/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.mapred.TestMRTimelineEventHandling org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5619//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5619//console This message is automatically generated. 
Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI --- Key: YARN-2698 URL: https://issues.apache.org/jira/browse/YARN-2698 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Priority: Critical Attachments: YARN-2698-20141028-1.patch, YARN-2698-20141028-2.patch, YARN-2698-20141028-3.patch YARN RMAdminCLI and AdminService should have write API only, for other read APIs, they should be located at YARNCLI and RMClientService. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2767) RM web services - add test case to ensure the http static user can kill or submit apps in secure mode
[ https://issues.apache.org/jira/browse/YARN-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188127#comment-14188127 ] Hadoop QA commented on YARN-2767: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677839/apache-yarn-2767.0.patch against trunk revision ec63a3f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5621//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5621//console This message is automatically generated. RM web services - add test case to ensure the http static user can kill or submit apps in secure mode - Key: YARN-2767 URL: https://issues.apache.org/jira/browse/YARN-2767 Project: Hadoop YARN Issue Type: Test Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2767.0.patch We should add a test to ensure that the http static user used to access the RM web interface can't submit or kill apps if the cluster is running in secure mode. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2767) RM web services - add test case to ensure the http static user can kill or submit apps in secure mode
[ https://issues.apache.org/jira/browse/YARN-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2767: Attachment: apache-yarn-2767.1.patch Uploaded a new patch with some variable names fixed. RM web services - add test case to ensure the http static user can kill or submit apps in secure mode - Key: YARN-2767 URL: https://issues.apache.org/jira/browse/YARN-2767 Project: Hadoop YARN Issue Type: Test Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2767.0.patch, apache-yarn-2767.1.patch We should add a test to ensure that the http static user used to access the RM web interface can't submit or kill apps if the cluster is running in secure mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread
Hong Zhiguo created YARN-2768: - Summary: optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread Key: YARN-2768 URL: https://issues.apache.org/jira/browse/YARN-2768 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor See the attached picture of the profiling result. The clone of the Resource object within Resources.multiply() takes up 85% (19.2 / 22.6) of the CPU time of FairScheduler.update(). The code of FSAppAttempt.updateDemand: {code}
public void updateDemand() {
  demand = Resources.createResource(0);
  // Demand is current consumption plus outstanding requests
  Resources.addTo(demand, app.getCurrentConsumption());
  // Add up outstanding resource requests
  synchronized (app) {
    for (Priority p : app.getPriorities()) {
      for (ResourceRequest r : app.getResourceRequests(p).values()) {
        Resource total = Resources.multiply(r.getCapability(), r.getNumContainers());
        Resources.addTo(demand, total);
      }
    }
  }
}
{code} The code of Resources.multiply: {code}
public static Resource multiply(Resource lhs, double by) {
  return multiplyTo(clone(lhs), by);
}
{code} The clone could be skipped by directly updating the value of this.demand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread
[ https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2768: -- Attachment: profiling_FairScheduler_update.png optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread Key: YARN-2768 URL: https://issues.apache.org/jira/browse/YARN-2768 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: profiling_FairScheduler_update.png -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread
[ https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2768: -- Description: See the attached picture of the profiling result. The clone of the Resource object within Resources.multiply() takes up 85% (19.2 / 22.6) of the CPU time of FairScheduler.update(). The code of FSAppAttempt.updateDemand: {code}
public void updateDemand() {
  demand = Resources.createResource(0);
  // Demand is current consumption plus outstanding requests
  Resources.addTo(demand, app.getCurrentConsumption());
  // Add up outstanding resource requests
  synchronized (app) {
    for (Priority p : app.getPriorities()) {
      for (ResourceRequest r : app.getResourceRequests(p).values()) {
        Resource total = Resources.multiply(r.getCapability(), r.getNumContainers());
        Resources.addTo(demand, total);
      }
    }
  }
}
{code} The code of Resources.multiply: {code}
public static Resource multiply(Resource lhs, double by) {
  return multiplyTo(clone(lhs), by);
}
{code} The clone could be skipped by directly updating the value of this.demand. (The previous description differed only in formatting.) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread Key: YARN-2768 URL: https://issues.apache.org/jira/browse/YARN-2768 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: profiling_FairScheduler_update.png -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread
[ https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188146#comment-14188146 ] Hadoop QA commented on YARN-2768: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677853/profiling_FairScheduler_update.png against trunk revision ec63a3f. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5624//console This message is automatically generated. optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread Key: YARN-2768 URL: https://issues.apache.org/jira/browse/YARN-2768 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: profiling_FairScheduler_update.png -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread
[ https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2768: -- Attachment: YARN-2768.patch Avoid the clone by adding a three-argument helper, Resources.multiplyAndAddTo. After this optimization, the average time cost of FairScheduler.update (in a test case with 10k apps) is reduced by 40%. I'm not sure whether it's better to have such test cases also submitted. optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread Key: YARN-2768 URL: https://issues.apache.org/jira/browse/YARN-2768 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-2768.patch, profiling_FairScheduler_update.png -- This message was sent by Atlassian JIRA (v6.3.4#6332)
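The multiplyAndAddTo helper mentioned in this comment is not shown in the thread. As a rough sketch of the idea only, with a hypothetical MiniResource standing in for YARN's Resource (the real patch operates on Resource and lives in Resources), multiply-then-add collapses into one in-place accumulation, so no temporary object is created per ResourceRequest:

```java
public class DemandMath {
    // Toy stand-in for org.apache.hadoop.yarn.api.records.Resource.
    public static class MiniResource {
        public long memory;
        public int vcores;
        public MiniResource(long memory, int vcores) {
            this.memory = memory;
            this.vcores = vcores;
        }
    }

    // Old path: multiply allocates a fresh result object, i.e. one piece of
    // garbage per outstanding request in updateDemand's inner loop.
    public static MiniResource multiply(MiniResource lhs, double by) {
        return new MiniResource((long) (lhs.memory * by), (int) (lhs.vcores * by));
    }

    // New path: accumulate lhs += rhs * by directly, no clone needed.
    public static void multiplyAndAddTo(MiniResource lhs, MiniResource rhs, double by) {
        lhs.memory += (long) (rhs.memory * by);
        lhs.vcores += (int) (rhs.vcores * by);
    }
}
```

In updateDemand the caller would then write multiplyAndAddTo(demand, r.getCapability(), r.getNumContainers()) instead of building a total via multiply and adding it, which matches the "directly update the value of this.demand" suggestion in the description.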
[jira] [Commented] (YARN-2767) RM web services - add test case to ensure the http static user can kill or submit apps in secure mode
[ https://issues.apache.org/jira/browse/YARN-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188182#comment-14188182 ] Hadoop QA commented on YARN-2767: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677848/apache-yarn-2767.1.patch against trunk revision ec63a3f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5623//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5623//console This message is automatically generated. 
RM web services - add test case to ensure the http static user can kill or submit apps in secure mode - Key: YARN-2767 URL: https://issues.apache.org/jira/browse/YARN-2767 Project: Hadoop YARN Issue Type: Test Components: resourcemanager Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2767.0.patch, apache-yarn-2767.1.patch We should add a test to ensure that the http static user used to access the RM web interface can't submit or kill apps if the cluster is running in secure mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1902) Allocation of too many containers when a second request is done with the same resource capability
[ https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188188#comment-14188188 ] Yogesh Sobale commented on YARN-1902: - Can someone please update? Allocation of too many containers when a second request is done with the same resource capability - Key: YARN-1902 URL: https://issues.apache.org/jira/browse/YARN-1902 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.2.0, 2.3.0, 2.4.0 Reporter: Sietse T. Au Labels: client Attachments: YARN-1902.patch, YARN-1902.v2.patch, YARN-1902.v3.patch Regarding AMRMClientImpl Scenario 1: Given a ContainerRequest x with Resource y, when addContainerRequest is called z times with x, allocate is called and at least one of the z allocated containers is started, then if another addContainerRequest call is done and subsequently an allocate call to the RM, (z+1) containers will be allocated, where 1 container is expected. Scenario 2: No containers are started between the allocate calls. Analyzing debug logs of the AMRMClientImpl, I have found that indeed (z+1) containers are requested in both scenarios, but that only in the second scenario is the correct behavior observed. Looking at the implementation I have found that this (z+1) request is caused by the structure of the remoteRequestsTable. The consequence of Map<Resource, ResourceRequestInfo> is that ResourceRequestInfo does not hold any information about whether a request has been sent to the RM yet or not. There are workarounds for this, such as releasing the excess containers received. The solution implemented is to initialize a new ResourceRequest in ResourceRequestInfo when a request has been successfully sent to the RM. The patch includes a test in which scenario one is tested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
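The fix described in the report, initializing a fresh ResourceRequest once a request has been successfully sent, can be illustrated with a toy bookkeeping model. This is not AMRMClientImpl's actual data structure (the real remoteRequestsTable is keyed on priority, location, and capability); it only shows the delta-versus-running-total distinction behind the (z+1) symptom:

```java
import java.util.HashMap;
import java.util.Map;

public class RequestTable {
    // capability -> containers requested but not yet sent to the RM
    private final Map<String, Integer> pending = new HashMap<>();

    // addContainerRequest: record one more outstanding container.
    public void add(String capability) {
        pending.merge(capability, 1, Integer::sum);
    }

    // allocate: send only the unsent delta, then reset it -- mirroring the
    // "new ResourceRequest after a successful send" fix. Without the reset,
    // a later add() would re-send the whole running total (z+1 containers)
    // instead of just the one new container.
    public int allocate(String capability) {
        int toSend = pending.getOrDefault(capability, 0);
        pending.put(capability, 0);
        return toSend;
    }

    public static void main(String[] args) {
        RequestTable t = new RequestTable();
        t.add("1024mb,1core");
        t.add("1024mb,1core");
        t.add("1024mb,1core");
        System.out.println(t.allocate("1024mb,1core")); // first allocate sends 3
        t.add("1024mb,1core");
        System.out.println(t.allocate("1024mb,1core")); // delta only: 1, not 4
    }
}
```

The "release the excess containers" workaround mentioned in the report instead accepts the over-allocation and hands the surplus back, which wastes a scheduling round trip.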
[jira] [Commented] (YARN-1902) Allocation of too many containers when a second request is done with the same resource capability
[ https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188190#comment-14188190 ] Yogesh Sobale commented on YARN-1902: - Can someone please update? Allocation of too many containers when a second request is done with the same resource capability - Key: YARN-1902 URL: https://issues.apache.org/jira/browse/YARN-1902 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.2.0, 2.3.0, 2.4.0 Reporter: Sietse T. Au Labels: client Attachments: YARN-1902.patch, YARN-1902.v2.patch, YARN-1902.v3.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread
[ https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188222#comment-14188222 ] Hadoop QA commented on YARN-2768: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677855/YARN-2768.patch against trunk revision ec63a3f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5625//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5625//console This message is automatically generated. 
optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread Key: YARN-2768 URL: https://issues.apache.org/jira/browse/YARN-2768 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-2768.patch, profiling_FairScheduler_update.png -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2758) Update TestApplicationHistoryClientService to use the new generic history store
[ https://issues.apache.org/jira/browse/YARN-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188266#comment-14188266 ] Hudson commented on YARN-2758: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #727 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/727/]) YARN-2758. Update TestApplicationHistoryClientService to use the new generic history store. Contributed by Zhijie Shen (xgong: rev 69f79bee8b3da07bf42e22e35e58c7719782e31f) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryClientService.java Update TestApplicationHistoryClientService to use the new generic history store --- Key: YARN-2758 URL: https://issues.apache.org/jira/browse/YARN-2758 Project: Hadoop YARN Issue Type: Test Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.6.0 Attachments: YARN-2758.1.patch TestApplicationHistoryClientService is still testing against the mock data in the old MemoryApplicationHistoryStore. hence it needs to be updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2747) TestAggregatedLogFormat fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188270#comment-14188270 ] Hudson commented on YARN-2747: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #727 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/727/]) YARN-2747. Fixed the test failure of TestAggregatedLogFormat when native I/O is enabled. Contributed by Xuan Gong. (zjshen: rev ec63a3ffbd9413e7434594682fdbbd36eef7413c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogFormat.java * hadoop-yarn-project/CHANGES.txt TestAggregatedLogFormat fails in trunk -- Key: YARN-2747 URL: https://issues.apache.org/jira/browse/YARN-2747 Project: Hadoop YARN Issue Type: Test Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.6.0 Attachments: YARN-2747.1.patch Running org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.105 sec FAILURE! - in org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat testContainerLogsFileAccess(org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat) Time elapsed: 0.047 sec FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat.testContainerLogsFileAccess(TestAggregatedLogFormat.java:346) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2741) Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanag
[ https://issues.apache.org/jira/browse/YARN-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188275#comment-14188275 ] Hudson commented on YARN-2741: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #727 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/727/]) YARN-2741. Made NM web UI serve logs on the drive other than C: on Windows. Contributed by Craig Welch. (zjshen: rev 8984e9b1774033e379b57da1bd30a5c81888c7a3) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/ContainerLogsUtils.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanager is running on) -- Key: YARN-2741 URL: https://issues.apache.org/jira/browse/YARN-2741 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Environment: Windows Reporter: Craig Welch Assignee: Craig Welch Fix For: 2.6.0 Attachments: YARN-2741.1.patch, YARN-2741.6.patch PROBLEM: User is getting No Logs available for Container Container_number when setting the yarn.nodemanager.log-dirs to any drive letter other than C: STEPS TO REPRODUCE: On Windows 1) Run NodeManager on C: 2) Create two local drive partitions D: and E: 3) Put yarn.nodemanager.log-dirs = D:\nmlogs or E:\nmlogs 4) Run a MR job that will last at least 5 minutes 5) While the job is in flight, log into the Yarn web ui , resource_manager_server:8088/cluster 6) Click on the application_idnumber 7) Click on the logs link, you will get No Logs available for Container Container_number ACTUAL BEHAVIOR: Getting an error message when viewing the container logs EXPECTED BEHAVIOR: Able to 
use different drive letters in yarn.nodemanager.log-dirs and not get error NOTE: If we use the drive letter C: in yarn.nodemanager.log-dirs, we are able to see the container logs while the MR job is in flight. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
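[Editor's note] The steps to reproduce above come down to the NM web app deciding whether a requested log file lies under one of the configured `yarn.nodemanager.log-dirs`. This is not the YARN-2741 patch to ContainerLogsUtils; it is a hypothetical string-level sketch of why a naive prefix check must normalize separators and drive-letter case before comparing.

```java
// Hypothetical illustration: match a requested log path against the
// configured log dirs in a separator- and case-insensitive way, so
// "D:\nmlogs" and "d:/nmlogs" are treated as the same directory.
public class LogDirCheckSketch {
    static String normalize(String p) {
        return p.replace('\\', '/').toLowerCase();
    }

    static boolean isUnderLogDir(String file, String[] logDirs) {
        String f = normalize(file);
        for (String dir : logDirs) {
            String d = normalize(dir);
            if (!d.endsWith("/")) d += "/";
            if (f.startsWith(d)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] dirs = { "D:\\nmlogs" };  // as in the repro steps
        System.out.println(
            isUnderLogDir("d:/nmlogs/app_1/container_1/stdout", dirs));
        System.out.println(
            isUnderLogDir("C:/other/stdout", dirs));
    }
}
```

A check that instead hard-codes the NodeManager's own drive (C: in the repro) rejects every log file on D: or E:, producing the "No Logs available for Container" message described above.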
[jira] [Commented] (YARN-2760) Completely remove word 'experimental' from FairScheduler docs
[ https://issues.apache.org/jira/browse/YARN-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188277#comment-14188277 ] Hudson commented on YARN-2760: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #727 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/727/]) YARN-2760. Remove 'experimental' from FairScheduler docs. (Harsh J via kasha) (kasha: rev ade3727ecb092935dcc0f1291c1e6cf43d764a03) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm * hadoop-yarn-project/CHANGES.txt Completely remove word 'experimental' from FairScheduler docs - Key: YARN-2760 URL: https://issues.apache.org/jira/browse/YARN-2760 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.1.0-beta Reporter: Harsh J Assignee: Harsh J Priority: Trivial Fix For: 2.6.0 Attachments: YARN-2760.patch, YARN-2760.patch After YARN-1034, FairScheduler has not been 'experimental' in any aspect of use, but the doc change done in that did not entirely cover removal of that word, leaving a remnant in the preemption sub-point. This needs to be removed as well, as the feature has been good to use for a long time now, and is not experimental. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2503) Changes in RM Web UI to better show labels to end users
[ https://issues.apache.org/jira/browse/YARN-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188265#comment-14188265 ] Hudson commented on YARN-2503: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #727 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/727/]) YARN-2503. Added node lablels in web UI. Contributed by Wangda Tan (jianhe: rev d5e0a09721a5156fa2ee51ac1c32fbfd9905b8fb) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/NodeInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java Missing CHANGES.txt for YARN-2503. 
(jianhe: rev 0782f602881272392381486bcc749850f96acd22) * hadoop-yarn-project/CHANGES.txt Changes in RM Web UI to better show labels to end users --- Key: YARN-2503 URL: https://issues.apache.org/jira/browse/YARN-2503 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.6.0 Attachments: YARN-2503-20141022-1.patch, YARN-2503-20141028-1.patch, YARN-2503.patch Include but not limited to: - Show labels of nodes in RM/nodes page - Show labels of queue in RM/scheduler page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2747) TestAggregatedLogFormat fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188336#comment-14188336 ] Hudson commented on YARN-2747: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1941 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1941/]) YARN-2747. Fixed the test failure of TestAggregatedLogFormat when native I/O is enabled. Contributed by Xuan Gong. (zjshen: rev ec63a3ffbd9413e7434594682fdbbd36eef7413c) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogFormat.java TestAggregatedLogFormat fails in trunk -- Key: YARN-2747 URL: https://issues.apache.org/jira/browse/YARN-2747 Project: Hadoop YARN Issue Type: Test Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.6.0 Attachments: YARN-2747.1.patch Running org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.105 sec FAILURE! - in org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat testContainerLogsFileAccess(org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat) Time elapsed: 0.047 sec FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat.testContainerLogsFileAccess(TestAggregatedLogFormat.java:346) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2503) Changes in RM Web UI to better show labels to end users
[ https://issues.apache.org/jira/browse/YARN-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188331#comment-14188331 ] Hudson commented on YARN-2503: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1941 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1941/]) YARN-2503. Added node lablels in web UI. Contributed by Wangda Tan (jianhe: rev d5e0a09721a5156fa2ee51ac1c32fbfd9905b8fb) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/NodeInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerQueueInfo.java Missing CHANGES.txt for YARN-2503. 
(jianhe: rev 0782f602881272392381486bcc749850f96acd22) * hadoop-yarn-project/CHANGES.txt Changes in RM Web UI to better show labels to end users --- Key: YARN-2503 URL: https://issues.apache.org/jira/browse/YARN-2503 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.6.0 Attachments: YARN-2503-20141022-1.patch, YARN-2503-20141028-1.patch, YARN-2503.patch Include but not limited to: - Show labels of nodes in RM/nodes page - Show labels of queue in RM/scheduler page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2758) Update TestApplicationHistoryClientService to use the new generic history store
[ https://issues.apache.org/jira/browse/YARN-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188332#comment-14188332 ] Hudson commented on YARN-2758: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1941 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1941/]) YARN-2758. Update TestApplicationHistoryClientService to use the new generic history store. Contributed by Zhijie Shen (xgong: rev 69f79bee8b3da07bf42e22e35e58c7719782e31f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryClientService.java * hadoop-yarn-project/CHANGES.txt Update TestApplicationHistoryClientService to use the new generic history store --- Key: YARN-2758 URL: https://issues.apache.org/jira/browse/YARN-2758 Project: Hadoop YARN Issue Type: Test Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.6.0 Attachments: YARN-2758.1.patch TestApplicationHistoryClientService is still testing against the mock data in the old MemoryApplicationHistoryStore. hence it needs to be updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2741) Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanag
[ https://issues.apache.org/jira/browse/YARN-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188341#comment-14188341 ] Hudson commented on YARN-2741: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1941 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1941/]) YARN-2741. Made NM web UI serve logs on the drive other than C: on Windows. Contributed by Craig Welch. (zjshen: rev 8984e9b1774033e379b57da1bd30a5c81888c7a3) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/ContainerLogsUtils.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java * hadoop-yarn-project/CHANGES.txt Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanager is running on) -- Key: YARN-2741 URL: https://issues.apache.org/jira/browse/YARN-2741 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Environment: Windows Reporter: Craig Welch Assignee: Craig Welch Fix For: 2.6.0 Attachments: YARN-2741.1.patch, YARN-2741.6.patch PROBLEM: User is getting No Logs available for Container Container_number when setting the yarn.nodemanager.log-dirs to any drive letter other than C: STEPS TO REPRODUCE: On Windows 1) Run NodeManager on C: 2) Create two local drive partitions D: and E: 3) Put yarn.nodemanager.log-dirs = D:\nmlogs or E:\nmlogs 4) Run a MR job that will last at least 5 minutes 5) While the job is in flight, log into the Yarn web ui , resource_manager_server:8088/cluster 6) Click on the application_idnumber 7) Click on the logs link, you will get No Logs available for Container Container_number ACTUAL BEHAVIOR: Getting an error message when viewing the container logs EXPECTED 
BEHAVIOR: Able to use different drive letters in yarn.nodemanager.log-dirs and not get error NOTE: If we use the drive letter C: in yarn.nodemanager.log-dirs, we are able to see the container logs while the MR job is in flight. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2760) Completely remove word 'experimental' from FairScheduler docs
[ https://issues.apache.org/jira/browse/YARN-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188343#comment-14188343 ] Hudson commented on YARN-2760: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1941 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1941/]) YARN-2760. Remove 'experimental' from FairScheduler docs. (Harsh J via kasha) (kasha: rev ade3727ecb092935dcc0f1291c1e6cf43d764a03) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm * hadoop-yarn-project/CHANGES.txt Completely remove word 'experimental' from FairScheduler docs - Key: YARN-2760 URL: https://issues.apache.org/jira/browse/YARN-2760 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.1.0-beta Reporter: Harsh J Assignee: Harsh J Priority: Trivial Fix For: 2.6.0 Attachments: YARN-2760.patch, YARN-2760.patch After YARN-1034, FairScheduler has not been 'experimental' in any aspect of use, but the doc change done in that did not entirely cover removal of that word, leaving a remnant in the preemption sub-point. This needs to be removed as well, as the feature has been good to use for a long time now, and is not experimental. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2503) Changes in RM Web UI to better show labels to end users
[ https://issues.apache.org/jira/browse/YARN-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188374#comment-14188374 ] Hudson commented on YARN-2503: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1916 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1916/]) YARN-2503. Added node lablels in web UI. Contributed by Wangda Tan (jianhe: rev d5e0a09721a5156fa2ee51ac1c32fbfd9905b8fb) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerQueueInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/NodeInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java Missing CHANGES.txt for YARN-2503. 
(jianhe: rev 0782f602881272392381486bcc749850f96acd22) * hadoop-yarn-project/CHANGES.txt Changes in RM Web UI to better show labels to end users --- Key: YARN-2503 URL: https://issues.apache.org/jira/browse/YARN-2503 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.6.0 Attachments: YARN-2503-20141022-1.patch, YARN-2503-20141028-1.patch, YARN-2503.patch Include but not limited to: - Show labels of nodes in RM/nodes page - Show labels of queue in RM/scheduler page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2760) Completely remove word 'experimental' from FairScheduler docs
[ https://issues.apache.org/jira/browse/YARN-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188386#comment-14188386 ] Hudson commented on YARN-2760: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1916 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1916/]) YARN-2760. Remove 'experimental' from FairScheduler docs. (Harsh J via kasha) (kasha: rev ade3727ecb092935dcc0f1291c1e6cf43d764a03) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm * hadoop-yarn-project/CHANGES.txt Completely remove word 'experimental' from FairScheduler docs - Key: YARN-2760 URL: https://issues.apache.org/jira/browse/YARN-2760 Project: Hadoop YARN Issue Type: Bug Components: documentation Affects Versions: 2.1.0-beta Reporter: Harsh J Assignee: Harsh J Priority: Trivial Fix For: 2.6.0 Attachments: YARN-2760.patch, YARN-2760.patch After YARN-1034, FairScheduler has not been 'experimental' in any aspect of use, but the doc change done in that did not entirely cover removal of that word, leaving a remnant in the preemption sub-point. This needs to be removed as well, as the feature has been good to use for a long time now, and is not experimental. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2741) Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanag
[ https://issues.apache.org/jira/browse/YARN-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188384#comment-14188384 ] Hudson commented on YARN-2741: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1916 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1916/]) YARN-2741. Made NM web UI serve logs on the drive other than C: on Windows. Contributed by Craig Welch. (zjshen: rev 8984e9b1774033e379b57da1bd30a5c81888c7a3) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/ContainerLogsUtils.java Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanager is running on) -- Key: YARN-2741 URL: https://issues.apache.org/jira/browse/YARN-2741 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.6.0 Environment: Windows Reporter: Craig Welch Assignee: Craig Welch Fix For: 2.6.0 Attachments: YARN-2741.1.patch, YARN-2741.6.patch PROBLEM: User is getting No Logs available for Container Container_number when setting the yarn.nodemanager.log-dirs to any drive letter other than C: STEPS TO REPRODUCE: On Windows 1) Run NodeManager on C: 2) Create two local drive partitions D: and E: 3) Put yarn.nodemanager.log-dirs = D:\nmlogs or E:\nmlogs 4) Run a MR job that will last at least 5 minutes 5) While the job is in flight, log into the Yarn web ui , resource_manager_server:8088/cluster 6) Click on the application_idnumber 7) Click on the logs link, you will get No Logs available for Container Container_number ACTUAL BEHAVIOR: Getting an error message when viewing the container logs EXPECTED BEHAVIOR: Able 
to use different drive letters in yarn.nodemanager.log-dirs and not get error NOTE: If we use the drive letter C: in yarn.nodemanager.log-dirs, we are able to see the container logs while the MR job is in flight. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2747) TestAggregatedLogFormat fails in trunk
[ https://issues.apache.org/jira/browse/YARN-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188379#comment-14188379 ] Hudson commented on YARN-2747: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1916 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1916/]) YARN-2747. Fixed the test failure of TestAggregatedLogFormat when native I/O is enabled. Contributed by Xuan Gong. (zjshen: rev ec63a3ffbd9413e7434594682fdbbd36eef7413c) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogFormat.java * hadoop-yarn-project/CHANGES.txt TestAggregatedLogFormat fails in trunk -- Key: YARN-2747 URL: https://issues.apache.org/jira/browse/YARN-2747 Project: Hadoop YARN Issue Type: Test Reporter: Xuan Gong Assignee: Xuan Gong Fix For: 2.6.0 Attachments: YARN-2747.1.patch Running org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.105 sec FAILURE! - in org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat testContainerLogsFileAccess(org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat) Time elapsed: 0.047 sec FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat.testContainerLogsFileAccess(TestAggregatedLogFormat.java:346) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2758) Update TestApplicationHistoryClientService to use the new generic history store
[ https://issues.apache.org/jira/browse/YARN-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188375#comment-14188375 ] Hudson commented on YARN-2758: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1916 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1916/]) YARN-2758. Update TestApplicationHistoryClientService to use the new generic history store. Contributed by Zhijie Shen (xgong: rev 69f79bee8b3da07bf42e22e35e58c7719782e31f) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryClientService.java Update TestApplicationHistoryClientService to use the new generic history store --- Key: YARN-2758 URL: https://issues.apache.org/jira/browse/YARN-2758 Project: Hadoop YARN Issue Type: Test Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Fix For: 2.6.0 Attachments: YARN-2758.1.patch TestApplicationHistoryClientService is still testing against the mock data in the old MemoryApplicationHistoryStore. hence it needs to be updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2769) TestDistributedShell#testDSShell fails on Windows
Varun Vasudev created YARN-2769: --- Summary: TestDistributedShell#testDSShell fails on Windows Key: YARN-2769 URL: https://issues.apache.org/jira/browse/YARN-2769 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Varun Vasudev Assignee: Varun Vasudev -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2769) TestDistributedShell#testDSShell fails on Windows
[ https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2769: Description: Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec FAILURE! org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT] at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) TestDistributedShell#testDSShell fails on Windows - Key: YARN-2769 URL: https://issues.apache.org/jira/browse/YARN-2769 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Varun Vasudev Assignee: Varun Vasudev Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec FAILURE! 
org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT] at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2769) TestDistributedShell#testDSShell fails on Windows
[ https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188465#comment-14188465 ] Varun Vasudev commented on YARN-2769: - Attached fix. TestDistributedShell#testDSShell fails on Windows - Key: YARN-2769 URL: https://issues.apache.org/jira/browse/YARN-2769 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2769.0.patch Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec FAILURE! org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT] at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2769) TestDistributedShell#testDSShell fails on Windows
[ https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2769: Attachment: apache-yarn-2769.0.patch TestDistributedShell#testDSShell fails on Windows - Key: YARN-2769 URL: https://issues.apache.org/jira/browse/YARN-2769 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2769.0.patch Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec FAILURE! org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT] at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2769) TestDistributedShell#testDSShell fails on Windows
[ https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188464#comment-14188464 ] Varun Vasudev commented on YARN-2769: - Since we use shell_command in the test, {noformat} if (envs.containsKey(DSConstants.DISTRIBUTEDSHELLSCRIPTLOCATION)) { {noformat} is false on Windows (but true on Linux). Just moving the domain id setting out of this if-condition fixes the bug. TestDistributedShell#testDSShell fails on Windows - Key: YARN-2769 URL: https://issues.apache.org/jira/browse/YARN-2769 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2769.0.patch Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec FAILURE! org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT] at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
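[Editor's note] The restructuring described in that comment can be sketched as follows. This is not the actual apache-yarn-2769.0.patch; the class and method names are illustrative stand-ins for the distributed-shell ApplicationMaster's env handling.

```java
import java.util.HashMap;
import java.util.Map;

public class DomainSetupSketch {
    static final String SCRIPT_LOCATION = "DISTRIBUTEDSHELLSCRIPTLOCATION";
    static final String TIMELINE_DOMAIN = "DISTRIBUTEDSHELLTIMELINEDOMAIN";

    // Before: the domain id is only read inside the script-location
    // branch, which never runs when -shell_command is used (the Windows
    // path in the test), so the domain silently stays at DEFAULT.
    static String domainBefore(Map<String, String> envs) {
        String domain = "DEFAULT";
        if (envs.containsKey(SCRIPT_LOCATION)) {
            if (envs.containsKey(TIMELINE_DOMAIN)) {
                domain = envs.get(TIMELINE_DOMAIN);
            }
        }
        return domain;
    }

    // After: the domain check is hoisted out of the script-location
    // branch, as the comment above describes.
    static String domainAfter(Map<String, String> envs) {
        String domain = "DEFAULT";
        if (envs.containsKey(TIMELINE_DOMAIN)) {
            domain = envs.get(TIMELINE_DOMAIN);
        }
        return domain;
    }

    public static void main(String[] args) {
        Map<String, String> envs = new HashMap<>();
        envs.put(TIMELINE_DOMAIN, "TEST_DOMAIN");
        // No SCRIPT_LOCATION entry, as with -shell_command on Windows.
        System.out.println(domainBefore(envs)); // DEFAULT
        System.out.println(domainAfter(envs));  // TEST_DOMAIN
    }
}
```

The `expected:[TEST_DOMAIN] but was:[DEFAULT]` ComparisonFailure in the report is exactly the before/after difference this sketch reproduces.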
[jira] [Updated] (YARN-2769) TestDistributedShell#testDSShell fails on Windows
[ https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2769: Description: {noformat} Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec FAILURE! org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT] at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) {noformat} was: Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec FAILURE! 
org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT] at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) TestDistributedShell#testDSShell fails on Windows - Key: YARN-2769 URL: https://issues.apache.org/jira/browse/YARN-2769 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2769.0.patch {noformat} Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec FAILURE! org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT] at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2711) TestDefaultContainerExecutor#testContainerLaunchError fails on Windows
[ https://issues.apache.org/jira/browse/YARN-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188472#comment-14188472 ] Junping Du commented on YARN-2711: -- Thanks [~vvasudev] for the patch and [~cwelch] for review! Patch looks good to me. Will commit it shortly. TestDefaultContainerExecutor#testContainerLaunchError fails on Windows -- Key: YARN-2711 URL: https://issues.apache.org/jira/browse/YARN-2711 Project: Hadoop YARN Issue Type: Bug Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2711.0.patch The testContainerLaunchError test fails on Windows with the following error - {noformat} java.io.FileNotFoundException: File file:/bin/echo does not exist at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:524) at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:737) at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:514) at org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111) at org.apache.hadoop.fs.FilterFs.getFileStatus(FilterFs.java:120) at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1117) at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1113) at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1113) at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2019) at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:1978) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:145) at org.apache.hadoop.yarn.server.nodemanager.TestDefaultContainerExecutor.testContainerLaunchError(TestDefaultContainerExecutor.java:289) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2765) Add leveldb-based implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated YARN-2765: - Attachment: YARN-2765v2.patch Thanks for the review, Tsuyoshi! bq. How about adding helper methods like getKeyPrefix/getNodePath for getting key prefix and node path? Sure, added some helper methods to compute leveldb keys for various things. bq. I found that the patch includes lots hard-coded /. I think it's better to have private field SEPARATOR = /. IMHO this makes the code less readable, similar to a code style like {{final int ONE = 1}}. But I don't care too strongly about it and changed all occurrences to SEPARATOR. For Zhijie's comments: bq. One drawback I can think of is that while LeveldbRMStateStore is lightweight for single RM restarting, multiple RMs of HA are not able to share this single-host database. This should work if the leveldb database is on a network store like a filer. Leveldb uses locks to prevent multiple processes from trying to access the database simultaneously, so there's a little bit of help for the fencing scenarios. However the fencing script actions would have to do some extra work to force a poorly-behaving resourcemanager to let go of the locks so a standby RM can open the store and become active. bq. Did you have a chance to think of an enhanced k/v db: rocksdb? I briefly considered using rocksdb for this but decided against it for a couple of reasons: * leveldb is already used by the timeline server and nodemanager, and I would rather avoid adding yet another new dependency for this * leveldb supports win32/win64, but it doesn't appear that the standard rocksdbjni distribution has support for Windows. 
Add leveldb-based implementation for RMStateStore - Key: YARN-2765 URL: https://issues.apache.org/jira/browse/YARN-2765 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-2765.patch, YARN-2765v2.patch It would be nice to have a leveldb option to the resourcemanager recovery store. Leveldb would provide some benefits over the existing filesystem store such as better support for atomic operations, fewer I/O ops per state update, and far fewer total files on the filesystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
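The helper methods discussed in the review above (key prefixes, node paths, a SEPARATOR field) might look roughly like this. The names and the root key are assumptions following the review comments, not necessarily what the committed patch uses:

```java
public class LeveldbKeys {
    // The separator the review asked to factor out of hard-coded "/" uses.
    private static final String SEPARATOR = "/";
    // Hypothetical root prefix for all RM state entries.
    private static final String ROOT = "RMStateRoot";

    // Prefix shared by all keys in one category, useful for range scans.
    static String getKeyPrefix(String storeCategory) {
        return ROOT + SEPARATOR + storeCategory + SEPARATOR;
    }

    // Full leveldb key for one stored node.
    static String getNodePath(String storeCategory, String nodeName) {
        return getKeyPrefix(storeCategory) + nodeName;
    }

    public static void main(String[] args) {
        System.out.println(getNodePath("RMAppRoot", "application_1_0001"));
    }
}
```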
[jira] [Commented] (YARN-1902) Allocation of too many containers when a second request is done with the same resource capability
[ https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188516#comment-14188516 ] Bikas Saha commented on YARN-1902: -- bq. Given a ContainerRequest x with Resource y, when addContainerRequest is called z times with x, allocate is called and at least one of the z allocated containers is started, then if another addContainerRequest call is done and subsequently an allocate call to the RM, (z+1) containers will be allocated, where 1 container is expected. Firstly, I am not sure if the same ContainerRequest object can be passed multiple times in addContainerRequest. It should be different objects each time (even if they point to the same resource). This might have something to do with the internal book-keeping done for matching requests. Secondly, after z requests are made and 1 allocation is received then z-1 requests remain. If you are using AMRMClientImpl then it's your (the user's) responsibility to call removeContainerRequest() for the request that was matched to this container. The AMRMClient does not know which of your z requests could be assigned to this container. So in the general case, it cannot automatically remove a request from the internal table because it does not know which request to remove. If the javadocs don't clarify these semantics then we can improve the javadocs. Thirdly, the protocol between the AMRMClient and the RM has an inherent race. So if the client had earlier asked for z containers and in the next heartbeat reduces that to z-1, the RM may actually return z containers to it because it had already allocated them to this client before the client updated the RM with the new value. Allocation of too many containers when a second request is done with the same resource capability - Key: YARN-1902 URL: https://issues.apache.org/jira/browse/YARN-1902 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 2.2.0, 2.3.0, 2.4.0 Reporter: Sietse T. 
Au Labels: client Attachments: YARN-1902.patch, YARN-1902.v2.patch, YARN-1902.v3.patch Regarding AMRMClientImpl Scenario 1: Given a ContainerRequest x with Resource y, when addContainerRequest is called z times with x, allocate is called and at least one of the z allocated containers is started, then if another addContainerRequest call is done and subsequently an allocate call to the RM, (z+1) containers will be allocated, where 1 container is expected. Scenario 2: No containers are started between the allocate calls. Analyzing debug logs of the AMRMClientImpl, I have found that indeed (z+1) containers are requested in both scenarios, but that only in the second scenario, the correct behavior is observed. Looking at the implementation I have found that this (z+1) request is caused by the structure of the remoteRequestsTable. The consequence of Map<Resource, ResourceRequestInfo> is that ResourceRequestInfo does not hold any information about whether a request has been sent to the RM yet or not. There are workarounds for this, such as releasing the excess containers received. The solution implemented is to initialize a new ResourceRequest in ResourceRequestInfo when a request has been successfully sent to the RM. The patch includes a test in which scenario one is tested. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
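The client-side bookkeeping Bikas describes (remove one matching request per received allocation, or the next heartbeat re-asks for it) can be modeled with a minimal sketch. This is not the real AMRMClient API surface, just an illustration of why removeContainerRequest() is the caller's responsibility:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class AskBookkeeping {
    // Outstanding container requests, keyed here by an opaque tag.
    private final Deque<String> outstanding = new ArrayDeque<>();

    void addContainerRequest(String req) {
        outstanding.add(req);
    }

    // The client library cannot know which of the caller's equivalent
    // requests a received container satisfied, so the caller must remove
    // the matched request itself; otherwise the ask count stays at z and
    // a later add brings it to z+1, over-allocating.
    void removeContainerRequest(String req) {
        outstanding.remove(req);
    }

    int pending() {
        return outstanding.size();
    }

    public static void main(String[] args) {
        AskBookkeeping client = new AskBookkeeping();
        for (int i = 0; i < 3; i++) {
            client.addContainerRequest("req" + i);
        }
        // One container arrives and is matched to req0:
        client.removeContainerRequest("req0");
        System.out.println(client.pending()); // 2 requests remain
    }
}
```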
[jira] [Commented] (YARN-2769) TestDistributedShell#testDSShell fails on Windows
[ https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188526#comment-14188526 ] Hadoop QA commented on YARN-2769: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677908/apache-yarn-2769.0.patch against trunk revision ec63a3f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5627//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5627//console This message is automatically generated. 
TestDistributedShell#testDSShell fails on Windows - Key: YARN-2769 URL: https://issues.apache.org/jira/browse/YARN-2769 Project: Hadoop YARN Issue Type: Test Components: applications/distributed-shell Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2769.0.patch {noformat} Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec FAILURE! org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT] at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2769) TestDistributedShell#testDSShell fails on Windows
[ https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188530#comment-14188530 ] Varun Vasudev commented on YARN-2769: - I haven't included any test since this is a fix for a test failing on Windows. TestDistributedShell#testDSShell fails on Windows - Key: YARN-2769 URL: https://issues.apache.org/jira/browse/YARN-2769 Project: Hadoop YARN Issue Type: Test Components: applications/distributed-shell Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2769.0.patch {noformat} Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec FAILURE! org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT] at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI
[ https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2698: - Attachment: YARN-2698-20141029-1.patch Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI --- Key: YARN-2698 URL: https://issues.apache.org/jira/browse/YARN-2698 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Priority: Critical Attachments: YARN-2698-20141028-1.patch, YARN-2698-20141028-2.patch, YARN-2698-20141028-3.patch, YARN-2698-20141029-1.patch YARN RMAdminCLI and AdminService should have write API only, for other read APIs, they should be located at YARNCLI and RMClientService. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2769) Timeline server domain not set correctly when using shell_command on Windows
[ https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2769: Summary: Timeline server domain not set correctly when using shell_command on Windows (was: TestDistributedShell#testDSShell fails on Windows) Timeline server domain not set correctly when using shell_command on Windows Key: YARN-2769 URL: https://issues.apache.org/jira/browse/YARN-2769 Project: Hadoop YARN Issue Type: Test Components: applications/distributed-shell Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2769.0.patch {noformat} Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec FAILURE! org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT] at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2769) Timeline server domain not set correctly when using shell_command on Windows
[ https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2769: Issue Type: Bug (was: Test) Timeline server domain not set correctly when using shell_command on Windows Key: YARN-2769 URL: https://issues.apache.org/jira/browse/YARN-2769 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2769.0.patch {noformat} Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec FAILURE! org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT] at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2765) Add leveldb-based implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188589#comment-14188589 ] Hadoop QA commented on YARN-2765: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677911/YARN-2765v2.patch against trunk revision ec63a3f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5626//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5626//console This message is automatically generated. Add leveldb-based implementation for RMStateStore - Key: YARN-2765 URL: https://issues.apache.org/jira/browse/YARN-2765 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-2765.patch, YARN-2765v2.patch It would be nice to have a leveldb option to the resourcemanager recovery store. 
Leveldb would provide some benefits over the existing filesystem store such as better support for atomic operations, fewer I/O ops per state update, and far fewer total files on the filesystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2769) Timeline server domain not set correctly when using shell_command on Windows
[ https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-2769: Description: The bug is caught by one of the unit tests which fails. {noformat} Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec FAILURE! org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT] at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) {noformat} was: {noformat} Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec FAILURE! 
org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT] at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) {noformat} Timeline server domain not set correctly when using shell_command on Windows Key: YARN-2769 URL: https://issues.apache.org/jira/browse/YARN-2769 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2769.0.patch The bug is caught by one of the unit tests which fails. {noformat} Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec FAILURE! org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT] at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2765) Add leveldb-based implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188595#comment-14188595 ] Zhijie Shen commented on YARN-2765: --- bq. This should work if the leveldb database is on a network store like a filer. Thanks for sharing. This is an interesting use case that I'm not aware of before. bq. I briefly considered using rocksdb for this but decided against it for a couple of reasons: It's not particularly related to this Jira, but I just want to think it out loudly. It seems that rocksdb claims to have better performance in terms of I/O than leveldb, while their APIs are very similar to each other. After we have the leveldb impl, it shouldn't be that difficult to make a rocksdb impl. Probably leveldb is enough to serve as the state store for RM/NM/JHS, but the timeline server may want a stronger one. Rocksdb may be a compromise before migrating to fully distributed storage solution based on HBase. And one other merit I've heard about rocksdb is that it can ride on HDFS. Correct me if I'm wrong, but it seems that rocksdb can also help to scale out the storage problem as well as support RM HA deployment in a shared nothing environment (e.g. without a network storage). I'm not saying we should go with rocksdb now instead of leveldb, as we know it has been used for other components already. I'm trying to propose if we can think of rocksdb, which looks stronger but still reasonably simple alternate. Add leveldb-based implementation for RMStateStore - Key: YARN-2765 URL: https://issues.apache.org/jira/browse/YARN-2765 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-2765.patch, YARN-2765v2.patch It would be nice to have a leveldb option to the resourcemanager recovery store. 
Leveldb would provide some benefits over the existing filesystem store such as better support for atomic operations, fewer I/O ops per state update, and far fewer total files on the filesystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2770) Timeline delegation tokens need to be automatically renewed by the RM
Zhijie Shen created YARN-2770: - Summary: Timeline delegation tokens need to be automatically renewed by the RM Key: YARN-2770 URL: https://issues.apache.org/jira/browse/YARN-2770 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.5.0 Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Critical YarnClient will automatically grab a timeline DT for the application and pass it to the app AM. Now the timeline DT renew is still dummy. If an app is running for more than 24h (default DT expiry time), the app AM is no longer able to use the expired DT to communicate with the timeline server. Since RM will cache the credentials of each app, and renew the DTs for the running app. We should provider renew hooks similar to what HDFS DT has for RM, and set RM user as the renewer when grabbing the timeline DT. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (YARN-2765) Add leveldb-based implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188595#comment-14188595 ] Zhijie Shen edited comment on YARN-2765 at 10/29/14 5:22 PM: - bq. This should work if the leveldb database is on a network store like a filer. Thanks for sharing. This is an interesting use case that I'm not aware of before. bq. I briefly considered using rocksdb for this but decided against it for a couple of reasons: It's not particularly related to this Jira, but I just want to think it out loudly. It seems that rocksdb claims to have better performance in terms of I/O than leveldb, while their APIs are very similar to each other. After we have the leveldb impl, it shouldn't be that difficult to make a rocksdb impl. Probably leveldb is enough to serve as the state store for RM/NM/JHS, but the timeline server may want a stronger one. Rocksdb may be a compromise before migrating to fully distributed storage solution based on HBase. And one other merit I've heard about rocksdb is that it can ride on HDFS. Correct me if I'm wrong, but it seems that rocksdb can also help to scale out the storage problem as well as support RM HA deployment in a shared nothing environment (e.g. without a network storage). I'm not saying we should go with rocksdb now instead of leveldb, as we know it has been used for other components already. I'm trying to propose if we can think of rocksdb, which looks stronger but still reasonably simple alternate. There's a rocksdb jni which seems to have windows support: https://github.com/fusesource/rocksdbjni It should be the same org whose leveldbjni is currently used by us. was (Author: zjshen): bq. This should work if the leveldb database is on a network store like a filer. Thanks for sharing. This is an interesting use case that I'm not aware of before. bq. 
I briefly considered using rocksdb for this but decided against it for a couple of reasons: It's not particularly related to this Jira, but I just want to think it out loudly. It seems that rocksdb claims to have better performance in terms of I/O than leveldb, while their APIs are very similar to each other. After we have the leveldb impl, it shouldn't be that difficult to make a rocksdb impl. Probably leveldb is enough to serve as the state store for RM/NM/JHS, but the timeline server may want a stronger one. Rocksdb may be a compromise before migrating to fully distributed storage solution based on HBase. And one other merit I've heard about rocksdb is that it can ride on HDFS. Correct me if I'm wrong, but it seems that rocksdb can also help to scale out the storage problem as well as support RM HA deployment in a shared nothing environment (e.g. without a network storage). I'm not saying we should go with rocksdb now instead of leveldb, as we know it has been used for other components already. I'm trying to propose if we can think of rocksdb, which looks stronger but still reasonably simple alternate. Add leveldb-based implementation for RMStateStore - Key: YARN-2765 URL: https://issues.apache.org/jira/browse/YARN-2765 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-2765.patch, YARN-2765v2.patch It would be nice to have a leveldb option to the resourcemanager recovery store. Leveldb would provide some benefits over the existing filesystem store such as better support for atomic operations, fewer I/O ops per state update, and far fewer total files on the filesystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2742) FairSchedulerConfiguration fails to parse if there is extra space between value and unit
[ https://issues.apache.org/jira/browse/YARN-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188614#comment-14188614 ] Karthik Kambatla commented on YARN-2742: +1 FairSchedulerConfiguration fails to parse if there is extra space between value and unit Key: YARN-2742 URL: https://issues.apache.org/jira/browse/YARN-2742 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.0 Reporter: Sangjin Lee Assignee: Wei Yan Priority: Minor Attachments: YARN-2742-1.patch, YARN-2742-2.patch FairSchedulerConfiguration is very strict about the number of space characters between the value and the unit: 0 or 1 space. For example, for values like the following: {noformat} <maxResources>4096  mb, 2 vcores</maxResources> {noformat} (note 2 spaces) The above line fails to parse: {noformat} 2014-10-24 22:56:40,802 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService: Failed to reload fair scheduler config file - will use existing allocations. 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationConfigurationException: Missing resource: mb at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.findResource(FairSchedulerConfiguration.java:247) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.parseResourceConfigValue(FairSchedulerConfiguration.java:231) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:347) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:381) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:293) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService$1.run(AllocationFileLoaderService.java:117) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2742) FairSchedulerConfiguration should allow extra spaces between value and unit
[ https://issues.apache.org/jira/browse/YARN-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2742: --- Summary: FairSchedulerConfiguration should allow extra spaces between value and unit (was: FairSchedulerConfiguration fails to parse if there is extra space between value and unit) FairSchedulerConfiguration should allow extra spaces between value and unit --- Key: YARN-2742 URL: https://issues.apache.org/jira/browse/YARN-2742 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.0 Reporter: Sangjin Lee Assignee: Wei Yan Priority: Minor Attachments: YARN-2742-1.patch, YARN-2742-2.patch FairSchedulerConfiguration is very strict about the number of space characters between the value and the unit: 0 or 1 space. For example, for values like the following: {noformat} <maxResources>4096  mb, 2 vcores</maxResources> {noformat} (note 2 spaces) The above line fails to parse: {noformat} 2014-10-24 22:56:40,802 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService: Failed to reload fair scheduler config file - will use existing allocations. 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationConfigurationException: Missing resource: mb at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.findResource(FairSchedulerConfiguration.java:247) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.parseResourceConfigValue(FairSchedulerConfiguration.java:231) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:347) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:381) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:293) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService$1.run(AllocationFileLoaderService.java:117) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2742) FairSchedulerConfiguration should allow extra spaces between value and unit
[ https://issues.apache.org/jira/browse/YARN-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188639#comment-14188639 ] Hudson commented on YARN-2742: -- FAILURE: Integrated in Hadoop-trunk-Commit #6382 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6382/]) YARN-2742. FairSchedulerConfiguration should allow extra spaces between value and unit. (Wei Yan via kasha) (kasha: rev 782971ae7a0247bcf5920e10b434b7e0954dd868) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerConfiguration.java FairSchedulerConfiguration should allow extra spaces between value and unit --- Key: YARN-2742 URL: https://issues.apache.org/jira/browse/YARN-2742 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.4.0 Reporter: Sangjin Lee Assignee: Wei Yan Priority: Minor Fix For: 2.7.0 Attachments: YARN-2742-1.patch, YARN-2742-2.patch FairSchedulerConfiguration is very strict about the number of space characters between the value and the unit: 0 or 1 space. For example, for values like the following: {noformat} <maxResources>4096  mb, 2 vcores</maxResources> {noformat} (note 2 spaces) The above line fails to parse: {noformat} 2014-10-24 22:56:40,802 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService: Failed to reload fair scheduler config file - will use existing allocations. 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationConfigurationException: Missing resource: mb at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.findResource(FairSchedulerConfiguration.java:247) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.parseResourceConfigValue(FairSchedulerConfiguration.java:231) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:347) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:381) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:293) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService$1.run(AllocationFileLoaderService.java:117) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
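The committed fix relaxes the whitespace handling in FairSchedulerConfiguration's resource parsing. A minimal sketch of the idea, with hypothetical names and a deliberately simplified grammar (the real findResource handles more units and error cases):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LenientResourceParser {
    // "\\s*" instead of "\\s?" tolerates any number of spaces between
    // the numeric value and its unit, so "4096  mb" parses as well as "4096 mb".
    private static final Pattern RESOURCE =
        Pattern.compile("(\\d+)\\s*(mb|vcores)");

    // Returns the numeric value for the requested unit, or -1 if the
    // unit does not appear in the config string.
    public static int find(String config, String unit) {
        Matcher m = RESOURCE.matcher(config.toLowerCase());
        while (m.find()) {
            if (m.group(2).equals(unit)) {
                return Integer.parseInt(m.group(1));
            }
        }
        return -1;
    }
}
```

With this pattern, the "4096  mb, 2 vcores" value from the bug report resolves to 4096 mb and 2 vcores instead of throwing "Missing resource: mb".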
[jira] [Commented] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
[ https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188728#comment-14188728 ] Siqi Li commented on YARN-2755: --- Hi [~jlowe], can you take a look at this? NM fails to clean up usercache_DEL_timestamp dirs after YARN-661 -- Key: YARN-2755 URL: https://issues.apache.org/jira/browse/YARN-2755 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: YARN-2755.v1.patch, YARN-2755.v2.patch, YARN-2755.v3.patch When the NM restarts frequently for some reason, a large number of directories like these are left in /data/disk$num/yarn/local/: /data/disk1/yarn/local/usercache_DEL_1414372756105 /data/disk1/yarn/local/usercache_DEL_1413557901696 /data/disk1/yarn/local/usercache_DEL_1413657004894 /data/disk1/yarn/local/usercache_DEL_1413675321860 /data/disk1/yarn/local/usercache_DEL_1414093167936 /data/disk1/yarn/local/usercache_DEL_1413565841271 These directories are empty, but take up 100M+ due to their sheer number. There were 38714 per data disk on the machine I looked at. It appears to be a regression introduced by YARN-661 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2690) Make ReservationSystem and its dependent classes independent of Scheduler type
[ https://issues.apache.org/jira/browse/YARN-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188739#comment-14188739 ] Karthik Kambatla commented on YARN-2690: Looks mostly good. Can we look into the javadoc warnings? A few minor comments: # Rename ReservationSchedulerConfiguration to ReservationConfiguration? Not sure the Scheduler in there is adding much information. # Make ReservationConfiguration an abstract class that extends Configuration instead of an interface, so it can implement some of the getters, at least those for which it carries defaults. # Nit: The time defaults should be a product of numbers instead of the result, e.g. {{24 * 60 * 60 * 1000}} instead of 86400000L. Make ReservationSystem and its dependent classes independent of Scheduler type Key: YARN-2690 URL: https://issues.apache.org/jira/browse/YARN-2690 Project: Hadoop YARN Issue Type: Sub-task Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-2690.001.patch, YARN-2690.002.patch, YARN-2690.002.patch, YARN-2690.003.patch A lot of common reservation classes depend on CapacityScheduler and specifically its configuration. This JIRA is to make them ready for other schedulers by abstracting out the configuration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
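Karthik's third point, spelling defaults as a product, can be sketched with a tiny example (the constant name here is illustrative, not the one used in the patch):

```java
public class ReservationDefaults {
    // One day in milliseconds, written as a product so a reviewer can
    // verify the unit math at a glance instead of decoding a
    // pre-multiplied literal like 86400000L.
    public static final long DEFAULT_PLANNING_WINDOW_MS = 24L * 60 * 60 * 1000;
}
```

The `24L` suffix on the first factor keeps the whole product in long arithmetic, which matters once constants exceed the int range.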
[jira] [Commented] (YARN-2765) Add leveldb-based implementation for RMStateStore
[ https://issues.apache.org/jira/browse/YARN-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188755#comment-14188755 ] Jason Lowe commented on YARN-2765: -- I agree that the timeline server seems like a worthy candidate for rocksdb. IIUC rocksdb's main use-case over leveldb is better performance when the database is larger than the node's RAM, which is likely in the case of the timeline server. bq. And one other merit I've heard about rocksdb is that it can ride on HDFS. This is news to me. I knew rocksdb could be used as a cache of data that came from HDFS or could be backed-up to HDFS, but I didn't think it could read/write directly to it as part of normal operations. bq. There's a rocksdb jni which seems to have windows support: https://github.com/fusesource/rocksdbjni Awesome, thanks for finding that. I was looking at the standard org.rocksdb package. Only concern with the fusesource option would be if it starts to diverge significantly from the standard one. The API is already slightly different between the two, and the fusesource one hasn't been touched in a year while the org.rocksdb package was updated just last week. Probably best to continue this conversation in a separate JIRA proposing we consider rocksdb for the timeline server. If it works well there it should be very straightforward to provide store backends for the RM, NM, and JHS if it makes sense for them as well. Add leveldb-based implementation for RMStateStore - Key: YARN-2765 URL: https://issues.apache.org/jira/browse/YARN-2765 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jason Lowe Assignee: Jason Lowe Attachments: YARN-2765.patch, YARN-2765v2.patch It would be nice to have a leveldb option to the resourcemanager recovery store. 
Leveldb would provide some benefits over the existing filesystem store such as better support for atomic operations, fewer I/O ops per state update, and far fewer total files on the filesystem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2771) DistributedShell's DSConstants are badly named
Vinod Kumar Vavilapalli created YARN-2771: - Summary: DistributedShell's DSConstants are badly named Key: YARN-2771 URL: https://issues.apache.org/jira/browse/YARN-2771 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen I'd rather have underscores (DISTRIBUTED_SHELL_TIMELINE_DOMAIN instead of DISTRIBUTEDSHELLTIMELINEDOMAIN). DISTRIBUTEDSHELLTIMELINEDOMAIN is added in this release, can we rename it to be DISTRIBUTED_SHELL_TIMELINE_DOMAIN? For the old envs, we can just add new envs that point to the old-one and deprecate the old ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2738) Add FairReservationSystem for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188787#comment-14188787 ] Karthik Kambatla commented on YARN-2738: Do we want to make it configurable per-queue from the beginning? How about just starting with global settings for all queues, and adding per-queue configs depending on usecases and user feedback? Comments on the patch itself: # FairReservationSystem: The TODO is not clear to me. IAC, we should avoid orphan TODOs - can we file a follow-up JIRA and add a reference at the TODO. # Spurious import changes in a couple of files. Add FairReservationSystem for FairScheduler --- Key: YARN-2738 URL: https://issues.apache.org/jira/browse/YARN-2738 Project: Hadoop YARN Issue Type: Sub-task Components: fairscheduler Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-2738.001.patch Need to create a FairReservationSystem that will implement ReservationSystem for FairScheduler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2769) Timeline server domain not set correctly when using shell_command on Windows
[ https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188796#comment-14188796 ] Zhijie Shen commented on YARN-2769: --- +1. The fix makes sense, and we have the test to cover the code path on windows. Will commit the patch. Timeline server domain not set correctly when using shell_command on Windows Key: YARN-2769 URL: https://issues.apache.org/jira/browse/YARN-2769 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-2769.0.patch The bug is caught by one of the unit tests which fails. {noformat} Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec FAILURE! org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT] at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2769) Timeline server domain not set correctly when using shell_command on Windows
[ https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188830#comment-14188830 ] Hudson commented on YARN-2769: -- FAILURE: Integrated in Hadoop-trunk-Commit #6385 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6385/]) YARN-2769. Fixed the problem that timeline domain is not set in distributed shell AM when using shell_command on Windows. Contributed by Varun Vasudev. (zjshen: rev a8c120222047280234c3411ce1c1c9b17f08c851) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java * hadoop-yarn-project/CHANGES.txt Timeline server domain not set correctly when using shell_command on Windows Key: YARN-2769 URL: https://issues.apache.org/jira/browse/YARN-2769 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.6.0 Attachments: apache-yarn-2769.0.patch The bug is caught by one of the unit tests which fails. {noformat} Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec FAILURE! - in org.apache.hadoop.yarn.applications.distribut testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell) Time elapsed: 37.366 sec FAILURE! org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT] at org.junit.Assert.assertEquals(Assert.java:115) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI
[ https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188891#comment-14188891 ] Hadoop QA commented on YARN-2698: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677927/YARN-2698-20141029-1.patch against trunk revision ec63a3f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.mapred.TestMRTimelineEventHandling {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5628//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5628//console This message is automatically generated. 
Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI --- Key: YARN-2698 URL: https://issues.apache.org/jira/browse/YARN-2698 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Priority: Critical Attachments: YARN-2698-20141028-1.patch, YARN-2698-20141028-2.patch, YARN-2698-20141028-3.patch, YARN-2698-20141029-1.patch YARN RMAdminCLI and AdminService should have write API only, for other read APIs, they should be located at YARNCLI and RMClientService. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI
[ https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2698: - Attachment: YARN-2698-20141029-2.patch Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI --- Key: YARN-2698 URL: https://issues.apache.org/jira/browse/YARN-2698 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Priority: Critical Attachments: YARN-2698-20141028-1.patch, YARN-2698-20141028-2.patch, YARN-2698-20141028-3.patch, YARN-2698-20141029-1.patch, YARN-2698-20141029-2.patch YARN RMAdminCLI and AdminService should have write API only, for other read APIs, they should be located at YARNCLI and RMClientService. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2495: Attachment: YARN-2495.20141030-1.patch Hi [~wangda], I am uploading a patch with all the review comments fixed and with test cases, but I need to rebase it on the latest trunk code, which I will do tomorrow morning. You can review this patch, and if it looks fine I will submit it after rebasing tomorrow. Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495_20141022.1.patch The target of this JIRA is to allow admins to specify labels on each NM; this covers - User can set labels in each NM (by setting yarn-site.xml or using a script suggested by [~aw]) - NM will send labels to RM via the ResourceTracker API - RM will set labels in NodeLabelManager when the NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI
[ https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188925#comment-14188925 ] Wangda Tan commented on YARN-2698: -- Hi [~vinodkv], bq. YarnClient usually has simpler APIs (like returning a map) instead of directly exposing the response objects, let’s do that. Addressed bq. bin/yarn needs to be updated to use the new CLI Addressed bq. Overall, I didn’t realize we already have a node CLI: Let’s just move the node to labels mappings to that CLI. We could keep the all-nodes mapping though. The node CLI mainly gets labels from NodeReport, which only covers running NMs. I suggest keeping the node-to-labels mapping in the node-labels CLI (as its name suggests); in the future we can add a labels field to NodeReport and the node CLI. bq. “will return all labels in the cluster” - “will return all accessible labels in the cluster” I changed it to be “.. return all node labels” to make it consistent with the Java API names; please let me know if you disagree. bq. CLI for “node-labels -list” should drop the prefix “Node-labels=” Addressed bq. CLI for “node-labels -list -nodeId all”: Say Node instead of Host? And then simply make it “Node:nm:5432 - label1, label2” Addressed bq. Move the node-cli tests into their own TestNodeLabelsCLI Addressed bq. Validate the help message for the new CLI. Addressed Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI --- Key: YARN-2698 URL: https://issues.apache.org/jira/browse/YARN-2698 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Priority: Critical Attachments: YARN-2698-20141028-1.patch, YARN-2698-20141028-2.patch, YARN-2698-20141028-3.patch, YARN-2698-20141029-1.patch, YARN-2698-20141029-2.patch YARN RMAdminCLI and AdminService should have write API only, for other read APIs, they should be located at YARNCLI and RMClientService. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2771) DistributedShell's DSConstants are badly named
[ https://issues.apache.org/jira/browse/YARN-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2771: -- Component/s: applications/distributed-shell DistributedShell's DSConstants are badly named -- Key: YARN-2771 URL: https://issues.apache.org/jira/browse/YARN-2771 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen I'd rather have underscores (DISTRIBUTED_SHELL_TIMELINE_DOMAIN instead of DISTRIBUTEDSHELLTIMELINEDOMAIN). DISTRIBUTEDSHELLTIMELINEDOMAIN is added in this release, can we rename it to be DISTRIBUTED_SHELL_TIMELINE_DOMAIN? For the old envs, we can just add new envs that point to the old-one and deprecate the old ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2772) DistributedShell's timeline related options are not clear
Vinod Kumar Vavilapalli created YARN-2772: - Summary: DistributedShell's timeline related options are not clear Key: YARN-2772 URL: https://issues.apache.org/jira/browse/YARN-2772 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen The new options domain and create options - they are not descriptive at all. It is also not clear when view_acls and modify_acls need to be set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2772) DistributedShell's timeline related options are not clear
[ https://issues.apache.org/jira/browse/YARN-2772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188969#comment-14188969 ] Vinod Kumar Vavilapalli commented on YARN-2772: --- I propose the following: - Rename the domain and create options to be timeline_domain_id and should_create_timeline_domain respectively. - Modify the option description of view_acls and modify_acls to say that they are only needed if should_create_timeline_domain is true - Modify the description of {{timeline_domain_id}} to say that it is optional and that it will use the DEFAULT timeline-domain by default - If {{should_create_timeline_domain}} is off, we should validate on the client whether the domain really exists and fail the submission if not, with a message saying The passed timeline-domain doesn't exist. Either pass an existing timeline_domain_id or set should_create_timeline_domain to true. - If {{should_create_timeline_domain}} is on, and the user passes an existing timeline-domain-id, we should fail the submission and say The passed timeline-domain already exists. Either pass a new timeline_domain_id or set should_create_timeline_domain to false DistributedShell's timeline related options are not clear - Key: YARN-2772 URL: https://issues.apache.org/jira/browse/YARN-2772 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen The new options domain and create options - they are not descriptive at all. It is also not clear when view_acls and modify_acls need to be set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
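The last two bullets of the proposal amount to a simple client-side check on the (create-flag, domain-exists) combination. A hypothetical sketch — the method and class names are illustrative of the proposal, not existing DistributedShell code:

```java
public class TimelineDomainCheck {
    // Rejects the two invalid combinations Vinod describes: asking to
    // reuse a domain that does not exist, or asking to create a domain
    // that already exists. The valid combinations pass through silently.
    public static void validate(boolean shouldCreateTimelineDomain,
                                boolean domainExists) {
        if (!shouldCreateTimelineDomain && !domainExists) {
            throw new IllegalArgumentException(
                "The passed timeline-domain doesn't exist. Either pass an "
                + "existing timeline_domain_id or set "
                + "should_create_timeline_domain to true.");
        }
        if (shouldCreateTimelineDomain && domainExists) {
            throw new IllegalArgumentException(
                "The passed timeline-domain already exists. Either pass a "
                + "new timeline_domain_id or set "
                + "should_create_timeline_domain to false.");
        }
    }
}
```

The client would call this before submission, after querying the timeline server for the domain's existence.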
[jira] [Updated] (YARN-2766) [JDK 8] TestApplicationHistoryClientService fails
[ https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-2766: Attachment: YARN-2766.patch That makes sense. I wasn't able to trace the code back to ApplicationHistoryManager, but I did find where the lists are created, so I put the sorting calls there. [JDK 8] TestApplicationHistoryClientService fails - Key: YARN-2766 URL: https://issues.apache.org/jira/browse/YARN-2766 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.6.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: YARN-2766.patch, YARN-2766.patch {{TestApplicationHistoryClientService.testContainers}} and {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail because the test assertions are assuming a returned Collection is in a certain order. The collection comes from a HashMap, so the order is not guaranteed, plus, according to [this page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html], there are situations where the iteration order of a HashMap will be different between Java 7 and 8. We should fix the test code to not assume a specific ordering. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
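The underlying fix pattern — never assert on HashMap iteration order — can be sketched as sorting both sides before comparing (an illustrative helper, not the patch's actual code):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

public class OrderIndependentAssert {
    // Compares two collections as multisets by sorting copies first, so
    // a test passes regardless of HashMap iteration order, which is not
    // guaranteed and differs between Java 7 and Java 8.
    public static boolean sameElements(Collection<String> expected,
                                       Collection<String> actual) {
        List<String> e = new ArrayList<>(expected);
        List<String> a = new ArrayList<>(actual);
        Collections.sort(e);
        Collections.sort(a);
        return e.equals(a);
    }
}
```

Placing the sorting where the lists are created (as the patch does) achieves the same effect: the assertions then see a deterministic order.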
[jira] [Commented] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
[ https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189015#comment-14189015 ] Jason Lowe commented on YARN-2755: -- Thanks for the patch, Siqi. userDirStatus can be null if userDirPath is not a directory, so we should avoid the potential NPE and check for {{userDirStatus != null && userDirStatus.hasNext()}} NM fails to clean up usercache_DEL_timestamp dirs after YARN-661 -- Key: YARN-2755 URL: https://issues.apache.org/jira/browse/YARN-2755 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: YARN-2755.v1.patch, YARN-2755.v2.patch, YARN-2755.v3.patch When the NM restarts frequently for some reason, a large number of directories like these are left in /data/disk$num/yarn/local/: /data/disk1/yarn/local/usercache_DEL_1414372756105 /data/disk1/yarn/local/usercache_DEL_1413557901696 /data/disk1/yarn/local/usercache_DEL_1413657004894 /data/disk1/yarn/local/usercache_DEL_1413675321860 /data/disk1/yarn/local/usercache_DEL_1414093167936 /data/disk1/yarn/local/usercache_DEL_1413565841271 These directories are empty, but take up 100M+ due to their sheer number. There were 38714 per data disk on the machine I looked at. It appears to be a regression introduced by YARN-661 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
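Jason's point is the standard guard for APIs that return null instead of an empty iterator. A minimal self-contained sketch under assumed names (listIfDirectory stands in for the real directory-listing call, which returns null for a non-directory path):

```java
import java.util.Iterator;
import java.util.List;

public class NullSafeIteration {
    // Stand-in for a listing call that returns null when the path is
    // not a directory (hypothetical; mirrors the behavior described).
    static Iterator<String> listIfDirectory(List<String> maybeDir) {
        return (maybeDir == null) ? null : maybeDir.iterator();
    }

    // Short-circuit && evaluates the null check first, so hasNext() is
    // never invoked on a null iterator and no NPE can occur.
    static boolean hasEntries(List<String> maybeDir) {
        Iterator<String> it = listIfDirectory(maybeDir);
        return it != null && it.hasNext();
    }
}
```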
[jira] [Updated] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
[ https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-2755: -- Attachment: YARN-2755.v4.patch NM fails to clean up usercache_DEL_timestamp dirs after YARN-661 -- Key: YARN-2755 URL: https://issues.apache.org/jira/browse/YARN-2755 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: YARN-2755.v1.patch, YARN-2755.v2.patch, YARN-2755.v3.patch, YARN-2755.v4.patch When the NM restarts frequently for some reason, a large number of directories like these are left in /data/disk$num/yarn/local/: /data/disk1/yarn/local/usercache_DEL_1414372756105 /data/disk1/yarn/local/usercache_DEL_1413557901696 /data/disk1/yarn/local/usercache_DEL_1413657004894 /data/disk1/yarn/local/usercache_DEL_1413675321860 /data/disk1/yarn/local/usercache_DEL_1414093167936 /data/disk1/yarn/local/usercache_DEL_1413565841271 These directories are empty, but take up 100M+ due to their sheer number. There were 38714 per data disk on the machine I looked at. It appears to be a regression introduced by YARN-661 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
[ https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189026#comment-14189026 ] Siqi Li commented on YARN-2755: --- Thanks for your feedback, [~jlowe]. I have updated the patch with the proper fix. NM fails to clean up usercache_DEL_timestamp dirs after YARN-661 -- Key: YARN-2755 URL: https://issues.apache.org/jira/browse/YARN-2755 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: YARN-2755.v1.patch, YARN-2755.v2.patch, YARN-2755.v3.patch, YARN-2755.v4.patch When the NM restarts frequently for some reason, a large number of directories like these are left in /data/disk$num/yarn/local/: /data/disk1/yarn/local/usercache_DEL_1414372756105 /data/disk1/yarn/local/usercache_DEL_1413557901696 /data/disk1/yarn/local/usercache_DEL_1413657004894 /data/disk1/yarn/local/usercache_DEL_1413675321860 /data/disk1/yarn/local/usercache_DEL_1414093167936 /data/disk1/yarn/local/usercache_DEL_1413565841271 These directories are empty, but take up 100M+ due to their sheer number. There were 38714 per data disk on the machine I looked at. It appears to be a regression introduced by YARN-661 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2773) ReservationSystem's use of Queue names vs paths is inconsistent for CapacityReservationSystem and FairReservationSystem
Anubhav Dhoot created YARN-2773: --- Summary: ReservationSystem's use of Queue names vs paths is inconsistent for CapacityReservationSystem and FairReservationSystem Key: YARN-2773 URL: https://issues.apache.org/jira/browse/YARN-2773 Project: Hadoop YARN Issue Type: Bug Components: scheduler Reporter: Anubhav Dhoot Priority: Minor The reservation system requires the ReservationDefinition to use a queue name to choose which reservation queue is being used. CapacityScheduler does not allow duplicate leaf queue names, so we can refer to a unique leaf queue by simply using its name rather than its full path (which includes parentName + "."). FairScheduler allows duplicate leaf queue names, so one needs the full queue name to identify a queue uniquely. This is inconsistent in the implementation of AbstractReservationSystem, where one implementation of getQueuePath does the conversion (CapacityReservationSystem) while FairReservationSystem returns the same value back. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
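The inconsistency can be seen in how each implementation would resolve a reservation queue. A hypothetical sketch (the real getQueuePath lives in the scheduler-specific ReservationSystem subclasses; the parent-lookup map here is purely illustrative):

```java
import java.util.Map;

public class QueuePathResolution {
    // Capacity-style: leaf names are unique cluster-wide, so a short
    // name can be expanded to its full dotted path via a parent lookup.
    static String capacityQueuePath(String leafName,
                                    Map<String, String> leafToParent) {
        String parent = leafToParent.get(leafName);
        return (parent == null) ? leafName : parent + "." + leafName;
    }

    // Fair-style: leaf names may repeat under different parents, so
    // callers must already pass the full path; it is returned unchanged.
    static String fairQueuePath(String fullQueueName) {
        return fullQueueName;
    }
}
```

The JIRA's point is that callers of getQueuePath get a converted value in one case and an identity mapping in the other, so the two ReservationSystem implementations interpret the same ReservationDefinition queue field differently.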
[jira] [Commented] (YARN-2766) [JDK 8] TestApplicationHistoryClientService fails
[ https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189088#comment-14189088 ] Hadoop QA commented on YARN-2766: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12678001/YARN-2766.patch against trunk revision d33e07d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 6 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5629//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5629//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5629//console This message is automatically generated. 
[JDK 8] TestApplicationHistoryClientService fails - Key: YARN-2766 URL: https://issues.apache.org/jira/browse/YARN-2766 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.6.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: YARN-2766.patch, YARN-2766.patch {{TestApplicationHistoryClientService.testContainers}} and {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail because the test assertions are assuming a returned Collection is in a certain order. The collection comes from a HashMap, so the order is not guaranteed, plus, according to [this page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html], there are situations where the iteration order of a HashMap will be different between Java 7 and 8. We should fix the test code to not assume a specific ordering. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
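The fix described above — never asserting on HashMap iteration order — can be sketched like this. The names below are illustrative, not the actual YARN test code: the returned collection is copied and sorted by a stable key before any order-sensitive assertion.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SortedAssertionDemo {
    public static void main(String[] args) {
        Map<String, Integer> reports = new HashMap<>();
        reports.put("container_2", 2);
        reports.put("container_1", 1);
        // Iteration order of values() is unspecified and may differ
        // between JDK 7 and JDK 8 (different HashMap internals).
        List<Integer> ids = new ArrayList<>(reports.values());
        ids.sort(Comparator.naturalOrder()); // make the order deterministic
        // Now order-sensitive assertions are safe:
        System.out.println(ids); // prints [1, 2] on any JDK
    }
}
```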
[jira] [Commented] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
[ https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189101#comment-14189101 ] Hadoop QA commented on YARN-2755: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12678008/YARN-2755.v4.patch against trunk revision d33e07d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5630//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5630//console This message is automatically generated. 
NM fails to clean up usercache_DEL_timestamp dirs after YARN-661 -- Key: YARN-2755 URL: https://issues.apache.org/jira/browse/YARN-2755 Project: Hadoop YARN Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: YARN-2755.v1.patch, YARN-2755.v2.patch, YARN-2755.v3.patch, YARN-2755.v4.patch When the NM restarts frequently for some reason, a large number of directories like these are left in /data/disk$num/yarn/local/: /data/disk1/yarn/local/usercache_DEL_1414372756105 /data/disk1/yarn/local/usercache_DEL_1413557901696 /data/disk1/yarn/local/usercache_DEL_1413657004894 /data/disk1/yarn/local/usercache_DEL_1413675321860 /data/disk1/yarn/local/usercache_DEL_1414093167936 /data/disk1/yarn/local/usercache_DEL_1413565841271 These directories are empty, but take up 100M+ due to their sheer number. There were 38714 per data disk on the machine I looked at. It appears to be a regression introduced by YARN-661 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2556) Tool to measure the performance of the timeline server
[ https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chang li updated YARN-2556: --- Attachment: yarn2556.patch I have cleaned up my patch; reviews are welcome. I used this application to test the timeline server throughput in local mode by launching 4 mappers, each of which puts an entity larger than 100 KB and iterates 1000 times. Here is my measurement: on my local machine, the timeline server provides about a 10 MB/s I/O rate for writes. There is some deviation from the write throughput for leveldb. People are welcome to try this tool and comment on it. Tool to measure the performance of the timeline server -- Key: YARN-2556 URL: https://issues.apache.org/jira/browse/YARN-2556 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: chang li Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, yarn2556.patch, yarn2556_wip.patch We need to be able to understand the capacity model for the timeline server to give users the tools they need to deploy a timeline server with the correct capacity. I propose we create a mapreduce job that can measure timeline server write and read performance. Transactions per second, I/O for both read and write would be a good start. This could be done as an example or test job that could be tied into gridmix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
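A quick back-of-the-envelope check of the reported measurement, assuming the stated setup (4 mappers × 1000 iterations × ~100 KB per entity) and the reported ~10 MB/s write rate; the figures here are only derived from the comment, not independently measured.

```java
public class ThroughputCheck {
    public static void main(String[] args) {
        double entityKB = 100;      // ~100 KB per put, per the comment
        long puts = 4L * 1000;      // 4 mappers x 1000 iterations
        double totalMB = puts * entityKB / 1024;  // ~390 MB written in total
        double rateMBs = 10;        // reported write rate (assumption)
        System.out.printf("total=%.0f MB, ~%.0f s at %.0f MB/s%n",
                totalMB, totalMB / rateMBs, rateMBs);
    }
}
```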
[jira] [Commented] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor
[ https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189117#comment-14189117 ] Allen Wittenauer commented on YARN-2701: OK, this compiled without incident, so I'm +1 now. Thanks! Potential race condition in startLocalizer when using LinuxContainerExecutor -- Key: YARN-2701 URL: https://issues.apache.org/jira/browse/YARN-2701 Project: Hadoop YARN Issue Type: Bug Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Fix For: 2.6.0 Attachments: YARN-2701.1.patch, YARN-2701.2.patch, YARN-2701.3.patch, YARN-2701.4.patch, YARN-2701.5.patch, YARN-2701.6.patch, YARN-2701.addendum.1.patch, YARN-2701.addendum.2.patch, YARN-2701.addendum.3.patch, YARN-2701.addendum.4.patch When LinuxContainerExecutor performs startLocalizer, it uses the native code in container-executor.c. {code} if (stat(npath, sb) != 0) { if (mkdir(npath, perm) != 0) { {code} We use a check-then-create approach to create the appDir under /usercache, but if two containers try to do this at the same time, a race condition may occur. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
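The check-then-create race above can be avoided by attempting the create directly and treating "already exists" as success. A Java analogue of that pattern is sketched below (the actual fix lives in the native container-executor.c, where the equivalent is checking `errno == EEXIST` after `mkdir`); the class and helper names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class AtomicMkdirDemo {
    // Returns true if the directory exists afterwards, whether or not this
    // caller created it — safe when two localizers race on the same appDir.
    static boolean ensureDir(Path dir) throws IOException {
        try {
            Files.createDirectory(dir); // atomic at the filesystem level
            return true;
        } catch (FileAlreadyExistsException e) {
            // The other racer won; that is fine as long as it is a directory.
            return Files.isDirectory(dir);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("usercache").resolve("appdir");
        // Calling twice mimics two containers racing; both succeed.
        System.out.println(ensureDir(dir) && ensureDir(dir)); // prints true
    }
}
```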
[jira] [Commented] (YARN-2556) Tool to measure the performance of the timeline server
[ https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189143#comment-14189143 ] Hadoop QA commented on YARN-2556: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12678020/yarn2556.patch against trunk revision d33e07d. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5631//console This message is automatically generated. Tool to measure the performance of the timeline server -- Key: YARN-2556 URL: https://issues.apache.org/jira/browse/YARN-2556 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: chang li Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, yarn2556.patch, yarn2556_wip.patch We need to be able to understand the capacity model for the timeline server to give users the tools they need to deploy a timeline server with the correct capacity. I propose we create a mapreduce job that can measure timeline server write and read performance. Transactions per second, I/O for both read and write would be a good start. This could be done as an example or test job that could be tied into gridmix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2766) [JDK 8] TestApplicationHistoryClientService fails
[ https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter updated YARN-2766: Attachment: YARN-2766.patch New patch fixes findbugs warnings [JDK 8] TestApplicationHistoryClientService fails - Key: YARN-2766 URL: https://issues.apache.org/jira/browse/YARN-2766 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.6.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: YARN-2766.patch, YARN-2766.patch, YARN-2766.patch {{TestApplicationHistoryClientService.testContainers}} and {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail because the test assertions are assuming a returned Collection is in a certain order. The collection comes from a HashMap, so the order is not guaranteed, plus, according to [this page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html], there are situations where the iteration order of a HashMap will be different between Java 7 and 8. We should fix the test code to not assume a specific ordering. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2770) Timeline delegation tokens need to be automatically renewed by the RM
[ https://issues.apache.org/jira/browse/YARN-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2770: -- Attachment: YARN-2770.1.patch Created a patch: * Add two timeline client APIs - renew/cancel delegation token * Make TimelineDelegationTokenIdentifier.Renewer extend TokenRenewer and implement renew and cancel logic by using the timeline client APIs * Change YarnClientImpl to set the renewer of the timeline DT to the user of the RM daemon. * Add test cases to validate the renew/cancel APIs * Have done an end-to-end test to verify that the automatic DT renewal works in a secure cluster. Timeline delegation tokens need to be automatically renewed by the RM - Key: YARN-2770 URL: https://issues.apache.org/jira/browse/YARN-2770 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.5.0 Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Critical Attachments: YARN-2770.1.patch YarnClient will automatically grab a timeline DT for the application and pass it to the app AM. The timeline DT renewal is currently still a dummy. If an app runs for more than 24h (the default DT expiry time), the app AM is no longer able to use the expired DT to communicate with the timeline server. Since the RM caches the credentials of each app and renews the DTs for running apps, we should provide renew hooks for the RM similar to what the HDFS DT has, and set the RM user as the renewer when grabbing the timeline DT. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2766) [JDK 8] TestApplicationHistoryClientService fails
[ https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189194#comment-14189194 ] Zhijie Shen commented on YARN-2766: --- I think we need to change ApplicationContext - ApplicationHistoryManager - ApplicationHistoryManagerOnTimelineStore. Modifying the protobuf message will not help the web services. [JDK 8] TestApplicationHistoryClientService fails - Key: YARN-2766 URL: https://issues.apache.org/jira/browse/YARN-2766 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.6.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: YARN-2766.patch, YARN-2766.patch, YARN-2766.patch {{TestApplicationHistoryClientService.testContainers}} and {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail because the test assertions are assuming a returned Collection is in a certain order. The collection comes from a HashMap, so the order is not guaranteed, plus, according to [this page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html], there are situations where the iteration order of a HashMap will be different between Java 7 and 8. We should fix the test code to not assume a specific ordering. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2766) [JDK 8] TestApplicationHistoryClientService fails
[ https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2766: -- Issue Type: Bug (was: Sub-task) Parent: (was: YARN-1530) [JDK 8] TestApplicationHistoryClientService fails - Key: YARN-2766 URL: https://issues.apache.org/jira/browse/YARN-2766 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: YARN-2766.patch, YARN-2766.patch, YARN-2766.patch {{TestApplicationHistoryClientService.testContainers}} and {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail because the test assertions are assuming a returned Collection is in a certain order. The collection comes from a HashMap, so the order is not guaranteed, plus, according to [this page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html], there are situations where the iteration order of a HashMap will be different between Java 7 and 8. We should fix the test code to not assume a specific ordering. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2579) Both RM's state is Active , but 1 RM is not really active.
[ https://issues.apache.org/jira/browse/YARN-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189195#comment-14189195 ] Karthik Kambatla commented on YARN-2579: Thanks, [~rohithsharma]. Looking at the tests and your explanation, I think I see what you are saying. However, looking into the code, I am not convinced it is the draining of events that is causing this issue. {{rmDispatcher}} is an {{AsyncDispatcher}}, with {{drainEventsOnStop}} always false. So, {{rmDispatcher.stop()}} shouldn't lead to any draining of events. I noticed a couple of other issues in the AsyncDispatcher code: # {{eventHandlerThread.join}} in serviceStop should take a timeout as well # {{dispatch(event)}} in AsyncDispatcher#createThread doesn't have a try-catch block With the current patch, I wonder if there are any unexpected side-effects. Both RM's state is Active , but 1 RM is not really active. -- Key: YARN-2579 URL: https://issues.apache.org/jira/browse/YARN-2579 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.1 Reporter: Rohith Assignee: Rohith Attachments: YARN-2579.patch, YARN-2579.patch I encountered a situation where both RMs' web pages were accessible and their state displayed as Active, but one RM's ActiveServices were stopped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
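The two AsyncDispatcher hardening points raised in the comment — a bounded join in serviceStop and a try-catch around dispatch so one bad event cannot kill the handler thread — can be sketched as below. This is a heavily simplified stand-in, not the real AsyncDispatcher class.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class DispatcherSketch {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private volatile boolean stopped = false;

    private final Thread handler = new Thread(() -> {
        while (!stopped) {
            try {
                Runnable event = queue.poll(100, TimeUnit.MILLISECONDS);
                if (event == null) continue;
                try {
                    event.run(); // the dispatch(event) call
                } catch (Throwable t) {
                    // One bad event must not kill the handler thread.
                    System.err.println("Error in dispatcher: " + t);
                }
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    });

    public void start() { handler.start(); }

    public void dispatch(Runnable event) { queue.add(event); }

    public void stop() throws InterruptedException {
        stopped = true;
        handler.interrupt();
        handler.join(5000); // bounded join: never hang serviceStop forever
    }

    public static void main(String[] args) throws InterruptedException {
        DispatcherSketch d = new DispatcherSketch();
        d.start();
        d.dispatch(() -> { throw new RuntimeException("bad event"); });
        d.dispatch(() -> System.out.println("still dispatching"));
        Thread.sleep(300); // let both events drain
        d.stop();
    }
}
```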
[jira] [Updated] (YARN-2579) Both RM's state is Active , but 1 RM is not really active.
[ https://issues.apache.org/jira/browse/YARN-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karthik Kambatla updated YARN-2579: --- Priority: Blocker (was: Major) Target Version/s: 2.6.0 Both RM's state is Active , but 1 RM is not really active. -- Key: YARN-2579 URL: https://issues.apache.org/jira/browse/YARN-2579 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.1 Reporter: Rohith Assignee: Rohith Priority: Blocker Attachments: YARN-2579.patch, YARN-2579.patch I encountered a situation where both RMs' web pages were accessible and their state displayed as Active, but one RM's ActiveServices were stopped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2766) [JDK 8] TestApplicationHistoryClientService fails
[ https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2766: -- Issue Type: Sub-task (was: Bug) Parent: YARN-321 [JDK 8] TestApplicationHistoryClientService fails - Key: YARN-2766 URL: https://issues.apache.org/jira/browse/YARN-2766 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.6.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: YARN-2766.patch, YARN-2766.patch, YARN-2766.patch {{TestApplicationHistoryClientService.testContainers}} and {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail because the test assertions are assuming a returned Collection is in a certain order. The collection comes from a HashMap, so the order is not guaranteed, plus, according to [this page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html], there are situations where the iteration order of a HashMap will be different between Java 7 and 8. We should fix the test code to not assume a specific ordering. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2766) ApplicationHistoryManager is expected to return a sorted list of apps/attempts/containers
[ https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2766: -- Summary: ApplicationHistoryManager is expected to return a sorted list of apps/attempts/containers (was: [JDK 8] TestApplicationHistoryClientService fails) ApplicationHistoryManager is expected to return a sorted list of apps/attempts/containers -- Key: YARN-2766 URL: https://issues.apache.org/jira/browse/YARN-2766 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.6.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: YARN-2766.patch, YARN-2766.patch, YARN-2766.patch {{TestApplicationHistoryClientService.testContainers}} and {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail because the test assertions are assuming a returned Collection is in a certain order. The collection comes from a HashMap, so the order is not guaranteed, plus, according to [this page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html], there are situations where the iteration order of a HashMap will be different between Java 7 and 8. We should fix the test code to not assume a specific ordering. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2766) ApplicationHistoryManager is expected to return a sorted list of apps/attempts/containers
[ https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189247#comment-14189247 ] Hadoop QA commented on YARN-2766: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12678034/YARN-2766.patch against trunk revision 3ae84e1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5632//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5632//console This message is automatically generated. 
ApplicationHistoryManager is expected to return a sorted list of apps/attempts/containers -- Key: YARN-2766 URL: https://issues.apache.org/jira/browse/YARN-2766 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.6.0 Reporter: Robert Kanter Assignee: Robert Kanter Attachments: YARN-2766.patch, YARN-2766.patch, YARN-2766.patch {{TestApplicationHistoryClientService.testContainers}} and {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail because the test assertions are assuming a returned Collection is in a certain order. The collection comes from a HashMap, so the order is not guaranteed, plus, according to [this page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html], there are situations where the iteration order of a HashMap will be different between Java 7 and 8. We should fix the test code to not assume a specific ordering. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2771) DistributedShell's DSConstants are badly named
[ https://issues.apache.org/jira/browse/YARN-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2771: -- Attachment: YARN-2771.1.patch While I was aware of the bad naming, I decided to follow the pattern of the existing constants in DSConstants to be consistent. Anyway, I've uploaded a patch to fix all these constants. DS is not a serious computation framework, and the env var name change is transparent to the CLI user, hence it should not break anything. DistributedShell's DSConstants are badly named -- Key: YARN-2771 URL: https://issues.apache.org/jira/browse/YARN-2771 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Attachments: YARN-2771.1.patch I'd rather have underscores (DISTRIBUTED_SHELL_TIMELINE_DOMAIN instead of DISTRIBUTEDSHELLTIMELINEDOMAIN). DISTRIBUTEDSHELLTIMELINEDOMAIN is added in this release, can we rename it to be DISTRIBUTED_SHELL_TIMELINE_DOMAIN? For the old envs, we can just add new envs that point to the old ones and deprecate the old ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2186) Node Manager uploader service for cache manager
[ https://issues.apache.org/jira/browse/YARN-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189307#comment-14189307 ] Karthik Kambatla commented on YARN-2186: Thanks Sangjin. Looks mostly good, but for some minor comments: # How about renaming NMUploaderSerivceSCMProtocol to SharedCacheUploader (after ResourceTracker) or SharedCacheUploaderProtocol? Accordingly, rename all other related classes and proto files? # Instead of {{yarn.sharedcache.nodemanager.}}, we should probably call it {{yarn.sharedcache.uploader}} to avoid confusion? # As per our offline discussions, it would be nice to add a way for the NM to ask the SCM whether it should upload a resource to the shared-cache or not. For now, this could be always yes. In the future, we can add a pluggable policy that the SCM would consult to answer the NM. # NMCacheUploaderSCMProtocolPBClientImpl#close should set {{this.proxy}} to null after calling stopProxy. # NMCacheUploaderSCMProtocolService: ## TODOs should have an associated follow-up JIRA and reference in the code so we don't forget ## serviceStop should set {{this.server}} to null after calling {{this.server.stop()}} Node Manager uploader service for cache manager --- Key: YARN-2186 URL: https://issues.apache.org/jira/browse/YARN-2186 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2186-trunk-v1.patch, YARN-2186-trunk-v2.patch, YARN-2186-trunk-v3.patch, YARN-2186-trunk-v4.patch Implement the node manager uploader service for the cache manager. This service is responsible for communicating with the node manager when it uploads resources to the shared cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2588) Standby RM does not transitionToActive if previous transitionToActive is failed with ZK exception.
[ https://issues.apache.org/jira/browse/YARN-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189344#comment-14189344 ] Karthik Kambatla commented on YARN-2588: Thanks Jian for pointing me to this. The patch fixes an important issue, but I would like for us to call transitionToStandby in the catch-block instead of explicitly inlining the contents of transitionToStandby. I'll fix this up in YARN-2010. Standby RM does not transitionToActive if previous transitionToActive is failed with ZK exception. -- Key: YARN-2588 URL: https://issues.apache.org/jira/browse/YARN-2588 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0, 2.6.0, 2.5.1 Reporter: Rohith Assignee: Rohith Fix For: 2.6.0 Attachments: YARN-2588.1.patch, YARN-2588.2.patch, YARN-2588.patch Consider a scenario where a standby RM fails to transition to Active because of a ZK exception (ConnectionLoss or SessionExpired). Then any further transition to Active for the same RM does not move the RM to the Active state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2772) DistributedShell's timeline related options are not clear
[ https://issues.apache.org/jira/browse/YARN-2772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189384#comment-14189384 ] Zhijie Shen commented on YARN-2772: --- [~vinodkv], thanks for your proposal. 1. I prefer create_timeline_domain over should_create_timeline_domain, as it is an option without an arg, so there will be no true/false for it. 2. I'd like to enforce the validation logic (see the existing code comment). However, as we're lacking timeline client query APIs, it will involve more steps to send HTTP requests and parse the JSON response. I prefer to do it after YARN-2423. {code} try { //TODO: we need to check and combine the existing timeline domain ACLs, //but let's do it once we have client java library to query domains. TimelineDomain domain = new TimelineDomain(); {code} Otherwise, I've addressed the other comments and made a patch. DistributedShell's timeline related options are not clear - Key: YARN-2772 URL: https://issues.apache.org/jira/browse/YARN-2772 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen The new options domain and create options - they are not descriptive at all. It is also not clear when view_acls and modify_acls need to be set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2772) DistributedShell's timeline related options are not clear
[ https://issues.apache.org/jira/browse/YARN-2772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2772: -- Attachment: YARN-2772.1.patch DistributedShell's timeline related options are not clear - Key: YARN-2772 URL: https://issues.apache.org/jira/browse/YARN-2772 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Attachments: YARN-2772.1.patch The new options domain and create options - they are not descriptive at all. It is also not clear when view_acls and modify_acls need to be set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2774) shared cache uploader service should authorize notify calls properly
Sangjin Lee created YARN-2774: - Summary: shared cache uploader service should authorize notify calls properly Key: YARN-2774 URL: https://issues.apache.org/jira/browse/YARN-2774 Project: Hadoop YARN Issue Type: Task Reporter: Sangjin Lee The shared cache manager (SCM) uploader service (done in YARN-2186) currently does not authorize calls notifying the SCM of a newly uploaded resource. Proper security/authorization needs to be done for this RPC call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2774) shared cache uploader service should authorize notify calls properly
[ https://issues.apache.org/jira/browse/YARN-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sangjin Lee updated YARN-2774: -- Issue Type: Sub-task (was: Task) Parent: YARN-1492 shared cache uploader service should authorize notify calls properly Key: YARN-2774 URL: https://issues.apache.org/jira/browse/YARN-2774 Project: Hadoop YARN Issue Type: Sub-task Reporter: Sangjin Lee The shared cache manager (SCM) uploader service (done in YARN-2186) currently does not authorize calls notifying the SCM of a newly uploaded resource. Proper security/authorization needs to be done for this RPC call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2604) Scheduler should consider max-allocation-* in conjunction with the largest node
[ https://issues.apache.org/jira/browse/YARN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Kanter reassigned YARN-2604: --- Assignee: Robert Kanter (was: Karthik Kambatla) Scheduler should consider max-allocation-* in conjunction with the largest node --- Key: YARN-2604 URL: https://issues.apache.org/jira/browse/YARN-2604 Project: Hadoop YARN Issue Type: Improvement Components: scheduler Affects Versions: 2.5.1 Reporter: Karthik Kambatla Assignee: Robert Kanter If the scheduler max-allocation-* values are larger than the resources available on the largest node in the cluster, an application requesting resources between the two values will be accepted by the scheduler but the requests will never be satisfied. The app essentially hangs forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI
[ https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189488#comment-14189488 ] Hadoop QA commented on YARN-2698: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12677981/YARN-2698-20141029-2.patch against trunk revision 6f5f604. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.mapreduce.lib.output.TestJobOutputCommitter org.apache.hadoop.mapreduce.v2.TestMRAMWithNonNormalizedCapabilities org.apache.hadoop.mapreduce.TestMapReduceLazyOutput org.apache.hadoop.mapreduce.v2.TestNonExistentJob org.apache.hadoop.mapreduce.v2.TestMiniMRProxyUser org.apache.hadoop.mapreduce.v2.TestMRAppWithCombiner org.apache.hadoop.mapreduce.v2.TestUberAM org.apache.hadoop.mapreduce.v2.TestMRJobsWithProfiler org.apache.hadoop.mapreduce.v2.TestMRJobs org.apache.hadoop.mapreduce.v2.TestRMNMInfo org.apache.hadoop.mapreduce.v2.TestSpeculativeExecution org.apache.hadoop.mapreduce.v2.TestMROldApiJobs org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService org.apache.hadoop.mapreduce.TestLargeSort org.apache.hadoop.mapred.TestClusterMRNotification The test build failed in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5634//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-YARN-Build/5634//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5634//console This message is automatically generated. 
Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI --- Key: YARN-2698 URL: https://issues.apache.org/jira/browse/YARN-2698 Project: Hadoop YARN Issue Type: Sub-task Components: api, client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Priority: Critical Attachments: YARN-2698-20141028-1.patch, YARN-2698-20141028-2.patch, YARN-2698-20141028-3.patch, YARN-2698-20141029-1.patch, YARN-2698-20141029-2.patch YARN RMAdminCLI and AdminService should expose write APIs only; the read APIs should be located in the YARN CLI and RMClientService. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2579) Both RM's state is Active , but 1 RM is not really active.
[ https://issues.apache.org/jira/browse/YARN-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189583#comment-14189583 ] Rohith commented on YARN-2579: -- Thanks Karthik!! bq. (Service)Dispatcher.stop() wait for draining out RMFatalEventDispatcher event I meant that the drained event, i.e. RMFatalEvent, is waited on to finish at {{rmDispatcher.stop()}} in {{eventHandlerThread.join}}. bq. {{dispatch(event)}} in AsyncDispatcher#createThread doesn't have a try-catch block The {{dispatch(event)}} method catches Throwable and exits the JVM. But I see that if handlers are not registered, then we must have a try-catch block. Did you mean that scenario? bq. {{eventHandlerThread.join}} in serviceStop should take a timeout as well +1 for this approach too; it also fixes the hang problem. The attached patch likewise does not leave the RM hanging in a kind of deadlock. bq. With the current patch, I wonder if there are any unexpected side-effects I have verified many switching scenarios, as mentioned in my previous comment, and more deployed in a real cluster. It works fine with work-preserving restart too. Both RM's state is Active , but 1 RM is not really active. -- Key: YARN-2579 URL: https://issues.apache.org/jira/browse/YARN-2579 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.5.1 Reporter: Rohith Assignee: Rohith Priority: Blocker Attachments: YARN-2579.patch, YARN-2579.patch I encountered a situation where both RMs' web pages were accessible and their state displayed as Active, but one RM's ActiveServices were stopped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
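The "eventHandlerThread.join in serviceStop should take a timeout" suggestion above can be sketched in isolation. This is a minimal, hypothetical illustration of the idea (a bounded join instead of an unbounded one), not the actual AsyncDispatcher patch; the class and method names are invented for the sketch:

```java
// Sketch: bound the wait on the event-handling thread during shutdown with a
// timeout, so a stuck handler cannot hang the stop path forever. Illustrative
// only; the real fix lives in AsyncDispatcher#serviceStop.
public class TimedStopSketch {

  /** Interrupts the thread, then joins for at most timeoutMs; returns true if it exited. */
  static boolean stopWithTimeout(Thread eventHandlerThread, long timeoutMs)
      throws InterruptedException {
    eventHandlerThread.interrupt();       // ask the handler loop to stop
    eventHandlerThread.join(timeoutMs);   // bounded wait instead of join()
    return !eventHandlerThread.isAlive(); // false would mean the thread is stuck
  }

  public static void main(String[] args) throws Exception {
    // A handler thread that exits promptly once interrupted.
    Thread handler = new Thread(() -> {
      try {
        Thread.sleep(60_000);
      } catch (InterruptedException ignored) {
        // interrupt is treated as the stop signal
      }
    });
    handler.start();
    System.out.println("stopped=" + stopWithTimeout(handler, 5000));
  }
}
```

With an unbounded {{join()}}, the same stuck-handler situation would block serviceStop indefinitely, which is exactly the hang discussed in the comment.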
[jira] [Commented] (YARN-2588) Standby RM does not transitionToActive if previous transitionToActive is failed with ZK exception.
[ https://issues.apache.org/jira/browse/YARN-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189600#comment-14189600 ] Rohith commented on YARN-2588: -- bq. but I would like for us to call transitionToStandby in the catch-block instead of explicitly calling the contents of transitionToStandby As I understand the comment, is the expected change like the one below? Correct me if I am wrong. If yes, transitionToStandby returns in its initial state check itself, and we end up without creating active services and without resetting the dispatcher!
{code}
try {
  startActiveServices();
  return null;
} catch (Exception e) {
  transitionToStandby(true);
  throw e;
}
{code}
Standby RM does not transitionToActive if previous transitionToActive is failed with ZK exception. -- Key: YARN-2588 URL: https://issues.apache.org/jira/browse/YARN-2588 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 3.0.0, 2.6.0, 2.5.1 Reporter: Rohith Assignee: Rohith Fix For: 2.6.0 Attachments: YARN-2588.1.patch, YARN-2588.2.patch, YARN-2588.patch Consider a scenario where the standby RM fails to transition to Active because of a ZK exception (ConnectionLoss or SessionExpired). Then any further transition to Active for the same RM does not move the RM to the Active state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2772) DistributedShell's timeline related options are not clear
[ https://issues.apache.org/jira/browse/YARN-2772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189652#comment-14189652 ] Hadoop QA commented on YARN-2772: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12678074/YARN-2772.1.patch against trunk revision 0126cf1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5636//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5636//console This message is automatically generated. DistributedShell's timeline related options are not clear - Key: YARN-2772 URL: https://issues.apache.org/jira/browse/YARN-2772 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Attachments: YARN-2772.1.patch The new options domain and create options - they are not descriptive at all. It is also not clear when view_acls and modify_acls need to be set. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2753) Fix potential issues and code clean up for *NodeLabelsManager
[ https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2753: Attachment: (was: YARN-2753.005.patch) Fix potential issues and code clean up for *NodeLabelsManager - Key: YARN-2753 URL: https://issues.apache.org/jira/browse/YARN-2753 Project: Hadoop YARN Issue Type: Sub-task Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2753.000.patch, YARN-2753.001.patch, YARN-2753.002.patch, YARN-2753.003.patch, YARN-2753.004.patch, YARN-2753.005.patch Issues include: * CommonNodeLabelsManager#addToCluserNodeLabels should not change the value in labelCollections if the key already exists; otherwise Label.resource will be changed (reset). * Potential NPE (NullPointerException) in checkRemoveLabelsFromNode of CommonNodeLabelsManager: when a Node is created, Node.labels can be null, so nm.labels may be null; we need to check that originalLabels is not null before using it (originalLabels.containsAll). * addToCluserNodeLabels should be protected by writeLock in RMNodeLabelsManager.java, because we should protect labelCollections in RMNodeLabelsManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2753) Fix potential issues and code clean up for *NodeLabelsManager
[ https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated YARN-2753: Attachment: YARN-2753.005.patch Fix potential issues and code clean up for *NodeLabelsManager - Key: YARN-2753 URL: https://issues.apache.org/jira/browse/YARN-2753 Project: Hadoop YARN Issue Type: Sub-task Reporter: zhihai xu Assignee: zhihai xu Attachments: YARN-2753.000.patch, YARN-2753.001.patch, YARN-2753.002.patch, YARN-2753.003.patch, YARN-2753.004.patch, YARN-2753.005.patch Issues include: * CommonNodeLabelsManager#addToCluserNodeLabels should not change the value in labelCollections if the key already exists; otherwise Label.resource will be changed (reset). * Potential NPE (NullPointerException) in checkRemoveLabelsFromNode of CommonNodeLabelsManager: when a Node is created, Node.labels can be null, so nm.labels may be null; we need to check that originalLabels is not null before using it (originalLabels.containsAll). * addToCluserNodeLabels should be protected by writeLock in RMNodeLabelsManager.java, because we should protect labelCollections in RMNodeLabelsManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
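The first YARN-2753 item (don't clobber an existing labelCollections entry) can be illustrated with a self-contained sketch. The {{Label}} class and {{addLabel}} method here are stand-ins invented for the example, not the real *NodeLabelsManager code; the point is the {{putIfAbsent}}-style guard:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch: guard insertion with putIfAbsent so re-adding an existing key
// cannot replace the stored value and reset its accumulated resource.
// "Label" is a stand-in for the real node-label bookkeeping class.
public class LabelMapSketch {

  static final class Label {
    int resource; // accumulated resource; must survive a repeated add
  }

  static final ConcurrentMap<String, Label> labelCollections =
      new ConcurrentHashMap<>();

  /** Adds a label only if absent; an existing entry (and its resource) is kept. */
  static Label addLabel(String name) {
    Label fresh = new Label();
    Label prior = labelCollections.putIfAbsent(name, fresh);
    return prior != null ? prior : fresh; // never replaces an existing value
  }

  public static void main(String[] args) {
    addLabel("gpu").resource = 5;   // simulate accumulated resource
    // a second add must return the same object, resource intact
    System.out.println("resource=" + addLabel("gpu").resource);
  }
}
```

An unconditional {{put}} in the same spot would install the fresh zero-resource object on the second call, which is exactly the resource-reset bug described above.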
[jira] [Commented] (YARN-2770) Timeline delegation tokens need to be automatically renewed by the RM
[ https://issues.apache.org/jira/browse/YARN-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189657#comment-14189657 ] Hadoop QA commented on YARN-2770: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12678042/YARN-2770.1.patch against trunk revision 0126cf1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5635//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5635//console This message is automatically generated. 
Timeline delegation tokens need to be automatically renewed by the RM - Key: YARN-2770 URL: https://issues.apache.org/jira/browse/YARN-2770 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.5.0 Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Critical Attachments: YARN-2770.1.patch YarnClient will automatically grab a timeline DT for the application and pass it to the app AM. Right now, timeline DT renewal is still a dummy. If an app runs for more than 24h (the default DT expiry time), the app AM is no longer able to use the expired DT to communicate with the timeline server. Since the RM caches the credentials of each app and renews the DTs for running apps, we should provide renew hooks similar to what the HDFS DT has for the RM, and set the RM user as the renewer when grabbing the timeline DT. -- This message was sent by Atlassian JIRA (v6.3.4#6332)