[jira] [Created] (YARN-2767) RM web services - add test case to ensure the http static user can kill or submit apps in secure mode

2014-10-29 Thread Varun Vasudev (JIRA)
Varun Vasudev created YARN-2767:
---

 Summary: RM web services - add test case to ensure the http static 
user can kill or submit apps in secure mode
 Key: YARN-2767
 URL: https://issues.apache.org/jira/browse/YARN-2767
 Project: Hadoop YARN
  Issue Type: Test
  Components: resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev


We should add a test to ensure that the http static user used to access the RM 
web interface can't submit or kill apps if the cluster is running in secure 
mode.
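
For context, a minimal sketch of the kind of check such a test might perform (the RM address, application id, and expected status code are assumptions here; the attached patches may implement this differently inside the existing web-services tests):

{code}
import static org.junit.Assert.assertEquals;

import java.net.HttpURLConnection;
import java.net.URL;

import org.junit.Test;

public class TestStaticUserCannotKillApp {
  @Test
  public void staticUserCannotKillAppInSecureMode() throws Exception {
    // Assumed RM web address and application id, for illustration only.
    URL url = new URL("http://rmhost:8088/ws/v1/cluster/apps/"
        + "application_1414000000000_0001/state");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("PUT");
    conn.setRequestProperty("Content-Type", "application/json");
    conn.setDoOutput(true);
    conn.getOutputStream().write("{\"state\":\"KILLED\"}".getBytes("UTF-8"));
    // In secure mode the unauthenticated http static user must be rejected.
    assertEquals(HttpURLConnection.HTTP_FORBIDDEN, conn.getResponseCode());
  }
}
{code}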



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2767) RM web services - add test case to ensure the http static user can kill or submit apps in secure mode

2014-10-29 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2767:

Attachment: apache-yarn-2767.0.patch

Uploaded patch with new test case.

 RM web services - add test case to ensure the http static user can kill or 
 submit apps in secure mode
 -

 Key: YARN-2767
 URL: https://issues.apache.org/jira/browse/YARN-2767
 Project: Hadoop YARN
  Issue Type: Test
  Components: resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2767.0.patch


 We should add a test to ensure that the http static user used to access the 
 RM web interface can't submit or kill apps if the cluster is running in 
 secure mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2761) potential race condition in SchedulingPolicy

2014-10-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188126#comment-14188126
 ] 

Hadoop QA commented on YARN-2761:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12677844/YARN-2761.patch
  against trunk revision ec63a3f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5622//console

This message is automatically generated.

 potential race condition in SchedulingPolicy
 

 Key: YARN-2761
 URL: https://issues.apache.org/jira/browse/YARN-2761
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
 Attachments: YARN-2761.patch


 Reported by FindBugs.
 In SchedulingPolicy.getInstance, ConcurrentHashMap.get and
 ConcurrentHashMap.put are called. These two operations together should be
 atomic, but calling them separately on a ConcurrentHashMap doesn't guarantee this.
 {code}
 public static SchedulingPolicy getInstance(Class<? extends SchedulingPolicy> clazz) {
   SchedulingPolicy policy = instances.get(clazz);
   if (policy == null) {
     policy = ReflectionUtils.newInstance(clazz, null);
     instances.put(clazz, policy);
   }
   return policy;
 }
 {code}
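
One way the get/put pair could be made atomic is to fall back to ConcurrentMap.putIfAbsent, as in the sketch below (a sketch only; the attached patch may differ):

{code}
public static SchedulingPolicy getInstance(Class<? extends SchedulingPolicy> clazz) {
  SchedulingPolicy policy = instances.get(clazz);
  if (policy == null) {
    policy = ReflectionUtils.newInstance(clazz, null);
    // putIfAbsent is atomic: if another thread raced us, reuse its instance
    // so that only one SchedulingPolicy per class is ever handed out.
    SchedulingPolicy previous = instances.putIfAbsent(clazz, policy);
    if (previous != null) {
      policy = previous;
    }
  }
  return policy;
}
{code}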



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI

2014-10-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188125#comment-14188125
 ] 

Hadoop QA commented on YARN-2698:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12677821/YARN-2698-20141028-3.patch
  against trunk revision 3c5f5af.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 2 
warning messages.
See 
https://builds.apache.org/job/PreCommit-YARN-Build/5619//artifact/patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.mapred.TestMRTimelineEventHandling
  
org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5619//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5619//console

This message is automatically generated.

 Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of 
 RMAdminCLI
 ---

 Key: YARN-2698
 URL: https://issues.apache.org/jira/browse/YARN-2698
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Critical
 Attachments: YARN-2698-20141028-1.patch, YARN-2698-20141028-2.patch, 
 YARN-2698-20141028-3.patch


 YARN RMAdminCLI and AdminService should have write API only, for other read 
 APIs, they should be located at YARNCLI and RMClientService.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2767) RM web services - add test case to ensure the http static user can kill or submit apps in secure mode

2014-10-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188127#comment-14188127
 ] 

Hadoop QA commented on YARN-2767:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12677839/apache-yarn-2767.0.patch
  against trunk revision ec63a3f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5621//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5621//console

This message is automatically generated.

 RM web services - add test case to ensure the http static user can kill or 
 submit apps in secure mode
 -

 Key: YARN-2767
 URL: https://issues.apache.org/jira/browse/YARN-2767
 Project: Hadoop YARN
  Issue Type: Test
  Components: resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2767.0.patch


 We should add a test to ensure that the http static user used to access the 
 RM web interface can't submit or kill apps if the cluster is running in 
 secure mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2767) RM web services - add test case to ensure the http static user can kill or submit apps in secure mode

2014-10-29 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2767:

Attachment: apache-yarn-2767.1.patch

Uploaded a new patch with some variable names fixed.

 RM web services - add test case to ensure the http static user can kill or 
 submit apps in secure mode
 -

 Key: YARN-2767
 URL: https://issues.apache.org/jira/browse/YARN-2767
 Project: Hadoop YARN
  Issue Type: Test
  Components: resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2767.0.patch, apache-yarn-2767.1.patch


 We should add a test to ensure that the http static user used to access the 
 RM web interface can't submit or kill apps if the cluster is running in 
 secure mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread

2014-10-29 Thread Hong Zhiguo (JIRA)
Hong Zhiguo created YARN-2768:
-

 Summary: optimize FSAppAttempt.updateDemand by avoid clone of 
Resource which takes 85% of computing time of update thread
 Key: YARN-2768
 URL: https://issues.apache.org/jira/browse/YARN-2768
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor


See the attached picture of profiling result. The clone of Resource object 
within Resources.multiply() takes up **85%** (19.2 / 22.6) CPU time of the 
function FairScheduler.update().

The code of FSAppAttempt.updateDemand:
{code}
public void updateDemand() {
  demand = Resources.createResource(0);
  // Demand is current consumption plus outstanding requests
  Resources.addTo(demand, app.getCurrentConsumption());

  // Add up outstanding resource requests
  synchronized (app) {
    for (Priority p : app.getPriorities()) {
      for (ResourceRequest r : app.getResourceRequests(p).values()) {
        Resource total = Resources.**multiply**(r.getCapability(),
            r.getNumContainers());
        Resources.addTo(demand, total);
      }
    }
  }
}
{code}

The code of Resources.multiply:
{code}
public static Resource multiply(Resource lhs, double by) {
  return multiplyTo(**clone**(lhs), by);
}
{code}

The clone could be skipped by directly updating the value of this.demand.
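
For illustration, a clone-free helper could look roughly like the sketch below (the method name and body are assumptions based on this description; the actual patch may differ):

{code}
// Sketch only: accumulate lhs * by into target in place, avoiding the
// intermediate Resource clone that Resources.multiply() creates.
public static Resource multiplyAndAddTo(Resource target, Resource lhs, double by) {
  target.setMemory(target.getMemory() + (int) (lhs.getMemory() * by));
  target.setVirtualCores(target.getVirtualCores() + (int) (lhs.getVirtualCores() * by));
  return target;
}
{code}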



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread

2014-10-29 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-2768:
--
Attachment: profiling_FairScheduler_update.png

 optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% 
 of computing time of update thread
 

 Key: YARN-2768
 URL: https://issues.apache.org/jira/browse/YARN-2768
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
 Attachments: profiling_FairScheduler_update.png


 See the attached picture of profiling result. The clone of Resource object 
 within Resources.multiply() takes up **85%** (19.2 / 22.6) CPU time of the 
 function FairScheduler.update().
 The code of FSAppAttempt.updateDemand:
 {code}
 public void updateDemand() {
 demand = Resources.createResource(0);
 // Demand is current consumption plus outstanding requests
 Resources.addTo(demand, app.getCurrentConsumption());
 // Add up outstanding resource requests
 synchronized (app) {
   for (Priority p : app.getPriorities()) {
 for (ResourceRequest r : app.getResourceRequests(p).values()) {
   Resource total = Resources.multiply(r.getCapability(), 
 r.getNumContainers());
   Resources.addTo(demand, total);
 }
   }
 }
   }
 {code}
 The code of Resources.multiply:
 {code}
 public static Resource multiply(Resource lhs, double by) {
 return multiplyTo(clone(lhs), by);
 }
 {code}
 The clone could be skipped by directly updating the value of this.demand.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread

2014-10-29 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-2768:
--
Description: 
See the attached picture of profiling result. The clone of Resource object 
within Resources.multiply() takes up **85%** (19.2 / 22.6) CPU time of the 
function FairScheduler.update().

The code of FSAppAttempt.updateDemand:
{code}
public void updateDemand() {
  demand = Resources.createResource(0);
  // Demand is current consumption plus outstanding requests
  Resources.addTo(demand, app.getCurrentConsumption());

  // Add up outstanding resource requests
  synchronized (app) {
    for (Priority p : app.getPriorities()) {
      for (ResourceRequest r : app.getResourceRequests(p).values()) {
        Resource total = Resources.multiply(r.getCapability(),
            r.getNumContainers());
        Resources.addTo(demand, total);
      }
    }
  }
}
{code}

The code of Resources.multiply:
{code}
public static Resource multiply(Resource lhs, double by) {
  return multiplyTo(clone(lhs), by);
}
{code}

The clone could be skipped by directly updating the value of this.demand.

  was:
See the attached picture of profiling result. The clone of Resource object 
within Resources.multiply() takes up **85%** (19.2 / 22.6) CPU time of the 
function FairScheduler.update().

The code of FSAppAttempt.updateDemand:
{code}
public void updateDemand() {
demand = Resources.createResource(0);
// Demand is current consumption plus outstanding requests
Resources.addTo(demand, app.getCurrentConsumption());

// Add up outstanding resource requests
synchronized (app) {
  for (Priority p : app.getPriorities()) {
for (ResourceRequest r : app.getResourceRequests(p).values()) {
  Resource total = Resources.**multiply**(r.getCapability(), 
r.getNumContainers());
  Resources.addTo(demand, total);
}
  }
}
  }
{code}

The code of Resources.multiply:
{code}
public static Resource multiply(Resource lhs, double by) {
return multiplyTo(**clone**(lhs), by);
}
{code}

The clone could be skipped by directly update the value of this.demand.


 optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% 
 of computing time of update thread
 

 Key: YARN-2768
 URL: https://issues.apache.org/jira/browse/YARN-2768
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
 Attachments: profiling_FairScheduler_update.png


 See the attached picture of profiling result. The clone of Resource object 
 within Resources.multiply() takes up **85%** (19.2 / 22.6) CPU time of the 
 function FairScheduler.update().
 The code of FSAppAttempt.updateDemand:
 {code}
 public void updateDemand() {
 demand = Resources.createResource(0);
 // Demand is current consumption plus outstanding requests
 Resources.addTo(demand, app.getCurrentConsumption());
 // Add up outstanding resource requests
 synchronized (app) {
   for (Priority p : app.getPriorities()) {
 for (ResourceRequest r : app.getResourceRequests(p).values()) {
   Resource total = Resources.multiply(r.getCapability(), 
 r.getNumContainers());
   Resources.addTo(demand, total);
 }
   }
 }
   }
 {code}
 The code of Resources.multiply:
 {code}
 public static Resource multiply(Resource lhs, double by) {
 return multiplyTo(clone(lhs), by);
 }
 {code}
 The clone could be skipped by directly updating the value of this.demand.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread

2014-10-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188146#comment-14188146
 ] 

Hadoop QA commented on YARN-2768:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12677853/profiling_FairScheduler_update.png
  against trunk revision ec63a3f.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5624//console

This message is automatically generated.

 optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% 
 of computing time of update thread
 

 Key: YARN-2768
 URL: https://issues.apache.org/jira/browse/YARN-2768
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
 Attachments: profiling_FairScheduler_update.png


 See the attached picture of profiling result. The clone of Resource object 
 within Resources.multiply() takes up **85%** (19.2 / 22.6) CPU time of the 
 function FairScheduler.update().
 The code of FSAppAttempt.updateDemand:
 {code}
 public void updateDemand() {
 demand = Resources.createResource(0);
 // Demand is current consumption plus outstanding requests
 Resources.addTo(demand, app.getCurrentConsumption());
 // Add up outstanding resource requests
 synchronized (app) {
   for (Priority p : app.getPriorities()) {
 for (ResourceRequest r : app.getResourceRequests(p).values()) {
   Resource total = Resources.multiply(r.getCapability(), 
 r.getNumContainers());
   Resources.addTo(demand, total);
 }
   }
 }
   }
 {code}
 The code of Resources.multiply:
 {code}
 public static Resource multiply(Resource lhs, double by) {
 return multiplyTo(clone(lhs), by);
 }
 {code}
 The clone could be skipped by directly updating the value of this.demand.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread

2014-10-29 Thread Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-2768:
--
Attachment: YARN-2768.patch

Avoided the clone by adding a three-argument helper, Resources.multiplyAndAddTo.
After this optimization, the average time taken by FairScheduler.update (in a 
test case with 10k apps) is reduced by 40%.

I'm not sure whether it's better to also submit such test cases.
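
For reference, the call site in FSAppAttempt.updateDemand could then look roughly like the sketch below (an assumption, not necessarily what YARN-2768.patch does):

{code}
public void updateDemand() {
  demand = Resources.createResource(0);
  // Demand is current consumption plus outstanding requests
  Resources.addTo(demand, app.getCurrentConsumption());

  // Add up outstanding resource requests, accumulating directly into demand
  // so no temporary Resource needs to be cloned per request.
  synchronized (app) {
    for (Priority p : app.getPriorities()) {
      for (ResourceRequest r : app.getResourceRequests(p).values()) {
        Resources.multiplyAndAddTo(demand, r.getCapability(), r.getNumContainers());
      }
    }
  }
}
{code}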

 optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% 
 of computing time of update thread
 

 Key: YARN-2768
 URL: https://issues.apache.org/jira/browse/YARN-2768
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
 Attachments: YARN-2768.patch, profiling_FairScheduler_update.png


 See the attached picture of profiling result. The clone of Resource object 
 within Resources.multiply() takes up **85%** (19.2 / 22.6) CPU time of the 
 function FairScheduler.update().
 The code of FSAppAttempt.updateDemand:
 {code}
 public void updateDemand() {
 demand = Resources.createResource(0);
 // Demand is current consumption plus outstanding requests
 Resources.addTo(demand, app.getCurrentConsumption());
 // Add up outstanding resource requests
 synchronized (app) {
   for (Priority p : app.getPriorities()) {
 for (ResourceRequest r : app.getResourceRequests(p).values()) {
   Resource total = Resources.multiply(r.getCapability(), 
 r.getNumContainers());
   Resources.addTo(demand, total);
 }
   }
 }
   }
 {code}
 The code of Resources.multiply:
 {code}
 public static Resource multiply(Resource lhs, double by) {
 return multiplyTo(clone(lhs), by);
 }
 {code}
 The clone could be skipped by directly updating the value of this.demand.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2767) RM web services - add test case to ensure the http static user can kill or submit apps in secure mode

2014-10-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188182#comment-14188182
 ] 

Hadoop QA commented on YARN-2767:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12677848/apache-yarn-2767.1.patch
  against trunk revision ec63a3f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5623//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5623//console

This message is automatically generated.

 RM web services - add test case to ensure the http static user can kill or 
 submit apps in secure mode
 -

 Key: YARN-2767
 URL: https://issues.apache.org/jira/browse/YARN-2767
 Project: Hadoop YARN
  Issue Type: Test
  Components: resourcemanager
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2767.0.patch, apache-yarn-2767.1.patch


 We should add a test to ensure that the http static user used to access the 
 RM web interface can't submit or kill apps if the cluster is running in 
 secure mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1902) Allocation of too many containers when a second request is done with the same resource capability

2014-10-29 Thread Yogesh Sobale (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188188#comment-14188188
 ] 

Yogesh Sobale commented on YARN-1902:
-

Can someone please update?

 Allocation of too many containers when a second request is done with the same 
 resource capability
 -

 Key: YARN-1902
 URL: https://issues.apache.org/jira/browse/YARN-1902
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.2.0, 2.3.0, 2.4.0
Reporter: Sietse T. Au
  Labels: client
 Attachments: YARN-1902.patch, YARN-1902.v2.patch, YARN-1902.v3.patch


 Regarding AMRMClientImpl
 Scenario 1:
 Given a ContainerRequest x with Resource y, when addContainerRequest is 
 called z times with x, allocate is called and at least one of the z allocated 
 containers is started, then if another addContainerRequest call is done and 
 subsequently an allocate call to the RM, (z+1) containers will be allocated, 
 where 1 container is expected.
 Scenario 2:
 No containers are started between the allocate calls. 
 Analyzing debug logs of the AMRMClientImpl, I have found that indeed (z+1) 
 containers are requested in both scenarios, but that only in the second 
 scenario the correct behavior is observed.
 Looking at the implementation I have found that this (z+1) request is caused 
 by the structure of the remoteRequestsTable. The consequence of Map<Resource, 
 ResourceRequestInfo> is that ResourceRequestInfo does not hold any 
 information about whether a request has been sent to the RM yet or not.
 There are workarounds for this, such as releasing the excess containers 
 received.
 The solution implemented is to initialize a new ResourceRequest in 
 ResourceRequestInfo when a request has been successfully sent to the RM.
 The patch includes a test in which scenario one is tested.
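
For readers catching up on the issue, Scenario 1 corresponds roughly to the client-side sequence below (capability, priority, and z are assumed values; AM registration and container start are elided):

{code}
AMRMClient<AMRMClient.ContainerRequest> amClient = AMRMClient.createAMRMClient();
// ... init(conf), start() and registerApplicationMaster() elided ...

Resource capability = Resource.newInstance(1024, 1);   // assumed
Priority priority = Priority.newInstance(0);           // assumed
int z = 3;                                             // assumed

// z identical requests, followed by an allocate; at least one container starts.
for (int i = 0; i < z; i++) {
  amClient.addContainerRequest(
      new AMRMClient.ContainerRequest(capability, null, null, priority));
}
amClient.allocate(0.1f);

// One more identical request and another allocate: 1 extra container is
// expected, but (z + 1) extra containers end up being allocated.
amClient.addContainerRequest(
    new AMRMClient.ContainerRequest(capability, null, null, priority));
amClient.allocate(0.2f);
{code}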



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1902) Allocation of too many containers when a second request is done with the same resource capability

2014-10-29 Thread Yogesh Sobale (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188190#comment-14188190
 ] 

Yogesh Sobale commented on YARN-1902:
-

Can someone please update?

 Allocation of too many containers when a second request is done with the same 
 resource capability
 -

 Key: YARN-1902
 URL: https://issues.apache.org/jira/browse/YARN-1902
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.2.0, 2.3.0, 2.4.0
Reporter: Sietse T. Au
  Labels: client
 Attachments: YARN-1902.patch, YARN-1902.v2.patch, YARN-1902.v3.patch


 Regarding AMRMClientImpl
 Scenario 1:
 Given a ContainerRequest x with Resource y, when addContainerRequest is 
 called z times with x, allocate is called and at least one of the z allocated 
 containers is started, then if another addContainerRequest call is done and 
 subsequently an allocate call to the RM, (z+1) containers will be allocated, 
 where 1 container is expected.
 Scenario 2:
 No containers are started between the allocate calls. 
 Analyzing debug logs of the AMRMClientImpl, I have found that indeed (z+1) 
 containers are requested in both scenarios, but that only in the second 
 scenario the correct behavior is observed.
 Looking at the implementation I have found that this (z+1) request is caused 
 by the structure of the remoteRequestsTable. The consequence of Map<Resource, 
 ResourceRequestInfo> is that ResourceRequestInfo does not hold any 
 information about whether a request has been sent to the RM yet or not.
 There are workarounds for this, such as releasing the excess containers 
 received.
 The solution implemented is to initialize a new ResourceRequest in 
 ResourceRequestInfo when a request has been successfully sent to the RM.
 The patch includes a test in which scenario one is tested.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2768) optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% of computing time of update thread

2014-10-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188222#comment-14188222
 ] 

Hadoop QA commented on YARN-2768:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12677855/YARN-2768.patch
  against trunk revision ec63a3f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5625//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5625//console

This message is automatically generated.

 optimize FSAppAttempt.updateDemand by avoid clone of Resource which takes 85% 
 of computing time of update thread
 

 Key: YARN-2768
 URL: https://issues.apache.org/jira/browse/YARN-2768
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
 Attachments: YARN-2768.patch, profiling_FairScheduler_update.png


 See the attached picture of profiling result. The clone of Resource object 
 within Resources.multiply() takes up **85%** (19.2 / 22.6) CPU time of the 
 function FairScheduler.update().
 The code of FSAppAttempt.updateDemand:
 {code}
 public void updateDemand() {
 demand = Resources.createResource(0);
 // Demand is current consumption plus outstanding requests
 Resources.addTo(demand, app.getCurrentConsumption());
 // Add up outstanding resource requests
 synchronized (app) {
   for (Priority p : app.getPriorities()) {
 for (ResourceRequest r : app.getResourceRequests(p).values()) {
   Resource total = Resources.multiply(r.getCapability(), 
 r.getNumContainers());
   Resources.addTo(demand, total);
 }
   }
 }
   }
 {code}
 The code of Resources.multiply:
 {code}
 public static Resource multiply(Resource lhs, double by) {
 return multiplyTo(clone(lhs), by);
 }
 {code}
 The clone could be skipped by directly updating the value of this.demand.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2758) Update TestApplicationHistoryClientService to use the new generic history store

2014-10-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188266#comment-14188266
 ] 

Hudson commented on YARN-2758:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #727 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/727/])
YARN-2758. Update TestApplicationHistoryClientService to use the new generic 
history store. Contributed by Zhijie Shen (xgong: rev 
69f79bee8b3da07bf42e22e35e58c7719782e31f)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryClientService.java


 Update TestApplicationHistoryClientService to use the new generic history 
 store
 ---

 Key: YARN-2758
 URL: https://issues.apache.org/jira/browse/YARN-2758
 Project: Hadoop YARN
  Issue Type: Test
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: 2.6.0

 Attachments: YARN-2758.1.patch


 TestApplicationHistoryClientService is still testing against the mock data in 
 the old MemoryApplicationHistoryStore; hence it needs to be updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2747) TestAggregatedLogFormat fails in trunk

2014-10-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188270#comment-14188270
 ] 

Hudson commented on YARN-2747:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #727 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/727/])
YARN-2747. Fixed the test failure of TestAggregatedLogFormat when native I/O is 
enabled. Contributed by Xuan Gong. (zjshen: rev 
ec63a3ffbd9413e7434594682fdbbd36eef7413c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogFormat.java
* hadoop-yarn-project/CHANGES.txt


 TestAggregatedLogFormat fails in trunk
 --

 Key: YARN-2747
 URL: https://issues.apache.org/jira/browse/YARN-2747
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Xuan Gong
Assignee: Xuan Gong
 Fix For: 2.6.0

 Attachments: YARN-2747.1.patch


 Running org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat
 Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.105 sec  
 FAILURE! - in org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat
 testContainerLogsFileAccess(org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat)
   Time elapsed: 0.047 sec   FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat.testContainerLogsFileAccess(TestAggregatedLogFormat.java:346)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2741) Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanag

2014-10-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188275#comment-14188275
 ] 

Hudson commented on YARN-2741:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #727 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/727/])
YARN-2741. Made NM web UI serve logs on the drive other than C: on Windows. 
Contributed by Craig Welch. (zjshen: rev 
8984e9b1774033e379b57da1bd30a5c81888c7a3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/ContainerLogsUtils.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java


 Windows: Node manager cannot serve up log files via the web user interface 
 when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the 
 drive that nodemanager is running on)
 --

 Key: YARN-2741
 URL: https://issues.apache.org/jira/browse/YARN-2741
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
 Environment: Windows
Reporter: Craig Welch
Assignee: Craig Welch
 Fix For: 2.6.0

 Attachments: YARN-2741.1.patch, YARN-2741.6.patch


 PROBLEM: User is getting No Logs available for Container Container_number 
 when setting the yarn.nodemanager.log-dirs to any drive letter other than C:
 STEPS TO REPRODUCE:
 On Windows
 1) Run NodeManager on C:
 2) Create two local drive partitions D: and E:
 3) Put yarn.nodemanager.log-dirs = D:\nmlogs or E:\nmlogs
 4) Run a MR job that will last at least 5 minutes
 5) While the job is in flight, log into the YARN web UI, 
 resource_manager_server:8088/cluster
 6) Click on the application_idnumber
 7) Click on the logs link; you will get "No Logs available for Container 
 Container_number"
 ACTUAL BEHAVIOR: Getting an error message when viewing the container logs
 EXPECTED BEHAVIOR: Able to use different drive letters in 
 yarn.nodemanager.log-dirs and not get error
 NOTE: If we use the drive letter C: in yarn.nodemanager.log-dirs, we are able 
 to see the container logs while the MR job is in flight.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2760) Completely remove word 'experimental' from FairScheduler docs

2014-10-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188277#comment-14188277
 ] 

Hudson commented on YARN-2760:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #727 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/727/])
YARN-2760. Remove 'experimental' from FairScheduler docs. (Harsh J via kasha) 
(kasha: rev ade3727ecb092935dcc0f1291c1e6cf43d764a03)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm
* hadoop-yarn-project/CHANGES.txt


 Completely remove word 'experimental' from FairScheduler docs
 -

 Key: YARN-2760
 URL: https://issues.apache.org/jira/browse/YARN-2760
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.1.0-beta
Reporter: Harsh J
Assignee: Harsh J
Priority: Trivial
 Fix For: 2.6.0

 Attachments: YARN-2760.patch, YARN-2760.patch


 After YARN-1034, FairScheduler has not been 'experimental' in any aspect of 
 use, but the doc change done in that did not entirely cover removal of that 
 word, leaving a remnant in the preemption sub-point. This needs to be removed 
 as well, as the feature has been good to use for a long time now, and is not 
 experimental.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2503) Changes in RM Web UI to better show labels to end users

2014-10-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188265#comment-14188265
 ] 

Hudson commented on YARN-2503:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #727 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/727/])
YARN-2503. Added node lablels in web UI. Contributed by Wangda Tan (jianhe: rev 
d5e0a09721a5156fa2ee51ac1c32fbfd9905b8fb)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerQueueInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/NodeInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java
Missing CHANGES.txt for YARN-2503. (jianhe: rev 
0782f602881272392381486bcc749850f96acd22)
* hadoop-yarn-project/CHANGES.txt


 Changes in RM Web UI to better show labels to end users
 ---

 Key: YARN-2503
 URL: https://issues.apache.org/jira/browse/YARN-2503
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 2.6.0

 Attachments: YARN-2503-20141022-1.patch, YARN-2503-20141028-1.patch, 
 YARN-2503.patch


 Include but not limited to:
 - Show labels of nodes in RM/nodes page
 - Show labels of queue in RM/scheduler page



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2747) TestAggregatedLogFormat fails in trunk

2014-10-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188336#comment-14188336
 ] 

Hudson commented on YARN-2747:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1941 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1941/])
YARN-2747. Fixed the test failure of TestAggregatedLogFormat when native I/O is 
enabled. Contributed by Xuan Gong. (zjshen: rev 
ec63a3ffbd9413e7434594682fdbbd36eef7413c)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogFormat.java


 TestAggregatedLogFormat fails in trunk
 --

 Key: YARN-2747
 URL: https://issues.apache.org/jira/browse/YARN-2747
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Xuan Gong
Assignee: Xuan Gong
 Fix For: 2.6.0

 Attachments: YARN-2747.1.patch


 Running org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat
 Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.105 sec  
 FAILURE! - in org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat
 testContainerLogsFileAccess(org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat)
   Time elapsed: 0.047 sec   FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat.testContainerLogsFileAccess(TestAggregatedLogFormat.java:346)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2503) Changes in RM Web UI to better show labels to end users

2014-10-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188331#comment-14188331
 ] 

Hudson commented on YARN-2503:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1941 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1941/])
YARN-2503. Added node lablels in web UI. Contributed by Wangda Tan (jianhe: rev 
d5e0a09721a5156fa2ee51ac1c32fbfd9905b8fb)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/NodeInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerQueueInfo.java
Missing CHANGES.txt for YARN-2503. (jianhe: rev 
0782f602881272392381486bcc749850f96acd22)
* hadoop-yarn-project/CHANGES.txt


 Changes in RM Web UI to better show labels to end users
 ---

 Key: YARN-2503
 URL: https://issues.apache.org/jira/browse/YARN-2503
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 2.6.0

 Attachments: YARN-2503-20141022-1.patch, YARN-2503-20141028-1.patch, 
 YARN-2503.patch


 Include but not limited to:
 - Show labels of nodes in RM/nodes page
 - Show labels of queue in RM/scheduler page



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2758) Update TestApplicationHistoryClientService to use the new generic history store

2014-10-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188332#comment-14188332
 ] 

Hudson commented on YARN-2758:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1941 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1941/])
YARN-2758. Update TestApplicationHistoryClientService to use the new generic 
history store. Contributed by Zhijie Shen (xgong: rev 
69f79bee8b3da07bf42e22e35e58c7719782e31f)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryClientService.java
* hadoop-yarn-project/CHANGES.txt


 Update TestApplicationHistoryClientService to use the new generic history 
 store
 ---

 Key: YARN-2758
 URL: https://issues.apache.org/jira/browse/YARN-2758
 Project: Hadoop YARN
  Issue Type: Test
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: 2.6.0

 Attachments: YARN-2758.1.patch


 TestApplicationHistoryClientService is still testing against the mock data in 
 the old MemoryApplicationHistoryStore; hence it needs to be updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2741) Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanag

2014-10-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188341#comment-14188341
 ] 

Hudson commented on YARN-2741:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1941 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1941/])
YARN-2741. Made NM web UI serve logs on the drive other than C: on Windows. 
Contributed by Craig Welch. (zjshen: rev 
8984e9b1774033e379b57da1bd30a5c81888c7a3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/ContainerLogsUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java
* hadoop-yarn-project/CHANGES.txt


 Windows: Node manager cannot serve up log files via the web user interface 
 when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the 
 drive that nodemanager is running on)
 --

 Key: YARN-2741
 URL: https://issues.apache.org/jira/browse/YARN-2741
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
 Environment: Windows
Reporter: Craig Welch
Assignee: Craig Welch
 Fix For: 2.6.0

 Attachments: YARN-2741.1.patch, YARN-2741.6.patch


 PROBLEM: User is getting No Logs available for Container Container_number 
 when setting the yarn.nodemanager.log-dirs to any drive letter other than C:
 STEPS TO REPRODUCE:
 On Windows
 1) Run NodeManager on C:
 2) Create two local drive partitions D: and E:
 3) Put yarn.nodemanager.log-dirs = D:\nmlogs or E:\nmlogs
 4) Run a MR job that will last at least 5 minutes
 5) While the job is in flight, log into the YARN web UI, 
 resource_manager_server:8088/cluster
 6) Click on the application_idnumber
 7) Click on the logs link; you will get "No Logs available for Container 
 Container_number"
 ACTUAL BEHAVIOR: Getting an error message when viewing the container logs
 EXPECTED BEHAVIOR: Able to use different drive letters in 
 yarn.nodemanager.log-dirs and not get error
 NOTE: If we use the drive letter C: in yarn.nodemanager.log-dirs, we are able 
 to see the container logs while the MR job is in flight.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2760) Completely remove word 'experimental' from FairScheduler docs

2014-10-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188343#comment-14188343
 ] 

Hudson commented on YARN-2760:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1941 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1941/])
YARN-2760. Remove 'experimental' from FairScheduler docs. (Harsh J via kasha) 
(kasha: rev ade3727ecb092935dcc0f1291c1e6cf43d764a03)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm
* hadoop-yarn-project/CHANGES.txt


 Completely remove word 'experimental' from FairScheduler docs
 -

 Key: YARN-2760
 URL: https://issues.apache.org/jira/browse/YARN-2760
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.1.0-beta
Reporter: Harsh J
Assignee: Harsh J
Priority: Trivial
 Fix For: 2.6.0

 Attachments: YARN-2760.patch, YARN-2760.patch


 After YARN-1034, FairScheduler has not been 'experimental' in any aspect of 
 use, but the doc change done in that did not entirely cover removal of that 
 word, leaving a remnant in the preemption sub-point. This needs to be removed 
 as well, as the feature has been good to use for a long time now, and is not 
 experimental.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2503) Changes in RM Web UI to better show labels to end users

2014-10-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188374#comment-14188374
 ] 

Hudson commented on YARN-2503:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1916 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1916/])
YARN-2503. Added node lablels in web UI. Contributed by Wangda Tan (jianhe: rev 
d5e0a09721a5156fa2ee51ac1c32fbfd9905b8fb)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/CapacitySchedulerQueueInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/CapacitySchedulerPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/NodeInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestNodesPage.java
Missing CHANGES.txt for YARN-2503. (jianhe: rev 
0782f602881272392381486bcc749850f96acd22)
* hadoop-yarn-project/CHANGES.txt


 Changes in RM Web UI to better show labels to end users
 ---

 Key: YARN-2503
 URL: https://issues.apache.org/jira/browse/YARN-2503
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 2.6.0

 Attachments: YARN-2503-20141022-1.patch, YARN-2503-20141028-1.patch, 
 YARN-2503.patch


 Include but not limited to:
 - Show labels of nodes in RM/nodes page
 - Show labels of queue in RM/scheduler page



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2760) Completely remove word 'experimental' from FairScheduler docs

2014-10-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188386#comment-14188386
 ] 

Hudson commented on YARN-2760:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1916 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1916/])
YARN-2760. Remove 'experimental' from FairScheduler docs. (Harsh J via kasha) 
(kasha: rev ade3727ecb092935dcc0f1291c1e6cf43d764a03)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/FairScheduler.apt.vm
* hadoop-yarn-project/CHANGES.txt


 Completely remove word 'experimental' from FairScheduler docs
 -

 Key: YARN-2760
 URL: https://issues.apache.org/jira/browse/YARN-2760
 Project: Hadoop YARN
  Issue Type: Bug
  Components: documentation
Affects Versions: 2.1.0-beta
Reporter: Harsh J
Assignee: Harsh J
Priority: Trivial
 Fix For: 2.6.0

 Attachments: YARN-2760.patch, YARN-2760.patch


 After YARN-1034, FairScheduler has not been 'experimental' in any aspect of 
 use, but the doc change done in that did not entirely cover removal of that 
 word, leaving a remnant in the preemption sub-point. This needs to be removed 
 as well, as the feature has been good to use for a long time now, and is not 
 experimental.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2741) Windows: Node manager cannot serve up log files via the web user interface when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the drive that nodemanag

2014-10-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188384#comment-14188384
 ] 

Hudson commented on YARN-2741:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1916 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1916/])
YARN-2741. Made NM web UI serve logs on the drive other than C: on Windows. 
Contributed by Craig Welch. (zjshen: rev 
8984e9b1774033e379b57da1bd30a5c81888c7a3)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/webapp/ContainerLogsUtils.java


 Windows: Node manager cannot serve up log files via the web user interface 
 when yarn.nodemanager.log-dirs to any drive letter other than C: (or, the 
 drive that nodemanager is running on)
 --

 Key: YARN-2741
 URL: https://issues.apache.org/jira/browse/YARN-2741
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
 Environment: Windows
Reporter: Craig Welch
Assignee: Craig Welch
 Fix For: 2.6.0

 Attachments: YARN-2741.1.patch, YARN-2741.6.patch


 PROBLEM: User is getting No Logs available for Container Container_number 
 when setting the yarn.nodemanager.log-dirs to any drive letter other than C:
 STEPS TO REPRODUCE:
 On Windows
 1) Run NodeManager on C:
 2) Create two local drive partitions D: and E:
 3) Put yarn.nodemanager.log-dirs = D:\nmlogs or E:\nmlogs
 4) Run a MR job that will last at least 5 minutes
 5) While the job is in flight, log into the YARN web UI, 
 resource_manager_server:8088/cluster
 6) Click on the application_idnumber
 7) Click on the logs link; you will get "No Logs available for Container 
 Container_number"
 ACTUAL BEHAVIOR: Getting an error message when viewing the container logs
 EXPECTED BEHAVIOR: Able to use different drive letters in 
 yarn.nodemanager.log-dirs and not get error
 NOTE: If we use the drive letter C: in yarn.nodemanager.log-dirs, we are able 
 to see the container logs while the MR job is in flight.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2747) TestAggregatedLogFormat fails in trunk

2014-10-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188379#comment-14188379
 ] 

Hudson commented on YARN-2747:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1916 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1916/])
YARN-2747. Fixed the test failure of TestAggregatedLogFormat when native I/O is 
enabled. Contributed by Xuan Gong. (zjshen: rev 
ec63a3ffbd9413e7434594682fdbbd36eef7413c)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/logaggregation/TestAggregatedLogFormat.java
* hadoop-yarn-project/CHANGES.txt


 TestAggregatedLogFormat fails in trunk
 --

 Key: YARN-2747
 URL: https://issues.apache.org/jira/browse/YARN-2747
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Xuan Gong
Assignee: Xuan Gong
 Fix For: 2.6.0

 Attachments: YARN-2747.1.patch


 Running org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat
 Tests run: 3, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.105 sec  
 FAILURE! - in org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat
 testContainerLogsFileAccess(org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat)
   Time elapsed: 0.047 sec   FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.logaggregation.TestAggregatedLogFormat.testContainerLogsFileAccess(TestAggregatedLogFormat.java:346)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2758) Update TestApplicationHistoryClientService to use the new generic history store

2014-10-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188375#comment-14188375
 ] 

Hudson commented on YARN-2758:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1916 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1916/])
YARN-2758. Update TestApplicationHistoryClientService to use the new generic 
history store. Contributed by Zhijie Shen (xgong: rev 
69f79bee8b3da07bf42e22e35e58c7719782e31f)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice/src/test/java/org/apache/hadoop/yarn/server/applicationhistoryservice/TestApplicationHistoryClientService.java


 Update TestApplicationHistoryClientService to use the new generic history 
 store
 ---

 Key: YARN-2758
 URL: https://issues.apache.org/jira/browse/YARN-2758
 Project: Hadoop YARN
  Issue Type: Test
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Fix For: 2.6.0

 Attachments: YARN-2758.1.patch


 TestApplicationHistoryClientService is still testing against the mock data in 
 the old MemoryApplicationHistoryStore; hence it needs to be updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2769) TestDistributedShell#testDSShell fails on Windows

2014-10-29 Thread Varun Vasudev (JIRA)
Varun Vasudev created YARN-2769:
---

 Summary: TestDistributedShell#testDSShell fails on Windows
 Key: YARN-2769
 URL: https://issues.apache.org/jira/browse/YARN-2769
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Varun Vasudev
Assignee: Varun Vasudev






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2769) TestDistributedShell#testDSShell fails on Windows

2014-10-29 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2769:

Description: 
Running 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec  
FAILURE! - in org.apache.hadoop.yarn.applications.distribut
testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
  Time elapsed: 37.366 sec   FAILURE!
org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT]
at org.junit.Assert.assertEquals(Assert.java:115)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290)
at 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179)

 TestDistributedShell#testDSShell fails on Windows
 -

 Key: YARN-2769
 URL: https://issues.apache.org/jira/browse/YARN-2769
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Varun Vasudev
Assignee: Varun Vasudev

 Running 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
 Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec 
  FAILURE! - in org.apache.hadoop.yarn.applications.distribut
 testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
   Time elapsed: 37.366 sec   FAILURE!
 org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT]
 at org.junit.Assert.assertEquals(Assert.java:115)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290)
 at 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2769) TestDistributedShell#testDSShell fails on Windows

2014-10-29 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188465#comment-14188465
 ] 

Varun Vasudev commented on YARN-2769:
-

Attached fix.

 TestDistributedShell#testDSShell fails on Windows
 -

 Key: YARN-2769
 URL: https://issues.apache.org/jira/browse/YARN-2769
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2769.0.patch


 Running 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
 Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec 
  FAILURE! - in org.apache.hadoop.yarn.applications.distribut
 testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
   Time elapsed: 37.366 sec   FAILURE!
 org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT]
 at org.junit.Assert.assertEquals(Assert.java:115)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290)
 at 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2769) TestDistributedShell#testDSShell fails on Windows

2014-10-29 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2769:

Attachment: apache-yarn-2769.0.patch

 TestDistributedShell#testDSShell fails on Windows
 -

 Key: YARN-2769
 URL: https://issues.apache.org/jira/browse/YARN-2769
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2769.0.patch


 Running 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
 Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec 
  FAILURE! - in org.apache.hadoop.yarn.applications.distribut
 testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
   Time elapsed: 37.366 sec   FAILURE!
 org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT]
 at org.junit.Assert.assertEquals(Assert.java:115)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290)
 at 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2769) TestDistributedShell#testDSShell fails on Windows

2014-10-29 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188464#comment-14188464
 ] 

Varun Vasudev commented on YARN-2769:
-

Since we use shell_command in the test,
{noformat}
if (envs.containsKey(DSConstants.DISTRIBUTEDSHELLSCRIPTLOCATION)) {
{noformat}
is false on Windows (but true on Linux). Just moving the domain id setting out 
of this if-condition fixes the bug.
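
To make the fix concrete, here is a minimal sketch of the move being described; it is not the exact ApplicationMaster code, and the local variables (scriptPath, domainId) are placeholders.
{code}
// Hedged sketch, not the exact ApplicationMaster code.
Map<String, String> envs = System.getenv();
String scriptPath = "";
String domainId = null;

// Before the fix the domain id was read inside this block, which the
// shell_command path (no script-location env, as in the Windows test) never enters.
if (envs.containsKey(DSConstants.DISTRIBUTEDSHELLSCRIPTLOCATION)) {
  scriptPath = envs.get(DSConstants.DISTRIBUTEDSHELLSCRIPTLOCATION);
}

// After the fix the domain id is picked up unconditionally, outside that check.
if (envs.containsKey(DSConstants.DISTRIBUTEDSHELLTIMELINEDOMAIN)) {
  domainId = envs.get(DSConstants.DISTRIBUTEDSHELLTIMELINEDOMAIN);
}
{code}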

 TestDistributedShell#testDSShell fails on Windows
 -

 Key: YARN-2769
 URL: https://issues.apache.org/jira/browse/YARN-2769
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2769.0.patch


 Running 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
 Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec 
  FAILURE! - in org.apache.hadoop.yarn.applications.distribut
 testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
   Time elapsed: 37.366 sec   FAILURE!
 org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT]
 at org.junit.Assert.assertEquals(Assert.java:115)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290)
 at 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2769) TestDistributedShell#testDSShell fails on Windows

2014-10-29 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2769:

Description: 
{noformat}
Running 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec  
FAILURE! - in org.apache.hadoop.yarn.applications.distribut
testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
  Time elapsed: 37.366 sec   FAILURE!
org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT]
at org.junit.Assert.assertEquals(Assert.java:115)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290)
at 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179)
{noformat}

  was:
Running 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec  
FAILURE! - in org.apache.hadoop.yarn.applications.distribut
testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
  Time elapsed: 37.366 sec   FAILURE!
org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT]
at org.junit.Assert.assertEquals(Assert.java:115)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290)
at 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179)


 TestDistributedShell#testDSShell fails on Windows
 -

 Key: YARN-2769
 URL: https://issues.apache.org/jira/browse/YARN-2769
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2769.0.patch


 {noformat}
 Running 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
 Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec 
  FAILURE! - in org.apache.hadoop.yarn.applications.distribut
 testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
   Time elapsed: 37.366 sec   FAILURE!
 org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT]
 at org.junit.Assert.assertEquals(Assert.java:115)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290)
 at 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2711) TestDefaultContainerExecutor#testContainerLaunchError fails on Windows

2014-10-29 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188472#comment-14188472
 ] 

Junping Du commented on YARN-2711:
--

Thanks [~vvasudev] for the patch and [~cwelch] for review! 
Patch looks good to me. Will commit it shortly.

 TestDefaultContainerExecutor#testContainerLaunchError fails on Windows
 --

 Key: YARN-2711
 URL: https://issues.apache.org/jira/browse/YARN-2711
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2711.0.patch


 The testContainerLaunchError test fails on Windows with the following error -
 {noformat}
 java.io.FileNotFoundException: File file:/bin/echo does not exist
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:524)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:737)
   at 
 org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:514)
   at 
 org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111)
   at org.apache.hadoop.fs.FilterFs.getFileStatus(FilterFs.java:120)
   at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1117)
   at org.apache.hadoop.fs.FileContext$14.next(FileContext.java:1113)
   at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
   at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1113)
   at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2019)
   at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:1978)
   at 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:145)
   at 
 org.apache.hadoop.yarn.server.nodemanager.TestDefaultContainerExecutor.testContainerLaunchError(TestDefaultContainerExecutor.java:289)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2765) Add leveldb-based implementation for RMStateStore

2014-10-29 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-2765:
-
Attachment: YARN-2765v2.patch

Thanks for the review, Tsuyoshi!

bq. How about adding helper methods like getKeyPrefix/getNodePath for getting 
key prefix and node path?

Sure, added some helper methods to compute leveldb keys for various things.

bq. I found that the patch includes lots of hard-coded /. I think it's better to 
have a private field SEPARATOR = /. 

IMHO this makes the code less readable, similar to a code style like {{final 
int ONE = 1}}.  But I don't care too strongly about it and changed all 
occurrences to SEPARATOR.
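
As a rough illustration of what such helpers can look like, a short sketch (the key layout shown here is illustrative, not necessarily the exact one the patch uses):
{code}
// Illustrative only -- the real store's key layout may differ.
private static final String SEPARATOR = "/";
private static final String RM_APP_ROOT = "RMAppRoot";

private static String getApplicationNodeKey(ApplicationId appId) {
  return RM_APP_ROOT + SEPARATOR + appId;
}

private static String getApplicationAttemptNodeKey(ApplicationAttemptId attemptId) {
  return getApplicationNodeKey(attemptId.getApplicationId()) + SEPARATOR + attemptId;
}
{code}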

For Zhijie's comments:

bq. One drawback I can think of is that while LeveldbRMStateStore is 
lightweight for single RM restarting, multiple RMs of HA are not able to share 
this single-host database.

This should work if the leveldb database is on a network store like a filer.  
Leveldb uses locks to prevent multiple processes from trying to access the 
database simultaneously, so there's a little bit of help for the fencing 
scenarios.  However the fencing script actions would have to do some extra work 
to force a poorly-behaving resourcemanager to let go of the locks so a standby 
RM can open the store and become active.

bq. Did you have a chance to think of an enhanced k/v db: rocksdb?

I briefly considered using rocksdb for this but decided against it for a couple 
of reasons:

* leveldb is already used by the timeline server and nodemanager, and I would 
rather avoid adding yet another new dependency for this
* leveldb supports win32/win64, but it doesn't appear that the standard 
rocksdbjni distribution has support for Windows.

 Add leveldb-based implementation for RMStateStore
 -

 Key: YARN-2765
 URL: https://issues.apache.org/jira/browse/YARN-2765
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-2765.patch, YARN-2765v2.patch


 It would be nice to have a leveldb option to the resourcemanager recovery 
 store. Leveldb would provide some benefits over the existing filesystem store 
 such as better support for atomic operations, fewer I/O ops per state update, 
 and far fewer total files on the filesystem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1902) Allocation of too many containers when a second request is done with the same resource capability

2014-10-29 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188516#comment-14188516
 ] 

Bikas Saha commented on YARN-1902:
--

bq. Given a ContainerRequest x with Resource y, when addContainerRequest is 
called z times with x, allocate is called and at least one of the z allocated 
containers is started, then if another addContainerRequest call is done and 
subsequently an allocate call to the RM, (z+1) containers will be allocated, 
where 1 container is expected.

Firstly, I am not sure if the same ContainerRequest object can be passed 
multiple times in addContainerRequest. It should be different objects each time 
(even if they point to the same resource). This might have something to do with 
the internal book-keeping done for matching requests.

Secondly, after z requests are made and 1 allocation is received then z-1 
requests remain. If you are using AMRMClientImpl then it's your (the user's) 
responsibility to call removeContainerRequest() for the request that was 
matched to this container. The AMRMClient does not know which of your z 
requests could be assigned to this container. So in the general case, it cannot 
automatically remove a request from the internal table because it does not know 
which request to remove. If the javadocs don't clarify these semantics then we 
can improve the javadocs.

Thirdly, the protocol between the AMRMClient and the RM has an inherent race. 
So if the client had earlier asked for z containers and in the next heartbeat 
reduces that to z-1, the RM may actually return z containers to it because it 
had already allocated them to this client before the client updated the RM with 
the new value.
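
To spell out that book-keeping, a minimal usage sketch (assumes a registered AM; imports and error handling are omitted, and the outstanding list is the caller's own tracking structure, not part of the API):
{code}
YarnConfiguration conf = new YarnConfiguration();
AMRMClient<ContainerRequest> amClient = AMRMClient.createAMRMClient();
amClient.init(conf);
amClient.start();
amClient.registerApplicationMaster("", 0, "");

Resource capability = Resource.newInstance(1024, 1);
Priority priority = Priority.newInstance(0);

// z asks: a new ContainerRequest object per ask, even for identical resources.
int z = 3;
List<ContainerRequest> outstanding = new ArrayList<ContainerRequest>();
for (int i = 0; i < z; i++) {
  ContainerRequest req = new ContainerRequest(capability, null, null, priority);
  amClient.addContainerRequest(req);
  outstanding.add(req);
}

// On each heartbeat, remove one pending ask per container actually received;
// otherwise the stale asks stay in the table and later heartbeats can over-allocate.
AllocateResponse response = amClient.allocate(0.1f);
for (Container allocated : response.getAllocatedContainers()) {
  if (!outstanding.isEmpty()) {
    amClient.removeContainerRequest(outstanding.remove(0));
  }
  // ... launch the container ...
}
{code}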

 Allocation of too many containers when a second request is done with the same 
 resource capability
 -

 Key: YARN-1902
 URL: https://issues.apache.org/jira/browse/YARN-1902
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 2.2.0, 2.3.0, 2.4.0
Reporter: Sietse T. Au
  Labels: client
 Attachments: YARN-1902.patch, YARN-1902.v2.patch, YARN-1902.v3.patch


 Regarding AMRMClientImpl
 Scenario 1:
 Given a ContainerRequest x with Resource y, when addContainerRequest is 
 called z times with x, allocate is called and at least one of the z allocated 
 containers is started, then if another addContainerRequest call is done and 
 subsequently an allocate call to the RM, (z+1) containers will be allocated, 
 where 1 container is expected.
 Scenario 2:
 No containers are started between the allocate calls. 
 Analyzing debug logs of the AMRMClientImpl, I have found that indeed a (z+1) 
 are requested in both scenarios, but that only in the second scenario, the 
 correct behavior is observed.
 Looking at the implementation I have found that this (z+1) request is caused 
 by the structure of the remoteRequestsTable. The consequence of MapResource, 
 ResourceRequestInfo is that ResourceRequestInfo does not hold any 
 information about whether a request has been sent to the RM yet or not.
 There are workarounds for this, such as releasing the excess containers 
 received.
 The solution implemented is to initialize a new ResourceRequest in 
 ResourceRequestInfo when a request has been successfully sent to the RM.
 The patch includes a test in which scenario one is tested.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2769) TestDistributedShell#testDSShell fails on Windows

2014-10-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188526#comment-14188526
 ] 

Hadoop QA commented on YARN-2769:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12677908/apache-yarn-2769.0.patch
  against trunk revision ec63a3f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5627//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5627//console

This message is automatically generated.

 TestDistributedShell#testDSShell fails on Windows
 -

 Key: YARN-2769
 URL: https://issues.apache.org/jira/browse/YARN-2769
 Project: Hadoop YARN
  Issue Type: Test
  Components: applications/distributed-shell
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2769.0.patch


 {noformat}
 Running 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
 Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec 
  FAILURE! - in org.apache.hadoop.yarn.applications.distribut
 testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
   Time elapsed: 37.366 sec   FAILURE!
 org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT]
 at org.junit.Assert.assertEquals(Assert.java:115)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290)
 at 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2769) TestDistributedShell#testDSShell fails on Windows

2014-10-29 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188530#comment-14188530
 ] 

Varun Vasudev commented on YARN-2769:
-

I haven't included any test since this is a fix for a test failing on Windows.

 TestDistributedShell#testDSShell fails on Windows
 -

 Key: YARN-2769
 URL: https://issues.apache.org/jira/browse/YARN-2769
 Project: Hadoop YARN
  Issue Type: Test
  Components: applications/distributed-shell
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2769.0.patch


 {noformat}
 Running 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
 Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec 
  FAILURE! - in org.apache.hadoop.yarn.applications.distribut
 testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
   Time elapsed: 37.366 sec   FAILURE!
 org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT]
 at org.junit.Assert.assertEquals(Assert.java:115)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290)
 at 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI

2014-10-29 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2698:
-
Attachment: YARN-2698-20141029-1.patch

 Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of 
 RMAdminCLI
 ---

 Key: YARN-2698
 URL: https://issues.apache.org/jira/browse/YARN-2698
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Critical
 Attachments: YARN-2698-20141028-1.patch, YARN-2698-20141028-2.patch, 
 YARN-2698-20141028-3.patch, YARN-2698-20141029-1.patch


 YARN RMAdminCLI and AdminService should have write APIs only; the read APIs 
 should be located in YARNCLI and RMClientService.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2769) Timeline server domain not set correctly when using shell_command on Windows

2014-10-29 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2769:

Summary: Timeline server domain not set correctly when using shell_command 
on Windows  (was: TestDistributedShell#testDSShell fails on Windows)

 Timeline server domain not set correctly when using shell_command on Windows
 

 Key: YARN-2769
 URL: https://issues.apache.org/jira/browse/YARN-2769
 Project: Hadoop YARN
  Issue Type: Test
  Components: applications/distributed-shell
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2769.0.patch


 {noformat}
 Running 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
 Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec 
  FAILURE! - in org.apache.hadoop.yarn.applications.distribut
 testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
   Time elapsed: 37.366 sec   FAILURE!
 org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT]
 at org.junit.Assert.assertEquals(Assert.java:115)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290)
 at 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2769) Timeline server domain not set correctly when using shell_command on Windows

2014-10-29 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2769:

Issue Type: Bug  (was: Test)

 Timeline server domain not set correctly when using shell_command on Windows
 

 Key: YARN-2769
 URL: https://issues.apache.org/jira/browse/YARN-2769
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2769.0.patch


 {noformat}
 Running 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
 Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec 
  FAILURE! - in org.apache.hadoop.yarn.applications.distribut
 testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
   Time elapsed: 37.366 sec   FAILURE!
 org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT]
 at org.junit.Assert.assertEquals(Assert.java:115)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290)
 at 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2765) Add leveldb-based implementation for RMStateStore

2014-10-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188589#comment-14188589
 ] 

Hadoop QA commented on YARN-2765:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12677911/YARN-2765v2.patch
  against trunk revision ec63a3f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5626//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5626//console

This message is automatically generated.

 Add leveldb-based implementation for RMStateStore
 -

 Key: YARN-2765
 URL: https://issues.apache.org/jira/browse/YARN-2765
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-2765.patch, YARN-2765v2.patch


 It would be nice to have a leveldb option to the resourcemanager recovery 
 store. Leveldb would provide some benefits over the existing filesystem store 
 such as better support for atomic operations, fewer I/O ops per state update, 
 and far fewer total files on the filesystem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2769) Timeline server domain not set correctly when using shell_command on Windows

2014-10-29 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2769:

Description: 
The bug is caught by one of the unit tests which fails.
{noformat}
Running 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec  
FAILURE! - in org.apache.hadoop.yarn.applications.distribut
testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
  Time elapsed: 37.366 sec   FAILURE!
org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT]
at org.junit.Assert.assertEquals(Assert.java:115)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290)
at 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179)
{noformat}

  was:
{noformat}
Running 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec  
FAILURE! - in org.apache.hadoop.yarn.applications.distribut
testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
  Time elapsed: 37.366 sec   FAILURE!
org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT]
at org.junit.Assert.assertEquals(Assert.java:115)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290)
at 
org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179)
{noformat}


 Timeline server domain not set correctly when using shell_command on Windows
 

 Key: YARN-2769
 URL: https://issues.apache.org/jira/browse/YARN-2769
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2769.0.patch


 The bug is caught by one of the unit tests which fails.
 {noformat}
 Running 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
 Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec 
  FAILURE! - in org.apache.hadoop.yarn.applications.distribut
 testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
   Time elapsed: 37.366 sec   FAILURE!
 org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT]
 at org.junit.Assert.assertEquals(Assert.java:115)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290)
 at 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2765) Add leveldb-based implementation for RMStateStore

2014-10-29 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188595#comment-14188595
 ] 

Zhijie Shen commented on YARN-2765:
---

bq. This should work if the leveldb database is on a network store like a filer.

Thanks for sharing. This is an interesting use case that I wasn't aware of 
before.

bq. I briefly considered using rocksdb for this but decided against it for a 
couple of reasons:

It's not particularly related to this Jira, but I just want to think out loud. 
It seems that rocksdb claims to have better I/O performance than leveldb, while 
their APIs are very similar to each other. After we have the leveldb impl, it 
shouldn't be that difficult to make a rocksdb impl. Probably leveldb is enough 
to serve as the state store for the RM/NM/JHS, but the timeline server may want 
a stronger one. Rocksdb may be a compromise before migrating to a fully 
distributed storage solution based on HBase. And one other merit I've heard 
about rocksdb is that it can ride on HDFS. Correct me if I'm wrong, but it 
seems that rocksdb can also help to scale out the storage as well as support RM 
HA deployment in a shared-nothing environment (e.g. without network storage).

I'm not saying we should go with rocksdb now instead of leveldb, as we know 
leveldb has been used for other components already. I'm just proposing that we 
consider rocksdb, which looks like a stronger but still reasonably simple 
alternative.

 Add leveldb-based implementation for RMStateStore
 -

 Key: YARN-2765
 URL: https://issues.apache.org/jira/browse/YARN-2765
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-2765.patch, YARN-2765v2.patch


 It would be nice to have a leveldb option to the resourcemanager recovery 
 store. Leveldb would provide some benefits over the existing filesystem store 
 such as better support for atomic operations, fewer I/O ops per state update, 
 and far fewer total files on the filesystem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2770) Timeline delegation tokens need to be automatically renewed by the RM

2014-10-29 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2770:
-

 Summary: Timeline delegation tokens need to be automatically 
renewed by the RM
 Key: YARN-2770
 URL: https://issues.apache.org/jira/browse/YARN-2770
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.5.0
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Critical


YarnClient will automatically grab a timeline DT for the application and pass 
it to the app AM. Right now the timeline DT renewal is still a dummy operation. 
If an app runs for more than 24h (the default DT expiry time), the app AM is no 
longer able to use the expired DT to communicate with the timeline server. 
Since the RM already caches the credentials of each app and renews the DTs for 
running apps, we should provide renew hooks similar to what the HDFS DT has for 
the RM, and set the RM user as the renewer when grabbing the timeline DT.
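
As a hedged sketch of what the client side of such a hook could look like (the TimelineClient and Credentials calls are existing APIs, but the exact wiring inside YarnClient is an assumption here; imports and exception handling omitted):
{code}
// Fetch a timeline DT whose renewer is the RM user and ship it with the app's
// credentials, so the RM's DelegationTokenRenewer can keep renewing it past the
// default 24h expiry.
YarnConfiguration conf = new YarnConfiguration();
TimelineClient timelineClient = TimelineClient.createTimelineClient();
timelineClient.init(conf);
timelineClient.start();

String rmRenewer = conf.get(YarnConfiguration.RM_PRINCIPAL);
Token<TimelineDelegationTokenIdentifier> timelineDt =
    timelineClient.getDelegationToken(rmRenewer);

Credentials credentials = new Credentials();
credentials.addToken(timelineDt.getService(), timelineDt);
// The credentials are then serialized into the AM's ContainerLaunchContext as usual.
{code}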



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (YARN-2765) Add leveldb-based implementation for RMStateStore

2014-10-29 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188595#comment-14188595
 ] 

Zhijie Shen edited comment on YARN-2765 at 10/29/14 5:22 PM:
-

bq. This should work if the leveldb database is on a network store like a filer.

Thanks for sharing. This is an interesting use case that I wasn't aware of 
before.

bq. I briefly considered using rocksdb for this but decided against it for a 
couple of reasons:

It's not particularly related to this Jira, but I just want to think out loud. 
It seems that rocksdb claims to have better I/O performance than leveldb, while 
their APIs are very similar to each other. After we have the leveldb impl, it 
shouldn't be that difficult to make a rocksdb impl. Probably leveldb is enough 
to serve as the state store for the RM/NM/JHS, but the timeline server may want 
a stronger one. Rocksdb may be a compromise before migrating to a fully 
distributed storage solution based on HBase. And one other merit I've heard 
about rocksdb is that it can ride on HDFS. Correct me if I'm wrong, but it 
seems that rocksdb can also help to scale out the storage as well as support RM 
HA deployment in a shared-nothing environment (e.g. without network storage).

I'm not saying we should go with rocksdb now instead of leveldb, as we know 
leveldb has been used for other components already. I'm just proposing that we 
consider rocksdb, which looks like a stronger but still reasonably simple 
alternative.

There's a rocksdb JNI binding which seems to have Windows support: 
https://github.com/fusesource/rocksdbjni

It appears to be from the same org whose leveldbjni we currently use.


was (Author: zjshen):
bq. This should work if the leveldb database is on a network store like a filer.

Thanks for sharing. This is an interesting use case that I wasn't aware of 
before.

bq. I briefly considered using rocksdb for this but decided against it for a 
couple of reasons:

It's not particularly related to this Jira, but I just want to think out loud. 
It seems that rocksdb claims to have better I/O performance than leveldb, while 
their APIs are very similar to each other. After we have the leveldb impl, it 
shouldn't be that difficult to make a rocksdb impl. Probably leveldb is enough 
to serve as the state store for the RM/NM/JHS, but the timeline server may want 
a stronger one. Rocksdb may be a compromise before migrating to a fully 
distributed storage solution based on HBase. And one other merit I've heard 
about rocksdb is that it can ride on HDFS. Correct me if I'm wrong, but it 
seems that rocksdb can also help to scale out the storage as well as support RM 
HA deployment in a shared-nothing environment (e.g. without network storage).

I'm not saying we should go with rocksdb now instead of leveldb, as we know 
leveldb has been used for other components already. I'm just proposing that we 
consider rocksdb, which looks like a stronger but still reasonably simple 
alternative.

 Add leveldb-based implementation for RMStateStore
 -

 Key: YARN-2765
 URL: https://issues.apache.org/jira/browse/YARN-2765
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-2765.patch, YARN-2765v2.patch


 It would be nice to have a leveldb option to the resourcemanager recovery 
 store. Leveldb would provide some benefits over the existing filesystem store 
 such as better support for atomic operations, fewer I/O ops per state update, 
 and far fewer total files on the filesystem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2742) FairSchedulerConfiguration fails to parse if there is extra space between value and unit

2014-10-29 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188614#comment-14188614
 ] 

Karthik Kambatla commented on YARN-2742:


+1

 FairSchedulerConfiguration fails to parse if there is extra space between 
 value and unit
 

 Key: YARN-2742
 URL: https://issues.apache.org/jira/browse/YARN-2742
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.4.0
Reporter: Sangjin Lee
Assignee: Wei Yan
Priority: Minor
 Attachments: YARN-2742-1.patch, YARN-2742-2.patch


 FairSchedulerConfiguration is very strict about the number of space 
 characters between the value and the unit: 0 or 1 space.
 For example, for values like the following:
 {noformat}
 <maxResources>4096  mb, 2 vcores</maxResources>
 {noformat}
 (note 2 spaces)
 This above line fails to parse:
 {noformat}
 2014-10-24 22:56:40,802 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService:
  Failed to reload fair scheduler config file - will use existing allocations.
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationConfigurationException:
  Missing resource: mb
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.findResource(FairSchedulerConfiguration.java:247)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.parseResourceConfigValue(FairSchedulerConfiguration.java:231)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:347)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:381)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:293)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService$1.run(AllocationFileLoaderService.java:117)
 {noformat}
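
For illustration, a standalone whitespace-tolerant value/unit parse along the lines the fix needs (this is a sketch, not the actual FairSchedulerConfiguration code):
{code}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ResourceValueParser {
  // "\\s*" (any amount of whitespace) is what lets "4096  mb" with two spaces parse.
  private static final Pattern VALUE_WITH_UNIT = Pattern.compile("(\\d+)\\s*([a-zA-Z]+)");

  public static int parse(String setting, String unit) {
    Matcher m = VALUE_WITH_UNIT.matcher(setting);
    while (m.find()) {
      if (unit.equalsIgnoreCase(m.group(2))) {
        return Integer.parseInt(m.group(1));
      }
    }
    throw new IllegalArgumentException("Missing resource: " + unit);
  }

  public static void main(String[] args) {
    System.out.println(parse("4096  mb, 2 vcores", "mb"));     // prints 4096
    System.out.println(parse("4096  mb, 2 vcores", "vcores")); // prints 2
  }
}
{code}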



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2742) FairSchedulerConfiguration should allow extra spaces between value and unit

2014-10-29 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2742:
---
Summary: FairSchedulerConfiguration should allow extra spaces between value 
and unit  (was: FairSchedulerConfiguration fails to parse if there is extra 
space between value and unit)

 FairSchedulerConfiguration should allow extra spaces between value and unit
 ---

 Key: YARN-2742
 URL: https://issues.apache.org/jira/browse/YARN-2742
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.4.0
Reporter: Sangjin Lee
Assignee: Wei Yan
Priority: Minor
 Attachments: YARN-2742-1.patch, YARN-2742-2.patch


 FairSchedulerConfiguration is very strict about the number of space 
 characters between the value and the unit: 0 or 1 space.
 For example, for values like the following:
 {noformat}
 <maxResources>4096  mb, 2 vcores</maxResources>
 {noformat}
 (note 2 spaces)
 This above line fails to parse:
 {noformat}
 2014-10-24 22:56:40,802 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService:
  Failed to reload fair scheduler config file - will use existing allocations.
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationConfigurationException:
  Missing resource: mb
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.findResource(FairSchedulerConfiguration.java:247)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.parseResourceConfigValue(FairSchedulerConfiguration.java:231)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:347)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:381)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:293)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService$1.run(AllocationFileLoaderService.java:117)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2742) FairSchedulerConfiguration should allow extra spaces between value and unit

2014-10-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188639#comment-14188639
 ] 

Hudson commented on YARN-2742:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6382 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6382/])
YARN-2742. FairSchedulerConfiguration should allow extra spaces between value 
and unit. (Wei Yan via kasha) (kasha: rev 
782971ae7a0247bcf5920e10b434b7e0954dd868)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairSchedulerConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerConfiguration.java


 FairSchedulerConfiguration should allow extra spaces between value and unit
 ---

 Key: YARN-2742
 URL: https://issues.apache.org/jira/browse/YARN-2742
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.4.0
Reporter: Sangjin Lee
Assignee: Wei Yan
Priority: Minor
 Fix For: 2.7.0

 Attachments: YARN-2742-1.patch, YARN-2742-2.patch


 FairSchedulerConfiguration is very strict about the number of space 
 characters between the value and the unit: 0 or 1 space.
 For example, for values like the following:
 {noformat}
 <maxResources>4096  mb, 2 vcores</maxResources>
 {noformat}
 (note 2 spaces)
 This above line fails to parse:
 {noformat}
 2014-10-24 22:56:40,802 ERROR 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService:
  Failed to reload fair scheduler config file - will use existing allocations.
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationConfigurationException:
  Missing resource: mb
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.findResource(FairSchedulerConfiguration.java:247)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerConfiguration.parseResourceConfigValue(FairSchedulerConfiguration.java:231)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:347)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.loadQueue(AllocationFileLoaderService.java:381)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService.reloadAllocations(AllocationFileLoaderService.java:293)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService$1.run(AllocationFileLoaderService.java:117)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661

2014-10-29 Thread Siqi Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188728#comment-14188728
 ] 

Siqi Li commented on YARN-2755:
---

Hi [~jlowe] can you take a look at this?

 NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
 --

 Key: YARN-2755
 URL: https://issues.apache.org/jira/browse/YARN-2755
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li
Priority: Critical
 Attachments: YARN-2755.v1.patch, YARN-2755.v2.patch, 
 YARN-2755.v3.patch


 When the NM restarts frequently for some reason, a large number of directories 
 like these are left in /data/disk$num/yarn/local/:
 /data/disk1/yarn/local/usercache_DEL_1414372756105
 /data/disk1/yarn/local/usercache_DEL_1413557901696
 /data/disk1/yarn/local/usercache_DEL_1413657004894
 /data/disk1/yarn/local/usercache_DEL_1413675321860
 /data/disk1/yarn/local/usercache_DEL_1414093167936
 /data/disk1/yarn/local/usercache_DEL_1413565841271
 These directories are empty, but take up 100M+ due to their sheer number. 
 There were 38714 per data disk on the machine I looked at.
 It appears to be a regression introduced by YARN-661



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2690) Make ReservationSystem and its dependent classes independent of Scheduler type

2014-10-29 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188739#comment-14188739
 ] 

Karthik Kambatla commented on YARN-2690:


Looks mostly good. Can we look into the javadoc warnings? 

Few minor comments:
# Rename ReservationSchedulerConfiguration to ReservationConfiguration? Not 
sure the Scheduler in there is adding much information. 
# Make ReservationConfiguration an abstract class that extends Configuration 
instead of an interface, so it can implement some of the getters at least those 
for which it carries defaults.
# Nit: The time defaults should be written as a product of numbers instead of 
the result, e.g. {{24 * 60 * 60 * 1000}} instead of 8640L. 



 Make ReservationSystem and its dependent classes independent of Scheduler 
 type  
 

 Key: YARN-2690
 URL: https://issues.apache.org/jira/browse/YARN-2690
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2690.001.patch, YARN-2690.002.patch, 
 YARN-2690.002.patch, YARN-2690.003.patch


 A lot of common reservation classes depend on CapacityScheduler and 
 specifically its configuration. This jira is to make them ready for other 
 Schedulers by abstracting out the configuration.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2765) Add leveldb-based implementation for RMStateStore

2014-10-29 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188755#comment-14188755
 ] 

Jason Lowe commented on YARN-2765:
--

I agree that the timeline server seems like a worthy candidate for rocksdb.  
IIUC rocksdb's main use-case over leveldb is better performance when the 
database is larger than the node's RAM, which is likely in the case of the 
timeline server.

bq. And one other merit I've heard about rocksdb is that it can ride on HDFS.

This is news to me.  I knew rocksdb could be used as a cache of data that came 
from HDFS or could be backed-up to HDFS, but I didn't think it could read/write 
directly to it as part of normal operations.

bq. There's a rocksdb jni which seems to have windows support: 
https://github.com/fusesource/rocksdbjni

Awesome, thanks for finding that.  I was looking at the standard org.rocksdb 
package.  Only concern with the fusesource option would be if it starts to 
diverge significantly from the standard one.  The API is already slightly 
different between the two, and the fusesource one hasn't been touched in a year 
while the org.rocksdb package was updated just last week.

Probably best to continue this conversation in a separate JIRA proposing we 
consider rocksdb for the timeline server.  If it works well there it should be 
very straightforward to provide store backends for the RM, NM, and JHS if it 
makes sense for them as well.

 Add leveldb-based implementation for RMStateStore
 -

 Key: YARN-2765
 URL: https://issues.apache.org/jira/browse/YARN-2765
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: YARN-2765.patch, YARN-2765v2.patch


 It would be nice to have a leveldb option to the resourcemanager recovery 
 store. Leveldb would provide some benefits over the existing filesystem store 
 such as better support for atomic operations, fewer I/O ops per state update, 
 and far fewer total files on the filesystem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2771) DistributedShell's DSConstants are badly named

2014-10-29 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created YARN-2771:
-

 Summary: DistributedShell's DSConstants are badly named
 Key: YARN-2771
 URL: https://issues.apache.org/jira/browse/YARN-2771
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Zhijie Shen


I'd rather have underscores (DISTRIBUTED_SHELL_TIMELINE_DOMAIN instead of 
DISTRIBUTEDSHELLTIMELINEDOMAIN).

DISTRIBUTEDSHELLTIMELINEDOMAIN is added in this release, can we rename it to be 
DISTRIBUTED_SHELL_TIMELINE_DOMAIN?

For the old envs, we can just add new envs that point to the old one and 
deprecate the old ones.
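
A small sketch of both cases (the string values are assumptions; only the timeline-domain and script-location constant names come from DSConstants):
{code}
public class DSConstants {

  // New in this release: can simply be renamed to the readable form.
  public static final String DISTRIBUTED_SHELL_TIMELINE_DOMAIN =
      "DISTRIBUTED_SHELL_TIMELINE_DOMAIN";

  // Already-shipped env: add a readable alias that points at the old value
  // and deprecate the old name.
  public static final String DISTRIBUTED_SHELL_SCRIPT_LOCATION =
      "DISTRIBUTEDSHELLSCRIPTLOCATION";

  @Deprecated
  public static final String DISTRIBUTEDSHELLSCRIPTLOCATION =
      DISTRIBUTED_SHELL_SCRIPT_LOCATION;
}
{code}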



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2738) Add FairReservationSystem for FairScheduler

2014-10-29 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188787#comment-14188787
 ] 

Karthik Kambatla commented on YARN-2738:


Do we want to make it configurable per-queue from the beginning? How about just 
starting with global settings for all queues, and adding per-queue configs 
depending on use cases and user feedback? 

Comments on the patch itself:
# FairReservationSystem: The TODO is not clear to me. IAC, we should avoid 
orphan TODOs - can we file a follow-up JIRA and add a reference at the TODO.
# Spurious import changes in a couple of files.

 Add FairReservationSystem for FairScheduler
 ---

 Key: YARN-2738
 URL: https://issues.apache.org/jira/browse/YARN-2738
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2738.001.patch


 Need to create a FairReservationSystem that will implement ReservationSystem 
 for FairScheduler



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2769) Timeline server domain not set correctly when using shell_command on Windows

2014-10-29 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188796#comment-14188796
 ] 

Zhijie Shen commented on YARN-2769:
---

+1. The fix makes sense, and we have a test to cover the code path on 
Windows. Will commit the patch.

 Timeline server domain not set correctly when using shell_command on Windows
 

 Key: YARN-2769
 URL: https://issues.apache.org/jira/browse/YARN-2769
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-2769.0.patch


 The bug is caught by one of the unit tests which fails.
 {noformat}
 Running 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
 Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec 
  FAILURE! - in org.apache.hadoop.yarn.applications.distribut
 testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
   Time elapsed: 37.366 sec   FAILURE!
 org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT]
 at org.junit.Assert.assertEquals(Assert.java:115)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290)
 at 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2769) Timeline server domain not set correctly when using shell_command on Windows

2014-10-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188830#comment-14188830
 ] 

Hudson commented on YARN-2769:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6385 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6385/])
YARN-2769. Fixed the problem that timeline domain is not set in distributed 
shell AM when using shell_command on Windows. Contributed by Varun Vasudev. 
(zjshen: rev a8c120222047280234c3411ce1c1c9b17f08c851)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
* hadoop-yarn-project/CHANGES.txt


 Timeline server domain not set correctly when using shell_command on Windows
 

 Key: YARN-2769
 URL: https://issues.apache.org/jira/browse/YARN-2769
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.6.0

 Attachments: apache-yarn-2769.0.patch


 The bug is caught by one of the unit tests which fails.
 {noformat}
 Running 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
 Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 37.661 sec 
  FAILURE! - in org.apache.hadoop.yarn.applications.distribut
 testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)
   Time elapsed: 37.366 sec   FAILURE!
 org.junit.ComparisonFailure: expected:[TEST_DOMAIN] but was:[DEFAULT]
 at org.junit.Assert.assertEquals(Assert.java:115)
 at org.junit.Assert.assertEquals(Assert.java:144)
 at 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:290)
 at 
 org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:179)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI

2014-10-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188891#comment-14188891
 ] 

Hadoop QA commented on YARN-2698:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12677927/YARN-2698-20141029-1.patch
  against trunk revision ec63a3f.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.mapred.TestMRTimelineEventHandling

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5628//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5628//console

This message is automatically generated.

 Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of 
 RMAdminCLI
 ---

 Key: YARN-2698
 URL: https://issues.apache.org/jira/browse/YARN-2698
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Critical
 Attachments: YARN-2698-20141028-1.patch, YARN-2698-20141028-2.patch, 
 YARN-2698-20141028-3.patch, YARN-2698-20141029-1.patch


 YARN RMAdminCLI and AdminService should have write APIs only; the other read 
 APIs should be located in the YARN CLI and RMClientService.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI

2014-10-29 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2698:
-
Attachment: YARN-2698-20141029-2.patch

 Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of 
 RMAdminCLI
 ---

 Key: YARN-2698
 URL: https://issues.apache.org/jira/browse/YARN-2698
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Critical
 Attachments: YARN-2698-20141028-1.patch, YARN-2698-20141028-2.patch, 
 YARN-2698-20141028-3.patch, YARN-2698-20141029-1.patch, 
 YARN-2698-20141029-2.patch


 YARN RMAdminCLI and AdminService should have write APIs only; the other read 
 APIs should be located in the YARN CLI and RMClientService.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)

2014-10-29 Thread Naganarasimha G R (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naganarasimha G R updated YARN-2495:

Attachment: YARN-2495.20141030-1.patch

Hi [~wangda],
I am uploading a patch with all the review comments addressed and with test 
cases, but I still need to rebase it on the latest trunk code, which I will do 
tomorrow morning. You can review this patch in the meantime, and if it looks 
fine I will submit it after rebasing tomorrow.


 Allow admin specify labels from each NM (Distributed configuration)
 ---

 Key: YARN-2495
 URL: https://issues.apache.org/jira/browse/YARN-2495
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Wangda Tan
Assignee: Naganarasimha G R
 Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, 
 YARN-2495.20141030-1.patch, YARN-2495_20141022.1.patch


 The target of this JIRA is to allow admins to specify labels on each NM. This covers:
 - User can set labels on each NM (by setting yarn-site.xml or using a script 
 suggested by [~aw])
 - NM will send labels to RM via the ResourceTracker API
 - RM will set labels in NodeLabelManager when the NM registers/updates labels



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI

2014-10-29 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188925#comment-14188925
 ] 

Wangda Tan commented on YARN-2698:
--

Hi [~vinodkv],
bq. YarnClient usually has simpler APIs (like returning a map) instead of 
directly exposing the response objects, let’s do that.
Addressed
bq. bin/yarn needs to be updated to use the new CLI
Addressed
bq. Overall, I didn’t realize we already have a node CLI already: Let’s just 
move the node to labels mappings to that CLI. We could keep the all-nodes 
mapping though.
The node CLI mainly gets labels from NodeReport, which only covers running NMs. 
I suggest keeping the node-to-labels mapping in the node-labels CLI (as its 
name implies); in the future we can add a labels field to NodeReport and the 
node CLI.
bq. “will return all labels in the cluster” - “will return all accessible 
labels in the cluster”
I changed it to "... return all node labels" to make it consistent with the 
Java API names; please let me know if you disagree.
bq. CLI for node-labels -list” should drop the prefix “Node-labels=“
Addressed
bq. CLI for “node-labels -list -nodeId all”: Say Node instead of Host? And then 
simply make it “Node:nm:5432 - label1, label2”
Addressed
bq. Move the node-cli tests into their own TestNodeLabelsCLI
Addressed
bq. Validate the help message for the new CLI.
Addressed 



 Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of 
 RMAdminCLI
 ---

 Key: YARN-2698
 URL: https://issues.apache.org/jira/browse/YARN-2698
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Critical
 Attachments: YARN-2698-20141028-1.patch, YARN-2698-20141028-2.patch, 
 YARN-2698-20141028-3.patch, YARN-2698-20141029-1.patch, 
 YARN-2698-20141029-2.patch


 YARN RMAdminCLI and AdminService should have write APIs only; the other read 
 APIs should be located in the YARN CLI and RMClientService.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2771) DistributedShell's DSConstants are badly named

2014-10-29 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2771:
--
Component/s: applications/distributed-shell

 DistributedShell's DSConstants are badly named
 --

 Key: YARN-2771
 URL: https://issues.apache.org/jira/browse/YARN-2771
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Vinod Kumar Vavilapalli
Assignee: Zhijie Shen

 I'd rather have underscores (DISTRIBUTED_SHELL_TIMELINE_DOMAIN instead of 
 DISTRIBUTEDSHELLTIMELINEDOMAIN).
 DISTRIBUTEDSHELLTIMELINEDOMAIN was added in this release; can we rename it to 
 DISTRIBUTED_SHELL_TIMELINE_DOMAIN?
 For the old envs, we can just add new envs that point to the old ones and 
 deprecate the old ones.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2772) DistributedShell's timeline related options are not clear

2014-10-29 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created YARN-2772:
-

 Summary: DistributedShell's timeline related options are not clear
 Key: YARN-2772
 URL: https://issues.apache.org/jira/browse/YARN-2772
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Vinod Kumar Vavilapalli
Assignee: Zhijie Shen


The new options "domain" and "create" are not descriptive at all. It is also 
not clear when view_acls and modify_acls need to be set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2772) DistributedShell's timeline related options are not clear

2014-10-29 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14188969#comment-14188969
 ] 

Vinod Kumar Vavilapalli commented on YARN-2772:
---

 I propose the following:
 - Rename the domain and create options to be timeline_domain_id and 
should_create_timeline_domain respectively.
 - Modify the option descriptions of view_acls and modify_acls to say that they 
are only needed if should_create_timeline_domain is true.
 - Modify the description of {{timeline_domain_id}} to say that it is optional 
and that the DEFAULT timeline-domain will be used by default.
 - If {{should_create_timeline_domain}} is off, we should validate on the 
client whether the domain really exists and, if not, fail the submission with a 
message saying "The passed timeline-domain doesn't exist. Either pass an 
existing timeline_domain_id or set should_create_timeline_domain to true."
 - If {{should_create_timeline_domain}} is on, and the user passes an existing 
timeline-domain-id, we should fail the submission and say "The passed 
timeline-domain already exists. Either pass a new timeline_domain_id or set 
should_create_timeline_domain to false." (A rough sketch of this validation 
follows below.)
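
A hedged sketch of the client-side validation proposed above; the helper and 
class names here are invented for illustration, and the actual domain-query 
call is pending YARN-2423:
{code}
import java.io.IOException;

/** Illustrative only: validates the proposed distributed shell timeline options. */
public class TimelineDomainOptionCheck {

  /** Assumed helper; a real domain lookup needs the query API from YARN-2423. */
  private boolean timelineDomainExists(String domainId) throws IOException {
    return false;  // placeholder
  }

  public void validate(String domainId, boolean createDomain) throws IOException {
    boolean exists = timelineDomainExists(domainId);
    if (!createDomain && !exists) {
      throw new IllegalArgumentException("The passed timeline-domain doesn't "
          + "exist. Either pass an existing timeline_domain_id or set "
          + "should_create_timeline_domain to true.");
    }
    if (createDomain && exists) {
      throw new IllegalArgumentException("The passed timeline-domain already "
          + "exists. Either pass a new timeline_domain_id or set "
          + "should_create_timeline_domain to false.");
    }
  }
}
{code}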

 DistributedShell's timeline related options are not clear
 -

 Key: YARN-2772
 URL: https://issues.apache.org/jira/browse/YARN-2772
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Vinod Kumar Vavilapalli
Assignee: Zhijie Shen

 The new options "domain" and "create" are not descriptive at all. It is also 
 not clear when view_acls and modify_acls need to be set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2766) [JDK 8] TestApplicationHistoryClientService fails

2014-10-29 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-2766:

Attachment: YARN-2766.patch

That makes sense.  

I wasn't able to trace the code back to ApplicationHistoryManager, but I did 
find where the lists are created, so I put the sorting calls there.  
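
For illustration, the general shape of that change is sketched below (a hedged 
sketch, not the actual patch; the Report type and getId getter are placeholders 
standing in for the real report classes):
{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class SortedReports {
  /** Placeholder report type standing in for e.g. a container report. */
  public static class Report {
    private final String id;
    public Report(String id) { this.id = id; }
    public String getId() { return id; }
  }

  /** Return the map's values in a stable, sorted order instead of HashMap order. */
  public static List<Report> sortedValues(Map<String, Report> reports) {
    List<Report> list = new ArrayList<Report>(reports.values());
    Collections.sort(list, new Comparator<Report>() {
      @Override
      public int compare(Report a, Report b) {
        return a.getId().compareTo(b.getId());
      }
    });
    return list;
  }
}
{code}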

 [JDK 8] TestApplicationHistoryClientService fails
 -

 Key: YARN-2766
 URL: https://issues.apache.org/jira/browse/YARN-2766
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Attachments: YARN-2766.patch, YARN-2766.patch


 {{TestApplicationHistoryClientService.testContainers}} and 
 {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail 
 because the test assertions are assuming a returned Collection is in a 
 certain order.  The collection comes from a HashMap, so the order is not 
 guaranteed, plus, according to [this 
 page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html],
  there are situations where the iteration order of a HashMap will be 
 different between Java 7 and 8.
 We should fix the test code to not assume a specific ordering.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661

2014-10-29 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189015#comment-14189015
 ] 

Jason Lowe commented on YARN-2755:
--

Thanks for the patch, Siqi.

userDirStatus can be null if userDirPath is not a directory, so we should avoid 
the potential NPE and check for {{userDirStatus != null && 
userDirStatus.hasNext()}}


 NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
 --

 Key: YARN-2755
 URL: https://issues.apache.org/jira/browse/YARN-2755
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li
Priority: Critical
 Attachments: YARN-2755.v1.patch, YARN-2755.v2.patch, 
 YARN-2755.v3.patch


 When the NM restarts frequently for some reason, a large number of directories 
 like these are left in /data/disk$num/yarn/local/:
 /data/disk1/yarn/local/usercache_DEL_1414372756105
 /data/disk1/yarn/local/usercache_DEL_1413557901696
 /data/disk1/yarn/local/usercache_DEL_1413657004894
 /data/disk1/yarn/local/usercache_DEL_1413675321860
 /data/disk1/yarn/local/usercache_DEL_1414093167936
 /data/disk1/yarn/local/usercache_DEL_1413565841271
 These directories are empty, but take up 100M+ due to their sheer number; 
 there were 38714 per data disk on the machine I looked at.
 It appears to be a regression introduced by YARN-661.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661

2014-10-29 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated YARN-2755:
--
Attachment: YARN-2755.v4.patch

 NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
 --

 Key: YARN-2755
 URL: https://issues.apache.org/jira/browse/YARN-2755
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li
Priority: Critical
 Attachments: YARN-2755.v1.patch, YARN-2755.v2.patch, 
 YARN-2755.v3.patch, YARN-2755.v4.patch


 When the NM restarts frequently for some reason, a large number of directories 
 like these are left in /data/disk$num/yarn/local/:
 /data/disk1/yarn/local/usercache_DEL_1414372756105
 /data/disk1/yarn/local/usercache_DEL_1413557901696
 /data/disk1/yarn/local/usercache_DEL_1413657004894
 /data/disk1/yarn/local/usercache_DEL_1413675321860
 /data/disk1/yarn/local/usercache_DEL_1414093167936
 /data/disk1/yarn/local/usercache_DEL_1413565841271
 These directories are empty, but take up 100M+ due to their sheer number; 
 there were 38714 per data disk on the machine I looked at.
 It appears to be a regression introduced by YARN-661.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661

2014-10-29 Thread Siqi Li (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189026#comment-14189026
 ] 

Siqi Li commented on YARN-2755:
---

Thanks for your feedback, [~jlowe]. I have updated the patch with the proper fix.

 NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
 --

 Key: YARN-2755
 URL: https://issues.apache.org/jira/browse/YARN-2755
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li
Priority: Critical
 Attachments: YARN-2755.v1.patch, YARN-2755.v2.patch, 
 YARN-2755.v3.patch, YARN-2755.v4.patch


 When the NM restarts frequently for some reason, a large number of directories 
 like these are left in /data/disk$num/yarn/local/:
 /data/disk1/yarn/local/usercache_DEL_1414372756105
 /data/disk1/yarn/local/usercache_DEL_1413557901696
 /data/disk1/yarn/local/usercache_DEL_1413657004894
 /data/disk1/yarn/local/usercache_DEL_1413675321860
 /data/disk1/yarn/local/usercache_DEL_1414093167936
 /data/disk1/yarn/local/usercache_DEL_1413565841271
 These directories are empty, but take up 100M+ due to their sheer number; 
 there were 38714 per data disk on the machine I looked at.
 It appears to be a regression introduced by YARN-661.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2773) ReservationSystem's use of Queue names vs paths is inconsistent for CapacityReservationSystem and FairReservationSystem

2014-10-29 Thread Anubhav Dhoot (JIRA)
Anubhav Dhoot created YARN-2773:
---

 Summary: ReservationSystem's use of Queue names vs paths is 
inconsistent for CapacityReservationSystem and FairReservationSystem  
 Key: YARN-2773
 URL: https://issues.apache.org/jira/browse/YARN-2773
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Reporter: Anubhav Dhoot
Priority: Minor


The reservation system requires the ReservationDefinition to use a queue name 
to choose which reservation queue is being used. CapacityScheduler does not 
allow duplicate leaf queue names; because of this we can refer to a unique leaf 
queue by simply using its name and not its full path (which includes parentName 
+ "."). FairScheduler allows duplicate leaf queue names, because of which one 
needs to use the full queue path to identify a queue uniquely. This is 
inconsistent across the implementations of AbstractReservationSystem: one 
implementation of getQueuePath does a conversion (CapacityReservationSystem) 
while FairReservationSystem returns the same value back.
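
To make the inconsistency concrete, a simplified, purely illustrative sketch 
(not the actual YARN classes; the parent path used here is invented):
{code}
/** Simplified illustration of the queue name vs. path inconsistency. */
public class QueuePathExample {

  // CapacityScheduler-style: leaf names are unique, so a bare name can be
  // expanded to its full path.
  static String capacityGetQueuePath(String leafName) {
    return "root.parent." + leafName;        // e.g. "q1" -> "root.parent.q1"
  }

  // FairScheduler-style: leaf names can repeat, so the caller must already
  // pass the full path and it is returned unchanged.
  static String fairGetQueuePath(String queuePath) {
    return queuePath;                        // e.g. "root.parent.q1"
  }

  public static void main(String[] args) {
    System.out.println(capacityGetQueuePath("q1"));           // root.parent.q1
    System.out.println(fairGetQueuePath("root.parent.q1"));   // root.parent.q1
  }
}
{code}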



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2766) [JDK 8] TestApplicationHistoryClientService fails

2014-10-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189088#comment-14189088
 ] 

Hadoop QA commented on YARN-2766:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12678001/YARN-2766.patch
  against trunk revision d33e07d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 6 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5629//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5629//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5629//console

This message is automatically generated.

 [JDK 8] TestApplicationHistoryClientService fails
 -

 Key: YARN-2766
 URL: https://issues.apache.org/jira/browse/YARN-2766
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Attachments: YARN-2766.patch, YARN-2766.patch


 {{TestApplicationHistoryClientService.testContainers}} and 
 {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail 
 because the test assertions are assuming a returned Collection is in a 
 certain order.  The collection comes from a HashMap, so the order is not 
 guaranteed, plus, according to [this 
 page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html],
  there are situations where the iteration order of a HashMap will be 
 different between Java 7 and 8.
 We should fix the test code to not assume a specific ordering.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2755) NM fails to clean up usercache_DEL_timestamp dirs after YARN-661

2014-10-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189101#comment-14189101
 ] 

Hadoop QA commented on YARN-2755:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12678008/YARN-2755.v4.patch
  against trunk revision d33e07d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5630//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5630//console

This message is automatically generated.

 NM fails to clean up usercache_DEL_timestamp dirs after YARN-661
 --

 Key: YARN-2755
 URL: https://issues.apache.org/jira/browse/YARN-2755
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Siqi Li
Assignee: Siqi Li
Priority: Critical
 Attachments: YARN-2755.v1.patch, YARN-2755.v2.patch, 
 YARN-2755.v3.patch, YARN-2755.v4.patch


 When the NM restarts frequently for some reason, a large number of directories 
 like these are left in /data/disk$num/yarn/local/:
 /data/disk1/yarn/local/usercache_DEL_1414372756105
 /data/disk1/yarn/local/usercache_DEL_1413557901696
 /data/disk1/yarn/local/usercache_DEL_1413657004894
 /data/disk1/yarn/local/usercache_DEL_1413675321860
 /data/disk1/yarn/local/usercache_DEL_1414093167936
 /data/disk1/yarn/local/usercache_DEL_1413565841271
 These directories are empty, but take up 100M+ due to their sheer number; 
 there were 38714 per data disk on the machine I looked at.
 It appears to be a regression introduced by YARN-661.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2556) Tool to measure the performance of the timeline server

2014-10-29 Thread chang li (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chang li updated YARN-2556:
---
Attachment: yarn2556.patch

Cleaned up my patch; reviews are welcome. I have used this application to test 
the timeline server throughput in local mode by launching 4 mappers, each of 
which puts an entity larger than 100 KB and iterates 1000 times. Here is my 
measured result: on my local machine, the timeline server can provide an I/O 
rate of about 10 MB/s for writes. There is some deviation from the write 
throughput of leveldb itself. People are welcome to try this tool and comment 
on it.
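
A hedged sketch of what such a write probe can look like against the public 
timeline client API; the entity type, sizes, and class name below are 
illustrative and not taken from the attached patch:
{code}
import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.exceptions.YarnException;

/** Illustrative write-throughput probe: repeated puts of a ~100 KB entity. */
public class TimelineWriteProbe {
  public static void main(String[] args) throws IOException, YarnException {
    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(new YarnConfiguration());
    client.start();

    char[] payload = new char[100 * 1024];  // ~100 KB of other-info per entity
    Arrays.fill(payload, 'x');
    String blob = new String(payload);

    int iterations = 1000;
    long start = System.currentTimeMillis();
    for (int i = 0; i < iterations; i++) {
      TimelineEntity entity = new TimelineEntity();
      entity.setEntityType("PERF_TEST");
      entity.setEntityId("entity_" + i);
      entity.addOtherInfo("payload", blob);
      client.putEntities(entity);           // synchronous REST put
    }
    long millis = System.currentTimeMillis() - start;
    System.out.println(iterations + " puts in " + millis + " ms");
    client.stop();
  }
}
{code}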

 Tool to measure the performance of the timeline server
 --

 Key: YARN-2556
 URL: https://issues.apache.org/jira/browse/YARN-2556
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Jonathan Eagles
Assignee: chang li
 Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, 
 yarn2556.patch, yarn2556_wip.patch


 We need to be able to understand the capacity model for the timeline server 
 to give users the tools they need to deploy a timeline server with the 
 correct capacity.
 I propose we create a mapreduce job that can measure timeline server write 
 and read performance. Transactions per second, I/O for both read and write 
 would be a good start.
 This could be done as an example or test job that could be tied into gridmix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2701) Potential race condition in startLocalizer when using LinuxContainerExecutor

2014-10-29 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189117#comment-14189117
 ] 

Allen Wittenauer commented on YARN-2701:


OK, this compiled without incident, so I'm +1 now.  Thanks!

 Potential race condition in startLocalizer when using LinuxContainerExecutor  
 --

 Key: YARN-2701
 URL: https://issues.apache.org/jira/browse/YARN-2701
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Fix For: 2.6.0

 Attachments: YARN-2701.1.patch, YARN-2701.2.patch, YARN-2701.3.patch, 
 YARN-2701.4.patch, YARN-2701.5.patch, YARN-2701.6.patch, 
 YARN-2701.addendum.1.patch, YARN-2701.addendum.2.patch, 
 YARN-2701.addendum.3.patch, YARN-2701.addendum.4.patch


 When using LinuxContainerExecutor to do startLocalizer, we use the native code 
 in container-executor.c.
 {code}
  if (stat(npath, &sb) != 0) {
    if (mkdir(npath, perm) != 0) {
 {code}
 We use a check-then-create approach to create the appDir under /usercache, but 
 if two containers try to do this at the same time, a race condition may 
 happen.
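
The general pattern for avoiding such a check-then-create race is to attempt 
the create unconditionally and treat "already exists" as success. The real fix 
belongs in the native container-executor.c; the snippet below only illustrates 
the idea in Java:
{code}
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class RaceFreeMkdir {
  /** Create the dir without a separate existence check; losing the race is OK. */
  static void ensureDir(String dir) throws IOException {
    Path path = Paths.get(dir);
    try {
      Files.createDirectory(path);
    } catch (FileAlreadyExistsException e) {
      // Another process created it first; that is fine.
    }
  }

  public static void main(String[] args) throws IOException {
    ensureDir("/tmp/usercache-demo");
  }
}
{code}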



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2556) Tool to measure the performance of the timeline server

2014-10-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189143#comment-14189143
 ] 

Hadoop QA commented on YARN-2556:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12678020/yarn2556.patch
  against trunk revision d33e07d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5631//console

This message is automatically generated.

 Tool to measure the performance of the timeline server
 --

 Key: YARN-2556
 URL: https://issues.apache.org/jira/browse/YARN-2556
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Jonathan Eagles
Assignee: chang li
 Attachments: YARN-2556-WIP.patch, YARN-2556-WIP.patch, 
 yarn2556.patch, yarn2556_wip.patch


 We need to be able to understand the capacity model for the timeline server 
 to give users the tools they need to deploy a timeline server with the 
 correct capacity.
 I propose we create a mapreduce job that can measure timeline server write 
 and read performance. Transactions per second, I/O for both read and write 
 would be a good start.
 This could be done as an example or test job that could be tied into gridmix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2766) [JDK 8] TestApplicationHistoryClientService fails

2014-10-29 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated YARN-2766:

Attachment: YARN-2766.patch

New patch fixes findbugs warnings

 [JDK 8] TestApplicationHistoryClientService fails
 -

 Key: YARN-2766
 URL: https://issues.apache.org/jira/browse/YARN-2766
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Attachments: YARN-2766.patch, YARN-2766.patch, YARN-2766.patch


 {{TestApplicationHistoryClientService.testContainers}} and 
 {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail 
 because the test assertions are assuming a returned Collection is in a 
 certain order.  The collection comes from a HashMap, so the order is not 
 guaranteed, plus, according to [this 
 page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html],
  there are situations where the iteration order of a HashMap will be 
 different between Java 7 and 8.
 We should fix the test code to not assume a specific ordering.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2770) Timeline delegation tokens need to be automatically renewed by the RM

2014-10-29 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2770:
--
Attachment: YARN-2770.1.patch

Created a patch:

* Add two timeline client APIs - renew/cancel delegation token
* Make TimelineDelegationTokenIdentifier.Renewer extend TokenRenewer and 
implement the renew and cancel logic by using the timeline client APIs (a rough 
sketch of this wiring is shown below)
* Change YarnClientImpl to set the renewer of the timeline DT to the user of 
the RM daemon.
* Add test cases to validate the renew/cancel APIs
* Have done an end-to-end test to verify that the automatic DT renewal works in 
a secure cluster.
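
A minimal sketch of the TokenRenewer wiring described in the second bullet. The 
token kind matches the timeline DT kind, but the renew/cancel helper calls are 
assumptions standing in for the new timeline client APIs added by the patch:
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenRenewer;

/** Sketch only: delegate renew/cancel of timeline DTs to the timeline client. */
public class TimelineDTRenewerSketch extends TokenRenewer {
  private static final Text KIND = new Text("TIMELINE_DELEGATION_TOKEN");

  @Override
  public boolean handleKind(Text kind) {
    return KIND.equals(kind);
  }

  @Override
  public boolean isManaged(Token<?> token) throws IOException {
    return true;  // the RM is expected to renew these tokens automatically
  }

  @Override
  public long renew(Token<?> token, Configuration conf)
      throws IOException, InterruptedException {
    // A real implementation would call the timeline client's renew API here.
    return renewViaTimelineClient(token, conf);  // assumed helper
  }

  @Override
  public void cancel(Token<?> token, Configuration conf)
      throws IOException, InterruptedException {
    cancelViaTimelineClient(token, conf);        // assumed helper
  }

  private long renewViaTimelineClient(Token<?> token, Configuration conf) {
    return System.currentTimeMillis() + 24 * 60 * 60 * 1000L;  // placeholder expiry
  }

  private void cancelViaTimelineClient(Token<?> token, Configuration conf) {
    // placeholder
  }
}
{code}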

 Timeline delegation tokens need to be automatically renewed by the RM
 -

 Key: YARN-2770
 URL: https://issues.apache.org/jira/browse/YARN-2770
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.5.0
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Critical
 Attachments: YARN-2770.1.patch


 YarnClient will automatically grab a timeline DT for the application and pass 
 it to the app AM. Right now the timeline DT renewal is still a dummy. If an app 
 is running for more than 24h (the default DT expiry time), the app AM is no 
 longer able to use the expired DT to communicate with the timeline server. 
 Since the RM caches the credentials of each app and renews the DTs for running 
 apps, we should provide renew hooks for the RM similar to what the HDFS DT has, 
 and set the RM user as the renewer when grabbing the timeline DT.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2766) [JDK 8] TestApplicationHistoryClientService fails

2014-10-29 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189194#comment-14189194
 ] 

Zhijie Shen commented on YARN-2766:
---

I think we need to change ApplicationContext -> ApplicationHistoryManager -> 
ApplicationHistoryManagerOnTimelineStore. Modifying the protobuf message will 
not help the web services.

 [JDK 8] TestApplicationHistoryClientService fails
 -

 Key: YARN-2766
 URL: https://issues.apache.org/jira/browse/YARN-2766
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Attachments: YARN-2766.patch, YARN-2766.patch, YARN-2766.patch


 {{TestApplicationHistoryClientService.testContainers}} and 
 {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail 
 because the test assertions are assuming a returned Collection is in a 
 certain order.  The collection comes from a HashMap, so the order is not 
 guaranteed, plus, according to [this 
 page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html],
  there are situations where the iteration order of a HashMap will be 
 different between Java 7 and 8.
 We should fix the test code to not assume a specific ordering.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2766) [JDK 8] TestApplicationHistoryClientService fails

2014-10-29 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2766:
--
Issue Type: Bug  (was: Sub-task)
Parent: (was: YARN-1530)

 [JDK 8] TestApplicationHistoryClientService fails
 -

 Key: YARN-2766
 URL: https://issues.apache.org/jira/browse/YARN-2766
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Attachments: YARN-2766.patch, YARN-2766.patch, YARN-2766.patch


 {{TestApplicationHistoryClientService.testContainers}} and 
 {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail 
 because the test assertions are assuming a returned Collection is in a 
 certain order.  The collection comes from a HashMap, so the order is not 
 guaranteed, plus, according to [this 
 page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html],
  there are situations where the iteration order of a HashMap will be 
 different between Java 7 and 8.
 We should fix the test code to not assume a specific ordering.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2579) Both RM's state is Active , but 1 RM is not really active.

2014-10-29 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189195#comment-14189195
 ] 

Karthik Kambatla commented on YARN-2579:


Thanks, [~rohithsharma]. Looking at the tests and your explanation, I think I 
see what you are saying. 

However, looking into the code, I am not convinced that event draining is what 
is causing this issue. {{rmDispatcher}} is an {{AsyncDispatcher}}, with 
{{drainEventsOnStop}} always false. So, {{rmDispatcher.stop()}} shouldn't lead 
to any draining of events. I noticed a couple of other issues in the 
AsyncDispatcher code:
# {{eventHandlerThread.join}} in serviceStop should take a timeout as well
# {{dispatch(event)}} in AsyncDispatcher#createThread doesn't have a try-catch 
block 

With the current patch, I wonder if there are any unexpected side-effects. 
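
On the first point, a hedged sketch of what a bounded wait can look like; the 
field names and timeout below are placeholders, not the AsyncDispatcher code:
{code}
/** Minimal sketch of a service stop that bounds the wait on a worker thread. */
public class BoundedStopSketch {
  private volatile boolean stopped = false;
  private Thread eventHandlerThread;

  public void start() {
    eventHandlerThread = new Thread(new Runnable() {
      @Override
      public void run() {
        while (!stopped) {
          // ... dispatch events; a real dispatcher should also guard this
          // with try/catch so a bad event cannot kill the thread ...
          try {
            Thread.sleep(100);
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
          }
        }
      }
    });
    eventHandlerThread.start();
  }

  public void stop() throws InterruptedException {
    stopped = true;
    eventHandlerThread.interrupt();
    eventHandlerThread.join(10 * 1000L);  // bounded join instead of waiting forever
    if (eventHandlerThread.isAlive()) {
      System.err.println("worker thread did not stop within the timeout");
    }
  }
}
{code}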

 Both RM's state is Active , but 1 RM is not really active.
 --

 Key: YARN-2579
 URL: https://issues.apache.org/jira/browse/YARN-2579
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.1
Reporter: Rohith
Assignee: Rohith
 Attachments: YARN-2579.patch, YARN-2579.patch


 I encountered a situation where both RMs' web pages were accessible and their 
 state was displayed as Active, but one of the RMs' ActiveServices were 
 stopped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2579) Both RM's state is Active , but 1 RM is not really active.

2014-10-29 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-2579:
---
Priority: Blocker  (was: Major)
Target Version/s: 2.6.0

 Both RM's state is Active , but 1 RM is not really active.
 --

 Key: YARN-2579
 URL: https://issues.apache.org/jira/browse/YARN-2579
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.1
Reporter: Rohith
Assignee: Rohith
Priority: Blocker
 Attachments: YARN-2579.patch, YARN-2579.patch


 I encountered a situation where both RMs' web pages were accessible and their 
 state was displayed as Active, but one of the RMs' ActiveServices were 
 stopped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2766) [JDK 8] TestApplicationHistoryClientService fails

2014-10-29 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2766:
--
Issue Type: Sub-task  (was: Bug)
Parent: YARN-321

 [JDK 8] TestApplicationHistoryClientService fails
 -

 Key: YARN-2766
 URL: https://issues.apache.org/jira/browse/YARN-2766
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Attachments: YARN-2766.patch, YARN-2766.patch, YARN-2766.patch


 {{TestApplicationHistoryClientService.testContainers}} and 
 {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail 
 because the test assertions are assuming a returned Collection is in a 
 certain order.  The collection comes from a HashMap, so the order is not 
 guaranteed, plus, according to [this 
 page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html],
  there are situations where the iteration order of a HashMap will be 
 different between Java 7 and 8.
 We should fix the test code to not assume a specific ordering.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2766) ApplicationHistoryManager is expected to return a sorted list of apps/attempts/containers

2014-10-29 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2766:
--
Summary:  ApplicationHistoryManager is expected to return a sorted list of 
apps/attempts/containers  (was: [JDK 8] TestApplicationHistoryClientService 
fails)

  ApplicationHistoryManager is expected to return a sorted list of 
 apps/attempts/containers
 --

 Key: YARN-2766
 URL: https://issues.apache.org/jira/browse/YARN-2766
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Attachments: YARN-2766.patch, YARN-2766.patch, YARN-2766.patch


 {{TestApplicationHistoryClientService.testContainers}} and 
 {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail 
 because the test assertions are assuming a returned Collection is in a 
 certain order.  The collection comes from a HashMap, so the order is not 
 guaranteed, plus, according to [this 
 page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html],
  there are situations where the iteration order of a HashMap will be 
 different between Java 7 and 8.
 We should fix the test code to not assume a specific ordering.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2766) ApplicationHistoryManager is expected to return a sorted list of apps/attempts/containers

2014-10-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189247#comment-14189247
 ] 

Hadoop QA commented on YARN-2766:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12678034/YARN-2766.patch
  against trunk revision 3ae84e1.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5632//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5632//console

This message is automatically generated.

  ApplicationHistoryManager is expected to return a sorted list of 
 apps/attempts/containers
 --

 Key: YARN-2766
 URL: https://issues.apache.org/jira/browse/YARN-2766
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Robert Kanter
Assignee: Robert Kanter
 Attachments: YARN-2766.patch, YARN-2766.patch, YARN-2766.patch


 {{TestApplicationHistoryClientService.testContainers}} and 
 {{TestApplicationHistoryClientService.testApplicationAttempts}} both fail 
 because the test assertions are assuming a returned Collection is in a 
 certain order.  The collection comes from a HashMap, so the order is not 
 guaranteed, plus, according to [this 
 page|http://docs.oracle.com/javase/8/docs/technotes/guides/collections/changes8.html],
  there are situations where the iteration order of a HashMap will be 
 different between Java 7 and 8.
 We should fix the test code to not assume a specific ordering.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2771) DistributedShell's DSConstants are badly named

2014-10-29 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2771:
--
Attachment: YARN-2771.1.patch

While I was aware of the bad naming, I decided to follow the pattern of the 
existing constants in DSConstants to be consistent. Anyway, I've uploaded a 
patch to fix all these constants.

DS is not a serious computation framework, and the env var name change is 
transparent to the CLI user, hence it should not break anything.

 DistributedShell's DSConstants are badly named
 --

 Key: YARN-2771
 URL: https://issues.apache.org/jira/browse/YARN-2771
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Vinod Kumar Vavilapalli
Assignee: Zhijie Shen
 Attachments: YARN-2771.1.patch


 I'd rather have underscores (DISTRIBUTED_SHELL_TIMELINE_DOMAIN instead of 
 DISTRIBUTEDSHELLTIMELINEDOMAIN).
 DISTRIBUTEDSHELLTIMELINEDOMAIN was added in this release; can we rename it to 
 DISTRIBUTED_SHELL_TIMELINE_DOMAIN?
 For the old envs, we can just add new envs that point to the old ones and 
 deprecate the old ones.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2186) Node Manager uploader service for cache manager

2014-10-29 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189307#comment-14189307
 ] 

Karthik Kambatla commented on YARN-2186:


Thanks Sangjin. Looks mostly good, but for some minor comments: 
# How about renaming NMUploaderSerivceSCMProtocol to SharedCacheUploader (after 
ResourceTracker) or SharedCacheUploaderProtocol? Accordingly, rename all other 
related classes and proto files? 
# Instead of {{yarn.sharedcache.nodemanager.}}, we should probably call it 
{{yarn.sharedcache.uploader}} to avoid confusion? 
# As per our offline discussions, it would be nice to add a way for the NM to 
ask the SCM whether it should upload a resource to the shared cache or not. For 
now, this could always be yes. In the future, we can add a pluggable policy 
that the SCM would consult to answer the NM (see the interface sketch below).
# NMCacheUploaderSCMProtocolPBClientImpl#close should set {{this.proxy}} to 
null after calling stopProxy.
# NMCacheUploaderSCMProtocolService:
## TODOs should have an associated follow-up JIRA and reference in the code so 
we don't forget
## serviceStop should set {{this.server}} to null after calling 
{{this.server.stop()}}
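
On the third point, the pluggable policy could start as small as the interface 
sketched below; the names are invented here for illustration, and the initial 
implementation would simply always say yes:
{code}
/** Illustrative only: policy the SCM could consult before letting an NM upload. */
public interface SharedCacheUploadPolicy {
  /** Return true if the given resource should be uploaded to the shared cache. */
  boolean shouldUpload(String resourceKey, long sizeBytes, String user);
}

/** Initial behaviour suggested above: always upload. */
class AlwaysUploadPolicy implements SharedCacheUploadPolicy {
  @Override
  public boolean shouldUpload(String resourceKey, long sizeBytes, String user) {
    return true;
  }
}
{code}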

 Node Manager uploader service for cache manager
 ---

 Key: YARN-2186
 URL: https://issues.apache.org/jira/browse/YARN-2186
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Chris Trezzo
Assignee: Chris Trezzo
 Attachments: YARN-2186-trunk-v1.patch, YARN-2186-trunk-v2.patch, 
 YARN-2186-trunk-v3.patch, YARN-2186-trunk-v4.patch


 Implement the node manager uploader service for the cache manager. This 
 service is responsible for communicating with the node manager when it 
 uploads resources to the shared cache.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2588) Standby RM does not transitionToActive if previous transitionToActive is failed with ZK exception.

2014-10-29 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189344#comment-14189344
 ] 

Karthik Kambatla commented on YARN-2588:


Thanks, Jian, for pointing me to this. The patch fixes an important issue, but 
I would like us to call transitionToStandby in the catch block instead of 
explicitly inlining the contents of transitionToStandby. I'll fix this up in 
YARN-2010.

 Standby RM does not transitionToActive if previous transitionToActive is 
 failed with ZK exception.
 --

 Key: YARN-2588
 URL: https://issues.apache.org/jira/browse/YARN-2588
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 3.0.0, 2.6.0, 2.5.1
Reporter: Rohith
Assignee: Rohith
 Fix For: 2.6.0

 Attachments: YARN-2588.1.patch, YARN-2588.2.patch, YARN-2588.patch


 Consider a scenario where the standby RM fails to transition to Active because 
 of a ZK exception (connectionLoss or SessionExpired). Then any further 
 transition to Active for the same RM does not move the RM to the Active state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2772) DistributedShell's timeline related options are not clear

2014-10-29 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189384#comment-14189384
 ] 

Zhijie Shen commented on YARN-2772:
---

[~vinodkv], thanks for your proposal.

1. I prefer "create_timeline_domain" over "should_create_timeline_domain", as 
it is an option without an arg, so there will not be a true/false value for it.

2. I'd like to enforce the validation logic (see the existing code comment). 
However, since we're lacking timeline client query APIs, it would involve more 
steps to send HTTP requests and parse the JSON response. I prefer to do it 
after YARN-2423.
{code}
try {
  //TODO: we need to check and combine the existing timeline domain ACLs,
  //but let's do it once we have client java library to query domains.
  TimelineDomain domain = new TimelineDomain();
{code}

Otherwise, I've addressed the other comments and made a patch.

 DistributedShell's timeline related options are not clear
 -

 Key: YARN-2772
 URL: https://issues.apache.org/jira/browse/YARN-2772
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Vinod Kumar Vavilapalli
Assignee: Zhijie Shen

 The new options "domain" and "create" are not descriptive at all. It is also 
 not clear when view_acls and modify_acls need to be set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2772) DistributedShell's timeline related options are not clear

2014-10-29 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-2772:
--
Attachment: YARN-2772.1.patch

 DistributedShell's timeline related options are not clear
 -

 Key: YARN-2772
 URL: https://issues.apache.org/jira/browse/YARN-2772
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Vinod Kumar Vavilapalli
Assignee: Zhijie Shen
 Attachments: YARN-2772.1.patch


 The new options "domain" and "create" are not descriptive at all. It is also 
 not clear when view_acls and modify_acls need to be set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-2774) shared cache uploader service should authorize notify calls properly

2014-10-29 Thread Sangjin Lee (JIRA)
Sangjin Lee created YARN-2774:
-

 Summary: shared cache uploader service should authorize notify 
calls properly
 Key: YARN-2774
 URL: https://issues.apache.org/jira/browse/YARN-2774
 Project: Hadoop YARN
  Issue Type: Task
Reporter: Sangjin Lee


The shared cache manager (SCM) uploader service (done in YARN-2186) currently 
does not authorize calls to notify the SCM on newly uploaded resource. Proper 
security/authorization needs to be done in this RPC call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2774) shared cache uploader service should authorize notify calls properly

2014-10-29 Thread Sangjin Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sangjin Lee updated YARN-2774:
--
Issue Type: Sub-task  (was: Task)
Parent: YARN-1492

 shared cache uploader service should authorize notify calls properly
 

 Key: YARN-2774
 URL: https://issues.apache.org/jira/browse/YARN-2774
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Sangjin Lee

 The shared cache manager (SCM) uploader service (done in YARN-2186) currently 
 does not authorize calls to notify the SCM on newly uploaded resource. Proper 
 security/authorization needs to be done in this RPC call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2604) Scheduler should consider max-allocation-* in conjunction with the largest node

2014-10-29 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter reassigned YARN-2604:
---

Assignee: Robert Kanter  (was: Karthik Kambatla)

 Scheduler should consider max-allocation-* in conjunction with the largest 
 node
 ---

 Key: YARN-2604
 URL: https://issues.apache.org/jira/browse/YARN-2604
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: scheduler
Affects Versions: 2.5.1
Reporter: Karthik Kambatla
Assignee: Robert Kanter

 If the scheduler max-allocation-* values are larger than the resources 
 available on the largest node in the cluster, an application requesting 
 resources between the two values will be accepted by the scheduler but the 
 requests will never be satisfied. The app essentially hangs forever. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2698) Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of RMAdminCLI

2014-10-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189488#comment-14189488
 ] 

Hadoop QA commented on YARN-2698:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12677981/YARN-2698-20141029-2.patch
  against trunk revision 6f5f604.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.mapreduce.lib.output.TestJobOutputCommitter
  
org.apache.hadoop.mapreduce.v2.TestMRAMWithNonNormalizedCapabilities
  org.apache.hadoop.mapreduce.TestMapReduceLazyOutput
  org.apache.hadoop.mapreduce.v2.TestNonExistentJob
  org.apache.hadoop.mapreduce.v2.TestMiniMRProxyUser
  org.apache.hadoop.mapreduce.v2.TestMRAppWithCombiner
  org.apache.hadoop.mapreduce.v2.TestUberAM
  org.apache.hadoop.mapreduce.v2.TestMRJobsWithProfiler
  org.apache.hadoop.mapreduce.v2.TestMRJobs
  org.apache.hadoop.mapreduce.v2.TestRMNMInfo
  org.apache.hadoop.mapreduce.v2.TestSpeculativeExecution
  org.apache.hadoop.mapreduce.v2.TestMROldApiJobs
  org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService
  org.apache.hadoop.mapreduce.TestLargeSort
  org.apache.hadoop.mapred.TestClusterMRNotification

  The test build failed in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5634//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5634//artifact/patchprocess/patchReleaseAuditProblems.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5634//console

This message is automatically generated.

 Move getClusterNodeLabels and getNodeToLabels to YARN CLI instead of 
 RMAdminCLI
 ---

 Key: YARN-2698
 URL: https://issues.apache.org/jira/browse/YARN-2698
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: api, client, resourcemanager
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Critical
 Attachments: YARN-2698-20141028-1.patch, YARN-2698-20141028-2.patch, 
 YARN-2698-20141028-3.patch, YARN-2698-20141029-1.patch, 
 YARN-2698-20141029-2.patch


 YARN RMAdminCLI and AdminService should have write APIs only; the read APIs 
 should be located in the YARN CLI and RMClientService.
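A hedged sketch of the read-only client path once these APIs live on the client 
side; the method names follow the summary above, but their exact signatures are 
assumptions:

{code}
import java.util.Map;
import java.util.Set;

import org.apache.hadoop.yarn.api.records.NodeId;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public final class NodeLabelReadSketch {
  public static void main(String[] args) throws Exception {
    YarnClient client = YarnClient.createYarnClient();
    client.init(new YarnConfiguration());
    client.start();
    try {
      // Read-only queries go through the client protocol, not the admin one.
      Set<String> clusterLabels = client.getClusterNodeLabels();
      Map<NodeId, Set<String>> nodeToLabels = client.getNodeToLabels();
      System.out.println("Cluster node labels: " + clusterLabels);
      System.out.println("Node to labels: " + nodeToLabels);
    } finally {
      client.stop();
    }
  }
}
{code}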



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2579) Both RM's state is Active , but 1 RM is not really active.

2014-10-29 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189583#comment-14189583
 ] 

Rohith commented on YARN-2579:
--

Thanks Karthik!! 
bq. (Service)Dispatcher.stop() wait for draining out RMFatalEventDispatcher 
event
I meant that the drained event, i.e. RMFatalEvent, is still waiting to finish 
when {{rmDispatcher.stop()}} blocks in {{eventHandlerThread.join}}.

bq. {{dispatch(event)}} in AsyncDispatcher#createThread doesn't have a 
try-catch block 
The {{dispatch(event)}} method catches Throwable and exits the JVM. But if 
handlers are not registered, then we must have a try-catch block. Did you mean 
that scenario?

bq. {{eventHandlerThread.join}} in serviceStop should take a timeout as well
+1 for this approach too; it also fixes the hang problem. The attached patch 
likewise does not leave the RM hanging in a deadlock-like state.
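For reference, a minimal sketch of the join-with-a-timeout idea; the field names, 
the 10-second value, and the logging are assumptions for illustration, not the 
attached patch:

{code}
public class BoundedStopDispatcherSketch {
  private volatile boolean stopped = false;   // signals the dispatch loop to exit
  private Thread eventHandlerThread;          // thread started in serviceStart()

  protected void serviceStop() throws Exception {
    stopped = true;
    if (eventHandlerThread != null) {
      eventHandlerThread.interrupt();
      // Bound the wait so a handler stuck on a drained event cannot hang
      // serviceStop() forever.
      eventHandlerThread.join(10000L);
      if (eventHandlerThread.isAlive()) {
        System.err.println("Dispatcher thread did not exit within 10s; giving up");
      }
    }
  }
}
{code}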

bq. With the current patch, I wonder if there are any unexpected side-effects
I have verified many switching scenarios, as mentioned in my previous comment, 
and more on a real deployed cluster. It also works fine with work-preserving 
restart.

 Both RM's state is Active , but 1 RM is not really active.
 --

 Key: YARN-2579
 URL: https://issues.apache.org/jira/browse/YARN-2579
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.1
Reporter: Rohith
Assignee: Rohith
Priority: Blocker
 Attachments: YARN-2579.patch, YARN-2579.patch


 I encountered a situation where both RMs' web pages were accessible and their 
 state was displayed as Active, but one RM's ActiveServices were stopped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2588) Standby RM does not transitionToActive if previous transitionToActive is failed with ZK exception.

2014-10-29 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189600#comment-14189600
 ] 

Rohith commented on YARN-2588:
--

bq. but I would like for us to call transitionToStandby in the catch-block 
instead of explicitly calling the contents of transitionToStandby
As I understand the comment, is the expected change like the one below? 
Correct me if I am wrong. If so, transitionToStandby returns at its initial 
state check itself, and we end up neither creating active services nor 
resetting the dispatcher!
{code}
try {
  startActiveServices();
  return null;
} catch (Exception e) {
  transitionToStandby(true);
  throw e;
}
{code}


 Standby RM does not transitionToActive if previous transitionToActive is 
 failed with ZK exception.
 --

 Key: YARN-2588
 URL: https://issues.apache.org/jira/browse/YARN-2588
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 3.0.0, 2.6.0, 2.5.1
Reporter: Rohith
Assignee: Rohith
 Fix For: 2.6.0

 Attachments: YARN-2588.1.patch, YARN-2588.2.patch, YARN-2588.patch


 Consider a scenario where the standby RM fails to transition to Active because 
 of a ZK exception (connectionLoss or SessionExpired). Any further attempt to 
 transition the same RM to Active then fails to move it to the Active state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2772) DistributedShell's timeline related options are not clear

2014-10-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189652#comment-14189652
 ] 

Hadoop QA commented on YARN-2772:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12678074/YARN-2772.1.patch
  against trunk revision 0126cf1.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5636//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5636//console

This message is automatically generated.

 DistributedShell's timeline related options are not clear
 -

 Key: YARN-2772
 URL: https://issues.apache.org/jira/browse/YARN-2772
 Project: Hadoop YARN
  Issue Type: Bug
  Components: applications/distributed-shell
Reporter: Vinod Kumar Vavilapalli
Assignee: Zhijie Shen
 Attachments: YARN-2772.1.patch


 The new domain and create options are not descriptive at all. It is also 
 not clear when view_acls and modify_acls need to be set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2753) Fix potential issues and code clean up for *NodeLabelsManager

2014-10-29 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2753:

Attachment: (was: YARN-2753.005.patch)

 Fix potential issues and code clean up for *NodeLabelsManager
 -

 Key: YARN-2753
 URL: https://issues.apache.org/jira/browse/YARN-2753
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2753.000.patch, YARN-2753.001.patch, 
 YARN-2753.002.patch, YARN-2753.003.patch, YARN-2753.004.patch, 
 YARN-2753.005.patch


 Issues include:
 * CommonNodeLabelsManager#addToCluserNodeLabels should not change the value 
 in labelCollections if the key already exists, otherwise the Label.resource 
 will be changed (reset).
 * Potential NPE (NullPointerException) in checkRemoveLabelsFromNode of 
 CommonNodeLabelsManager (see the sketch after this list).
 ** When a Node is created, Node.labels can be null.
 ** In that case, nm.labels may be null, so we need to check that 
 originalLabels is not null before using it (originalLabels.containsAll).
 * addToCluserNodeLabels should be protected by the writeLock in 
 RMNodeLabelsManager.java, because we should protect labelCollections in 
 RMNodeLabelsManager.
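A minimal illustration of the null-safe check described in the second item; the 
method and parameter names are simplified assumptions, not the attached patch:

{code}
import java.util.Collections;
import java.util.Set;

public final class NullSafeLabelCheckSketch {
  /**
   * Treat a node whose label set was never initialized (null) as having no
   * labels, instead of dereferencing it and triggering an NPE.
   */
  static boolean nodeHasAllLabels(Set<String> originalLabels,
      Set<String> labelsToRemove) {
    Set<String> current = (originalLabels == null)
        ? Collections.<String>emptySet() : originalLabels;
    return current.containsAll(labelsToRemove);
  }
}
{code}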



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2753) Fix potential issues and code clean up for *NodeLabelsManager

2014-10-29 Thread zhihai xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhihai xu updated YARN-2753:

Attachment: YARN-2753.005.patch

 Fix potential issues and code clean up for *NodeLabelsManager
 -

 Key: YARN-2753
 URL: https://issues.apache.org/jira/browse/YARN-2753
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: zhihai xu
Assignee: zhihai xu
 Attachments: YARN-2753.000.patch, YARN-2753.001.patch, 
 YARN-2753.002.patch, YARN-2753.003.patch, YARN-2753.004.patch, 
 YARN-2753.005.patch


 Issues include:
 * CommonNodeLabelsManager#addToCluserNodeLabels should not change the value 
 in labelCollections if the key already exists, otherwise the Label.resource 
 will be changed (reset).
 * Potential NPE (NullPointerException) in checkRemoveLabelsFromNode of 
 CommonNodeLabelsManager.
 ** When a Node is created, Node.labels can be null.
 ** In that case, nm.labels may be null, so we need to check that 
 originalLabels is not null before using it (originalLabels.containsAll).
 * addToCluserNodeLabels should be protected by the writeLock in 
 RMNodeLabelsManager.java, because we should protect labelCollections in 
 RMNodeLabelsManager.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2770) Timeline delegation tokens need to be automatically renewed by the RM

2014-10-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14189657#comment-14189657
 ] 

Hadoop QA commented on YARN-2770:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12678042/YARN-2770.1.patch
  against trunk revision 0126cf1.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5635//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5635//console

This message is automatically generated.

 Timeline delegation tokens need to be automatically renewed by the RM
 -

 Key: YARN-2770
 URL: https://issues.apache.org/jira/browse/YARN-2770
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.5.0
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Critical
 Attachments: YARN-2770.1.patch


 YarnClient will automatically grab a timeline DT for the application and pass 
 it to the app AM. Today the timeline DT renewal is still a dummy operation. If 
 an app runs for more than 24h (the default DT expiry time), the app AM can no 
 longer use the expired DT to communicate with the timeline server. Since the 
 RM caches the credentials of each app and renews the DTs for running apps, we 
 should provide renew hooks for the RM similar to what the HDFS DT has, and 
 set the RM user as the renewer when grabbing the timeline DT.
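A rough sketch of grabbing the timeline DT with the RM named as renewer; how the 
RM principal is resolved to a renewer name here is an assumption for illustration, 
not the attached patch:

{code}
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.security.client.TimelineDelegationTokenIdentifier;

public final class TimelineDtRenewerSketch {
  public static void main(String[] args) throws Exception {
    YarnConfiguration conf = new YarnConfiguration();
    TimelineClient client = TimelineClient.createTimelineClient();
    client.init(conf);
    client.start();
    try {
      // Name the RM as the renewer so it can keep renewing the token for
      // apps that run longer than the default 24h expiry.
      String renewer = conf.get(YarnConfiguration.RM_PRINCIPAL, "yarn");
      Token<TimelineDelegationTokenIdentifier> dt =
          client.getDelegationToken(renewer);
      System.out.println("Timeline DT with RM as renewer: " + dt);
    } finally {
      client.stop();
    }
  }
}
{code}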



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

