[jira] [Commented] (YARN-2892) Unable to get AMRMToken in unmanaged AM when using a secure cluster
[ https://issues.apache.org/jira/browse/YARN-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14232753#comment-14232753 ] Rohith commented on YARN-2892: -- +1 (non-binding), LGTM Unable to get AMRMToken in unmanaged AM when using a secure cluster --- Key: YARN-2892 URL: https://issues.apache.org/jira/browse/YARN-2892 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Sevada Abraamyan Assignee: Sevada Abraamyan Attachments: YARN-2892.patch, YARN-2892.patch, YARN-2892.patch An AMRMToken is retrieved from the ApplicationReport by the YarnClient. When the RM creates the ApplicationReport and sends it back to the client, it makes a simple security check on whether it should include the AMRMToken in the report (see createAndGetApplicationReport in RMAppImpl). This security check verifies that the user who submitted the original application is the same user who is requesting the ApplicationReport. If they are indeed the same user, it includes the AMRMToken; otherwise it does not. The problem arises from the fact that when an application is submitted, the RM saves the short username of the user who created the application (see submitApplication in ClientRMService). Afterwards, when the ApplicationReport is requested, the system tries to match the full username of the requester against the previously stored short username. In a secure cluster using Kerberos this check fails because the realm is stripped from the principal when we request a short username. So, for example, the short username might be Foo whereas the full username is f...@company.com Note: A very similar problem has been previously reported ([YARN-2232|https://issues.apache.org/jira/browse/YARN-2232]) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
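For readers unfamiliar with the two name forms, the failing comparison can be distilled into a few lines. This is an illustrative sketch, not the actual RMAppImpl code; the literal names are made up:
{code}
// Sketch of the mismatch described in YARN-2892 (illustrative only).
public class ShortVsFullNameSketch {
  public static void main(String[] args) {
    // What the RM stores at submission time: the short name.
    String storedAtSubmission = "foo";
    // What a Kerberized caller presents at report time: the full principal.
    String requester = "foo@COMPANY.COM";

    // The check described above effectively compares the two directly,
    // so on a secure cluster the AMRMToken is omitted from the report:
    boolean includeAMRMToken = storedAtSubmission.equals(requester);
    System.out.println(includeAMRMToken); // false

    // The fix direction is to compare like with like, e.g. deriving the
    // requester's short name first (UserGroupInformation#getShortUserName
    // applies the cluster's auth_to_local rules) before comparing.
  }
}
{code}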
[jira] [Updated] (YARN-2081) TestDistributedShell fails after YARN-1962
[ https://issues.apache.org/jira/browse/YARN-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Hao updated YARN-2081: --- Target Version/s: (was: 2.4.1) TestDistributedShell fails after YARN-1962 -- Key: YARN-2081 URL: https://issues.apache.org/jira/browse/YARN-2081 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 3.0.0, 2.4.1 Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Labels: 2.4.1 Fix For: 2.4.1 Attachments: YARN-2081.patch java.lang.AssertionError: expected:<1> but was:<0> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:198) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2081) TestDistributedShell fails after YARN-1962
[ https://issues.apache.org/jira/browse/YARN-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Hao updated YARN-2081: --- Labels: 2.4.1 (was: ) TestDistributedShell fails after YARN-1962 -- Key: YARN-2081 URL: https://issues.apache.org/jira/browse/YARN-2081 Project: Hadoop YARN Issue Type: Bug Components: applications/distributed-shell Affects Versions: 3.0.0, 2.4.1 Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Labels: 2.4.1 Fix For: 2.4.1 Attachments: YARN-2081.patch java.lang.AssertionError: expected:<1> but was:<0> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:198) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2894) When ACLs are enabled, if RM switches then application cannot be viewed from web.
[ https://issues.apache.org/jira/browse/YARN-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14232900#comment-14232900 ] Hudson commented on YARN-2894: -- FAILURE: Integrated in Hadoop-Yarn-trunk #763 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/763/]) YARN-2894. Fixed a bug regarding application view acl when RM fails over. Contributed by Rohith Sharmaks (jianhe: rev 392c3aaea8e8f156b76e418157fa347256283c56) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/UserMetricsInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodeLabels.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/DefaultSchedulerPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/FairSchedulerAppsBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppsBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ClusterMetricsInfo.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesDelegationTokens.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java When ACLs are enabled, if RM switches then application cannot be viewed from web. --- Key: YARN-2894 URL: https://issues.apache.org/jira/browse/YARN-2894 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Fix For: 2.7.0 Attachments: YARN-2894.1.patch, YARN-2894.patch Binding the aclManager to RMWebApp would cause problems if the RM is switched: some validation checks may fail. I think we should not bind the aclManager in RMWebApp; instead we should get it from the RM instance. In RMWebApp, {code} if (rm != null) { bind(ResourceManager.class).toInstance(rm); bind(RMContext.class).toInstance(rm.getRMContext()); bind(ApplicationACLsManager.class).toInstance( rm.getApplicationACLsManager()); bind(QueueACLsManager.class).toInstance(rm.getQueueACLsManager()); } {code} and in AppBlock#render the check below may fail (Need to test and confirm)
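The description's suggestion, getting the manager from the RM instance rather than binding the instance up front, can be sketched with a Guice provider binding. This is a hedged illustration, not the committed patch; the module name is made up, while {{ResourceManager#getApplicationACLsManager}} is the accessor already shown in the snippet above:
{code}
import com.google.inject.AbstractModule;
import com.google.inject.Provider;
import org.apache.hadoop.yarn.server.resourcemanager.ResourceManager;
import org.apache.hadoop.yarn.server.security.ApplicationACLsManager;

// Hypothetical module: instead of capturing the ACLs manager instance when
// the webapp is built (stale after an RM failover), defer the lookup so each
// injection asks the ResourceManager for its current manager.
public class RMWebAppBindingSketch extends AbstractModule {
  private final ResourceManager rm;

  public RMWebAppBindingSketch(ResourceManager rm) {
    this.rm = rm;
  }

  @Override
  protected void configure() {
    bind(ResourceManager.class).toInstance(rm);
    // Provider defers the call: after a failover, whatever manager the RM
    // currently holds is returned, not the one captured at webapp creation.
    bind(ApplicationACLsManager.class).toProvider(
        (Provider<ApplicationACLsManager>) rm::getApplicationACLsManager);
  }
}
{code}
With a provider binding, a web block that injects ApplicationACLsManager gets whatever the current RM returns at request time, which is the behaviour the description asks for.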
[jira] [Commented] (YARN-1156) Change NodeManager AllocatedGB and AvailableGB metrics to show decimal values
[ https://issues.apache.org/jira/browse/YARN-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14232907#comment-14232907 ] Junping Du commented on YARN-1156: -- +1. Patch looks good to me. Committing this in. Change NodeManager AllocatedGB and AvailableGB metrics to show decimal values - Key: YARN-1156 URL: https://issues.apache.org/jira/browse/YARN-1156 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.1.0-beta Reporter: Akira AJISAKA Assignee: Tsuyoshi OZAWA Priority: Minor Labels: metrics, newbie Fix For: 2.7.0 Attachments: YARN-1156.1.patch, YARN-1156.2.patch, YARN-1156.3.patch, YARN-1156.4.patch, YARN-1156.5.patch The AllocatedGB and AvailableGB metrics are currently integer-typed. If 500MB of memory is allocated to a container four times, AllocatedGB is incremented four times by {{(int)500/1024}}, which is 0. That is, 2000MB of memory has actually been allocated, but the metric shows 0GB. Let's use a float type for these metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
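The truncation is easy to reproduce outside the NodeManager. A self-contained sketch (the class name is made up; the real counters live in NodeManagerMetrics):
{code}
public class GBTruncationSketch {
  public static void main(String[] args) {
    int allocatedGBInt = 0;      // current behaviour: integer gauge
    float allocatedGBFloat = 0f; // proposed behaviour: float gauge

    for (int i = 0; i < 4; i++) {      // four 500MB container allocations
      allocatedGBInt += 500 / 1024;    // integer division truncates to 0
      allocatedGBFloat += 500 / 1024f; // float division keeps the fraction
    }

    System.out.println(allocatedGBInt);   // 0        -> 2000MB reported as 0GB
    System.out.println(allocatedGBFloat); // 1.953125 -> close to the real 2000MB
  }
}
{code}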
[jira] [Updated] (YARN-1156) Enhance NodeManager AllocatedGB and AvailableGB metrics for aggregation of decimal values
[ https://issues.apache.org/jira/browse/YARN-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1156: - Summary: Enhance NodeManager AllocatedGB and AvailableGB metrics for aggregation of decimal values (was: Change NodeManager AllocatedGB and AvailableGB metrics to show decimal values) Enhance NodeManager AllocatedGB and AvailableGB metrics for aggregation of decimal values - Key: YARN-1156 URL: https://issues.apache.org/jira/browse/YARN-1156 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.1.0-beta Reporter: Akira AJISAKA Assignee: Tsuyoshi OZAWA Priority: Minor Labels: metrics, newbie Fix For: 2.7.0 Attachments: YARN-1156.1.patch, YARN-1156.2.patch, YARN-1156.3.patch, YARN-1156.4.patch, YARN-1156.5.patch The AllocatedGB and AvailableGB metrics are currently integer-typed. If 500MB of memory is allocated to a container four times, AllocatedGB is incremented four times by {{(int)500/1024}}, which is 0. That is, 2000MB of memory has actually been allocated, but the metric shows 0GB. Let's use a float type for these metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-1156) Enhance NodeManager AllocatedGB and AvailableGB metrics for aggregation of decimal values
[ https://issues.apache.org/jira/browse/YARN-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1156: - Priority: Major (was: Minor) Enhance NodeManager AllocatedGB and AvailableGB metrics for aggregation of decimal values - Key: YARN-1156 URL: https://issues.apache.org/jira/browse/YARN-1156 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.1.0-beta Reporter: Akira AJISAKA Assignee: Tsuyoshi OZAWA Labels: metrics, newbie Fix For: 2.7.0 Attachments: YARN-1156.1.patch, YARN-1156.2.patch, YARN-1156.3.patch, YARN-1156.4.patch, YARN-1156.5.patch The AllocatedGB and AvailableGB metrics are currently integer-typed. If 500MB of memory is allocated to a container four times, AllocatedGB is incremented four times by {{(int)500/1024}}, which is 0. That is, 2000MB of memory has actually been allocated, but the metric shows 0GB. Let's use a float type for these metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2136) RMStateStore can explicitly handle store/update events when fenced
[ https://issues.apache.org/jira/browse/YARN-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14232919#comment-14232919 ] Hudson commented on YARN-2136: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #24 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/24/]) YARN-2136. Changed RMStateStore to ignore store operations when fenced. Contributed by Varun Saxena (jianhe: rev 52bcefca8bb13d3757009f1f08203e7dca3b1e16) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreEventType.java RMStateStore can explicitly handle store/update events when fenced -- Key: YARN-2136 URL: https://issues.apache.org/jira/browse/YARN-2136 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Jian He Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-2136.002.patch, YARN-2136.003.patch, YARN-2136.004.patch, YARN-2136.005.patch, YARN-2136.patch RMStateStore can choose to handle/ignore store/update events upfront, instead of invoking more ZK operations, if the state store is in the fenced state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2472) yarn-daemons.sh should just call yarn directly
[ https://issues.apache.org/jira/browse/YARN-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14232916#comment-14232916 ] Hudson commented on YARN-2472: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #24 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/24/]) YARN-2472. yarn-daemons.sh should just call yarn directly (Masatake Iwasaki via aw) (aw: rev 26319ba0db9907c6254f65cd5b07f72c114d7e85) * hadoop-yarn-project/hadoop-yarn/bin/stop-yarn.sh * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/bin/start-yarn.sh * hadoop-yarn-project/hadoop-yarn/bin/yarn-daemons.sh yarn-daemons.sh should just call yarn directly -- Key: YARN-2472 URL: https://issues.apache.org/jira/browse/YARN-2472 Project: Hadoop YARN Issue Type: Improvement Reporter: Allen Wittenauer Assignee: Masatake Iwasaki Fix For: 3.0.0 Attachments: YARN-2472-1.patch There is little-to-no need for it to go through yarn-daemon.sh anymore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2894) When ACLs are enabled, if RM switches then application cannot be viewed from web.
[ https://issues.apache.org/jira/browse/YARN-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14232913#comment-14232913 ] Hudson commented on YARN-2894: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #24 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/24/]) YARN-2894. Fixed a bug regarding application view acl when RM fails over. Contributed by Rohith Sharmaks (jianhe: rev 392c3aaea8e8f156b76e418157fa347256283c56) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodeLabels.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/DefaultSchedulerPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/FairSchedulerAppsBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ClusterMetricsInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesDelegationTokens.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebApp.java * hadoop-yarn-project/CHANGES.txt * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/UserMetricsInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppsBlock.java When ACLs are enabled, if RM switches then application cannot be viewed from web. --- Key: YARN-2894 URL: https://issues.apache.org/jira/browse/YARN-2894 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Fix For: 2.7.0 Attachments: YARN-2894.1.patch, YARN-2894.patch Binding the aclManager to RMWebApp would cause problems if the RM is switched: some validation checks may fail. I think we should not bind the aclManager in RMWebApp; instead we should get it from the RM instance. In RMWebApp, {code} if (rm != null) { bind(ResourceManager.class).toInstance(rm); bind(RMContext.class).toInstance(rm.getRMContext()); bind(ApplicationACLsManager.class).toInstance( rm.getApplicationACLsManager()); bind(QueueACLsManager.class).toInstance(rm.getQueueACLsManager()); } {code} and in AppBlock#render the check below may fail (Need to test
[jira] [Updated] (YARN-1156) Enhance NodeManager AllocatedGB and AvailableGB metrics for aggregation of decimal values
[ https://issues.apache.org/jira/browse/YARN-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-1156: - Labels: metrics (was: metrics newbie) Enhance NodeManager AllocatedGB and AvailableGB metrics for aggregation of decimal values - Key: YARN-1156 URL: https://issues.apache.org/jira/browse/YARN-1156 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.1.0-beta Reporter: Akira AJISAKA Assignee: Tsuyoshi OZAWA Labels: metrics Fix For: 2.7.0 Attachments: YARN-1156.1.patch, YARN-1156.2.patch, YARN-1156.3.patch, YARN-1156.4.patch, YARN-1156.5.patch The AllocatedGB and AvailableGB metrics are currently integer-typed. If 500MB of memory is allocated to a container four times, AllocatedGB is incremented four times by {{(int)500/1024}}, which is 0. That is, 2000MB of memory has actually been allocated, but the metric shows 0GB. Let's use a float type for these metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1156) Enhance NodeManager AllocatedGB and AvailableGB metrics for aggregation of decimal values
[ https://issues.apache.org/jira/browse/YARN-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14232939#comment-14232939 ] Hudson commented on YARN-1156: -- FAILURE: Integrated in Hadoop-trunk-Commit #6639 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6639/]) YARN-1156. Enhance NodeManager AllocatedGB and AvailableGB metrics for aggregation of decimal values. (Contributed by Tsuyoshi OZAWA) (junping_du: rev e65b7c5ff6b0c013e510e750fe5cf59acfefea5f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/metrics/TestNodeManagerMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/metrics/NodeManagerMetrics.java * hadoop-yarn-project/CHANGES.txt Enhance NodeManager AllocatedGB and AvailableGB metrics for aggregation of decimal values - Key: YARN-1156 URL: https://issues.apache.org/jira/browse/YARN-1156 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.1.0-beta Reporter: Akira AJISAKA Assignee: Tsuyoshi OZAWA Labels: metrics Fix For: 2.7.0 Attachments: YARN-1156.1.patch, YARN-1156.2.patch, YARN-1156.3.patch, YARN-1156.4.patch, YARN-1156.5.patch The AllocatedGB and AvailableGB metrics are currently integer-typed. If 500MB of memory is allocated to a container four times, AllocatedGB is incremented four times by {{(int)500/1024}}, which is 0. That is, 2000MB of memory has actually been allocated, but the metric shows 0GB. Let's use a float type for these metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2910) FSLeafQueue can throw ConcurrentModificationException
[ https://issues.apache.org/jira/browse/YARN-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233006#comment-14233006 ] Wilfred Spiegelenburg commented on YARN-2910: - I have the code change done with all the synchronisation around the for loops. Based on the javadoc, all iterator access of the {{Collections.synchronizedList}} needs to be synchronised, which might impact performance as much as, or more than, the copy-on-write approach. The junit test is almost done and I will update the patch when it is finished. FSLeafQueue can throw ConcurrentModificationException - Key: YARN-2910 URL: https://issues.apache.org/jira/browse/YARN-2910 Project: Hadoop YARN Issue Type: Bug Components: fairscheduler Affects Versions: 2.5.0 Reporter: Wilfred Spiegelenburg Assignee: Wilfred Spiegelenburg Attachments: FSLeafQueue_concurrent_exception.txt, YARN-2910.patch The lists that maintain the runnable and the non-runnable apps are standard ArrayLists, but there is no guarantee that they will only be manipulated by one thread in the system. This can lead to the following exception: {noformat} 2014-11-12 02:29:01,169 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: ERROR IN CONTACTING RM. java.util.ConcurrentModificationException: java.util.ConcurrentModificationException at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:859) at java.util.ArrayList$Itr.next(ArrayList.java:831) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.getResourceUsage(FSLeafQueue.java:147) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSAppAttempt.getHeadroom(FSAppAttempt.java:180) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.allocate(FairScheduler.java:923) at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:516) {noformat} Full stack trace in the attached file. We should guard against this by using a thread-safe version, java.util.concurrent.CopyOnWriteArrayList. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
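The trade-off discussed in the comment, synchronizing every iteration versus copy-on-write, looks like this in isolation. A sketch with made-up types, not FSLeafQueue itself:
{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class IterationSafetySketch {
  static class App { int memoryMB; }

  // Option A: Collections.synchronizedList. Per its javadoc, iteration is
  // only safe while holding the list's own monitor, so every for loop must
  // be wrapped -- writers block while any reader iterates.
  static int sumSynchronized(List<App> apps) {
    int sum = 0;
    synchronized (apps) {
      for (App app : apps) {
        sum += app.memoryMB;
      }
    }
    return sum;
  }

  // Option B: CopyOnWriteArrayList. Iterators see an immutable snapshot and
  // never throw ConcurrentModificationException; each write copies the array.
  static int sumCopyOnWrite(CopyOnWriteArrayList<App> apps) {
    int sum = 0;
    for (App app : apps) {
      sum += app.memoryMB;
    }
    return sum;
  }

  public static void main(String[] args) {
    List<App> a = Collections.synchronizedList(new ArrayList<>());
    CopyOnWriteArrayList<App> b = new CopyOnWriteArrayList<>();
    System.out.println(sumSynchronized(a) + " " + sumCopyOnWrite(b));
  }
}
{code}
Which option costs more depends on the read/write ratio: copy-on-write penalizes frequent writers, while the synchronized wrapper penalizes concurrent readers.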
[jira] [Created] (YARN-2917) RM get hanged if fail to store NodeLabels into store.
Rohith created YARN-2917: Summary: RM get hanged if fail to store NodeLabels into store. Key: YARN-2917 URL: https://issues.apache.org/jira/browse/YARN-2917 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Critical I encoutered scenario where RM hanged while shutting down and keep on logging {{2014-12-03 19:32:44,283 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain.}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2917) RM get hanged if fail to store NodeLabels into store.
[ https://issues.apache.org/jira/browse/YARN-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233023#comment-14233023 ] Rohith commented on YARN-2917: -- Attaching the thread dump from when the RM hung {code} Thread-1 prio=10 tid=0x006e1000 nid=0x55a4 in Object.wait() [0x7f2ce9493000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0xf26b0d48 (a java.lang.Object) at org.apache.hadoop.yarn.event.AsyncDispatcher.serviceStop(AsyncDispatcher.java:141) - locked 0xf26b0d48 (a java.lang.Object) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) - locked 0xf26b0aa8 (a java.lang.Object) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.stopDispatcher(CommonNodeLabelsManager.java:232) at org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStop(CommonNodeLabelsManager.java:238) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) - locked 0xf26b0968 (a java.lang.Object) at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157) at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:599) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) - locked 0xf2842458 (a java.lang.Object) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.stopActiveServices(ResourceManager.java:1002) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToStandby(ResourceManager.java:1057) - locked 0xc0c96c98 (a org.apache.hadoop.yarn.server.resourcemanager.ResourceManager) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStop(ResourceManager.java:1104) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) - locked 0xc0cab280 (a java.lang.Object) at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:65) at org.apache.hadoop.service.CompositeService$CompositeServiceShutdownHook.run(CompositeService.java:183) at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54) AsyncDispatcher event handler daemon prio=10 tid=0x7f2cf0b81000 nid=0x54a1 in Object.wait() [0x7f2cf7bfa000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0xc01b83e8 (a org.apache.hadoop.util.ShutdownHookManager$1) at java.lang.Thread.join(Thread.java:1281) - locked 0xc01b83e8 (a org.apache.hadoop.util.ShutdownHookManager$1) at java.lang.Thread.join(Thread.java:1355) at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:106) at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:46) at java.lang.Shutdown.runHooks(Shutdown.java:123) at java.lang.Shutdown.sequence(Shutdown.java:167) at java.lang.Shutdown.exit(Shutdown.java:212) - locked 0xc04ae9c0 (a java.lang.Class for java.lang.Shutdown) at java.lang.Runtime.exit(Runtime.java:109) at java.lang.System.exit(System.java:962) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:185) at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) {code} RM hangs if it fails to store NodeLabels into the store. - Key: YARN-2917 URL: https://issues.apache.org/jira/browse/YARN-2917 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Critical I encountered a scenario where the RM hung while shutting down and kept logging {{2014-12-03 19:32:44,283 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain.}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2917) RM hangs if it fails to store NodeLabels into the store.
[ https://issues.apache.org/jira/browse/YARN-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233035#comment-14233035 ] Rohith commented on YARN-2917: -- The main problem: Thread-1: CommonNodeLabelsManager#handle() throws the exception back to the AsyncDispatcher; in turn, the AsyncDispatcher calls the shutdown hook and waits for it to complete. Thread-2: the shutdown hook stops the RM gracefully, but the graceful stop waits for the AsyncDispatcher to drain its events. RM hangs if it fails to store NodeLabels into the store. - Key: YARN-2917 URL: https://issues.apache.org/jira/browse/YARN-2917 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Critical I encountered a scenario where the RM hung while shutting down and kept logging {{2014-12-03 19:32:44,283 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain.}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
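The circular wait in the thread dump can be distilled to a few lines. This standalone sketch (made-up names, not the YARN classes) never terminates, which is the reported hang:
{code}
public class ShutdownDeadlockSketch {
  static volatile boolean drained = false;

  public static void main(String[] args) throws InterruptedException {
    // Stands in for the shutdown hook that stops the RM and, inside
    // AsyncDispatcher#serviceStop, waits for the event queue to drain.
    Runtime.getRuntime().addShutdownHook(new Thread(() -> {
      while (!drained) {
        // "Waiting for AsyncDispatcher to drain." -- never becomes true,
        // because the event thread below is stuck inside System.exit().
      }
    }, "shutdown-hook"));

    // Stands in for the AsyncDispatcher event thread: a store failure makes
    // it call System.exit(), which blocks until all shutdown hooks finish.
    Thread eventThread = new Thread(() -> System.exit(1), "event-handler");
    eventThread.start();
    eventThread.join(); // main also waits forever: a classic circular wait
  }
}
{code}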
[jira] [Commented] (YARN-2472) yarn-daemons.sh should just call yarn directly
[ https://issues.apache.org/jira/browse/YARN-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233044#comment-14233044 ] Hudson commented on YARN-2472: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1955 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1955/]) YARN-2472. yarn-daemons.sh should just call yarn directly (Masatake Iwasaki via aw) (aw: rev 26319ba0db9907c6254f65cd5b07f72c114d7e85) * hadoop-yarn-project/hadoop-yarn/bin/yarn-daemons.sh * hadoop-yarn-project/hadoop-yarn/bin/start-yarn.sh * hadoop-yarn-project/hadoop-yarn/bin/stop-yarn.sh * hadoop-yarn-project/CHANGES.txt yarn-daemons.sh should just call yarn directly -- Key: YARN-2472 URL: https://issues.apache.org/jira/browse/YARN-2472 Project: Hadoop YARN Issue Type: Improvement Reporter: Allen Wittenauer Assignee: Masatake Iwasaki Fix For: 3.0.0 Attachments: YARN-2472-1.patch There is little-to-no need for it to go through yarn-daemon.sh anymore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2136) RMStateStore can explicitly handle store/update events when fenced
[ https://issues.apache.org/jira/browse/YARN-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233047#comment-14233047 ] Hudson commented on YARN-2136: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1955 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1955/]) YARN-2136. Changed RMStateStore to ignore store operations when fenced. Contributed by Varun Saxena (jianhe: rev 52bcefca8bb13d3757009f1f08203e7dca3b1e16) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreEventType.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java RMStateStore can explicitly handle store/update events when fenced -- Key: YARN-2136 URL: https://issues.apache.org/jira/browse/YARN-2136 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Jian He Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-2136.002.patch, YARN-2136.003.patch, YARN-2136.004.patch, YARN-2136.005.patch, YARN-2136.patch RMStateStore can choose to handle/ignore store/update events upfront, instead of invoking more ZK operations, if the state store is in the fenced state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2894) When ACLs are enabled, if RM switches then application cannot be viewed from web.
[ https://issues.apache.org/jira/browse/YARN-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233041#comment-14233041 ] Hudson commented on YARN-2894: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1955 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1955/]) YARN-2894. Fixed a bug regarding application view acl when RM fails over. Contributed by Rohith Sharmaks (jianhe: rev 392c3aaea8e8f156b76e418157fa347256283c56) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/DefaultSchedulerPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/UserMetricsInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppsBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/FairSchedulerAppsBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesDelegationTokens.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ClusterMetricsInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodeLabels.java When ACLs are enabled, if RM switches then application cannot be viewed from web. --- Key: YARN-2894 URL: https://issues.apache.org/jira/browse/YARN-2894 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Fix For: 2.7.0 Attachments: YARN-2894.1.patch, YARN-2894.patch Binding the aclManager to RMWebApp would cause problems if the RM is switched: some validation checks may fail. I think we should not bind the aclManager in RMWebApp; instead we should get it from the RM instance. In RMWebApp, {code} if (rm != null) { bind(ResourceManager.class).toInstance(rm); bind(RMContext.class).toInstance(rm.getRMContext()); bind(ApplicationACLsManager.class).toInstance( rm.getApplicationACLsManager()); bind(QueueACLsManager.class).toInstance(rm.getQueueACLsManager()); } {code} and in AppBlock#render the check below may fail (Need to test and
[jira] [Commented] (YARN-2136) RMStateStore can explicitly handle store/update events when fenced
[ https://issues.apache.org/jira/browse/YARN-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233063#comment-14233063 ] Hudson commented on YARN-2136: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #24 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/24/]) YARN-2136. Changed RMStateStore to ignore store operations when fenced. Contributed by Varun Saxena (jianhe: rev 52bcefca8bb13d3757009f1f08203e7dca3b1e16) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreEventType.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java RMStateStore can explicitly handle store/update events when fenced -- Key: YARN-2136 URL: https://issues.apache.org/jira/browse/YARN-2136 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Jian He Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-2136.002.patch, YARN-2136.003.patch, YARN-2136.004.patch, YARN-2136.005.patch, YARN-2136.patch RMStateStore can choose to handle/ignore store/update events upfront, instead of invoking more ZK operations, if the state store is in the fenced state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2894) When ACLs are enabled, if RM switches then application cannot be viewed from web.
[ https://issues.apache.org/jira/browse/YARN-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233057#comment-14233057 ] Hudson commented on YARN-2894: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #24 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/24/]) YARN-2894. Fixed a bug regarding application view acl when RM fails over. Contributed by Rohith Sharmaks (jianhe: rev 392c3aaea8e8f156b76e418157fa347256283c56) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/UserMetricsInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesDelegationTokens.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/FairSchedulerAppsBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodeLabels.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/DefaultSchedulerPage.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ClusterMetricsInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesFairScheduler.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppsBlock.java When ACLs are enabled, if RM switches then application cannot be viewed from web. --- Key: YARN-2894 URL: https://issues.apache.org/jira/browse/YARN-2894 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Fix For: 2.7.0 Attachments: YARN-2894.1.patch, YARN-2894.patch Binding the aclManager to RMWebApp would cause problems if the RM is switched: some validation checks may fail. I think we should not bind the aclManager in RMWebApp; instead we should get it from the RM instance. In RMWebApp, {code} if (rm != null) { bind(ResourceManager.class).toInstance(rm); bind(RMContext.class).toInstance(rm.getRMContext()); bind(ApplicationACLsManager.class).toInstance( rm.getApplicationACLsManager()); bind(QueueACLsManager.class).toInstance(rm.getQueueACLsManager()); } {code} and in AppBlock#render the check below may fail (Need to test
[jira] [Commented] (YARN-2472) yarn-daemons.sh should just call yarn directly
[ https://issues.apache.org/jira/browse/YARN-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233060#comment-14233060 ] Hudson commented on YARN-2472: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #24 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/24/]) YARN-2472. yarn-daemons.sh should just call yarn directly (Masatake Iwasaki via aw) (aw: rev 26319ba0db9907c6254f65cd5b07f72c114d7e85) * hadoop-yarn-project/hadoop-yarn/bin/yarn-daemons.sh * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/bin/stop-yarn.sh * hadoop-yarn-project/hadoop-yarn/bin/start-yarn.sh yarn-daemons.sh should just call yarn directly -- Key: YARN-2472 URL: https://issues.apache.org/jira/browse/YARN-2472 Project: Hadoop YARN Issue Type: Improvement Reporter: Allen Wittenauer Assignee: Masatake Iwasaki Fix For: 3.0.0 Attachments: YARN-2472-1.patch There is little-to-no need for it to go through yarn-daemon.sh anymore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2918) RM startup fails if accessible-node-labels are configured for a queue.
Rohith created YARN-2918: Summary: RM startup fails if accessible-node-labels are configured for a queue. Key: YARN-2918 URL: https://issues.apache.org/jira/browse/YARN-2918 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Assignee: Rohith I configured accessible-node-labels for a queue, but RM startup fails with the exception below. The current steps to configure node labels are to first add them via rmadmin and then configure them for the queues. It would be good if cluster and queue node labels were consistent in how they are configured. {noformat} 2014-12-03 20:11:50,126 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager org.apache.hadoop.service.ServiceStateException: java.io.IOException: NodeLabelManager doesn't include label = x, please check. at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:556) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:982) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:249) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1203) Caused by: java.io.IOException: NodeLabelManager doesn't include label = x, please check. at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkIfLabelInClusterNodeLabels(SchedulerUtils.java:287) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.init(AbstractCSQueue.java:109) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.init(LeafQueue.java:120) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:567) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:587) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:462) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:294) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:324) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
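Under the current behaviour, the order of operations matters: the label must exist in the cluster set before any queue references it. A hedged sketch of the working sequence (the label {{x}} is from the report above; the queue name {{a}} is illustrative):
{noformat}
# 1. Add the label to the cluster first:
yarn rmadmin -addToClusterNodeLabels x

# 2. Only then reference it from capacity-scheduler.xml:
<property>
  <name>yarn.scheduler.capacity.root.a.accessible-node-labels</name>
  <value>x</value>
</property>
{noformat}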
[jira] [Updated] (YARN-2918) RM startup fails if accessible-node-labels are configured for a queue without cluster labels
[ https://issues.apache.org/jira/browse/YARN-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-2918: - Summary: RM startup fails if accessible-node-labels are configured for a queue without cluster labels (was: RM startup fails if accessible-node-labels are configured for a queue.) RM startup fails if accessible-node-labels are configured for a queue without cluster labels --- Key: YARN-2918 URL: https://issues.apache.org/jira/browse/YARN-2918 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Assignee: Rohith I configured accessible-node-labels for a queue, but RM startup fails with the exception below. The current steps to configure node labels are to first add them via rmadmin and then configure them for the queues. It would be good if cluster and queue node labels were consistent in how they are configured. {noformat} 2014-12-03 20:11:50,126 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager org.apache.hadoop.service.ServiceStateException: java.io.IOException: NodeLabelManager doesn't include label = x, please check. at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:556) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:982) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:249) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1203) Caused by: java.io.IOException: NodeLabelManager doesn't include label = x, please check. at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkIfLabelInClusterNodeLabels(SchedulerUtils.java:287) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.init(AbstractCSQueue.java:109) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.init(LeafQueue.java:120) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:567) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:587) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:462) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:294) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:324) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1156) Enhance NodeManager AllocatedGB and AvailableGB metrics for aggregation of decimal values
[ https://issues.apache.org/jira/browse/YARN-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233113#comment-14233113 ] Hudson commented on YARN-1156: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1978 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1978/]) YARN-1156. Enhance NodeManager AllocatedGB and AvailableGB metrics for aggregation of decimal values. (Contributed by Tsuyoshi OZAWA) (junping_du: rev e65b7c5ff6b0c013e510e750fe5cf59acfefea5f) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/metrics/TestNodeManagerMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/metrics/NodeManagerMetrics.java * hadoop-yarn-project/CHANGES.txt Enhance NodeManager AllocatedGB and AvailableGB metrics for aggregation of decimal values - Key: YARN-1156 URL: https://issues.apache.org/jira/browse/YARN-1156 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.1.0-beta Reporter: Akira AJISAKA Assignee: Tsuyoshi OZAWA Labels: metrics Fix For: 2.7.0 Attachments: YARN-1156.1.patch, YARN-1156.2.patch, YARN-1156.3.patch, YARN-1156.4.patch, YARN-1156.5.patch The AllocatedGB and AvailableGB metrics are currently integer-typed. If 500MB of memory is allocated to a container four times, AllocatedGB is incremented four times by {{(int)500/1024}}, which is 0. That is, 2000MB of memory has actually been allocated, but the metric shows 0GB. Let's use a float type for these metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2894) When ACLs are enabled, if RM switches then application cannot be viewed from web.
[ https://issues.apache.org/jira/browse/YARN-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233112#comment-14233112 ] Hudson commented on YARN-2894: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1978 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1978/]) YARN-2894. Fixed a bug regarding application view acl when RM fails over. Contributed by Rohith Sharmaks (jianhe: rev 392c3aaea8e8f156b76e418157fa347256283c56) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodeLabels.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesDelegationTokens.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ClusterMetricsInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/UserMetricsInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppsBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/FairSchedulerAppsBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/DefaultSchedulerPage.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java When ACL's are enabled, if RM switches then application can not be viewed from web. --- Key: YARN-2894 URL: https://issues.apache.org/jira/browse/YARN-2894 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Fix For: 2.7.0 Attachments: YARN-2894.1.patch, YARN-2894.patch Binding aclManager to RMWebApp would cause problem if RM is switched. There could be some validation check may fail. I think , we should not bind aclManager for RMWebApp, instead we should get from RM instance. In RMWebApp, {code} if (rm != null) { bind(ResourceManager.class).toInstance(rm); bind(RMContext.class).toInstance(rm.getRMContext()); bind(ApplicationACLsManager.class).toInstance( rm.getApplicationACLsManager()); bind(QueueACLsManager.class).toInstance(rm.getQueueACLsManager()); } {code} and in AppBlock#render below check may fail(Need to test
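A minimal sketch of the fix direction described in YARN-2894 above: bind only the ResourceManager in the webapp module and resolve the ACL manager from it on each request, so a failover cannot leave the web UI holding a stale manager. The class and method below are illustrative, not the committed patch; getApplicationACLsManager() and the four-argument checkAccess() are existing YARN APIs.
{code}
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.yarn.api.records.ApplicationAccessType;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.server.resourcemanager.ResourceManager;

// Illustrative only: fetch the ACL manager from the RM per request instead
// of binding the manager instance once at webapp start-up.
class AppViewAclCheck {
  private final ResourceManager rm; // the RM object itself stays valid across failover

  AppViewAclCheck(ResourceManager rm) {
    this.rm = rm;
  }

  boolean canView(UserGroupInformation caller, String appOwner, ApplicationId appId) {
    // resolved per call, so it always reflects the active RM context
    return rm.getApplicationACLsManager().checkAccess(
        caller, ApplicationAccessType.VIEW_APP, appOwner, appId);
  }
}
{code}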
[jira] [Commented] (YARN-1156) Enhance NodeManager AllocatedGB and AvailableGB metrics for aggregation of decimal values
[ https://issues.apache.org/jira/browse/YARN-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233128#comment-14233128 ] Hudson commented on YARN-1156: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #24 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/24/]) YARN-1156. Enhance NodeManager AllocatedGB and AvailableGB metrics for aggregation of decimal values. (Contributed by Tsuyoshi OZAWA) (junping_du: rev e65b7c5ff6b0c013e510e750fe5cf59acfefea5f) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/metrics/NodeManagerMetrics.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/metrics/TestNodeManagerMetrics.java Enhance NodeManager AllocatedGB and AvailableGB metrics for aggregation of decimal values - Key: YARN-1156 URL: https://issues.apache.org/jira/browse/YARN-1156 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.1.0-beta Reporter: Akira AJISAKA Assignee: Tsuyoshi OZAWA Labels: metrics Fix For: 2.7.0 Attachments: YARN-1156.1.patch, YARN-1156.2.patch, YARN-1156.3.patch, YARN-1156.4.patch, YARN-1156.5.patch AllocatedGB and AvailableGB metrics are currently integer-typed. If 500MB of memory is allocated to containers four times, AllocatedGB is incremented four times by {{(int)500/1024}}, which is 0. That is, the memory actually allocated is 2000MB, but the metrics show 0GB. Let's use float type for these metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2894) When ACL's are enabled, if RM switches then application can not be viewed from web.
[ https://issues.apache.org/jira/browse/YARN-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233127#comment-14233127 ] Hudson commented on YARN-2894: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #24 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/24/]) YARN-2894. Fixed a bug regarding application view acl when RM fails over. Contributed by Rohith Sharmaks (jianhe: rev 392c3aaea8e8f156b76e418157fa347256283c56) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesDelegationTokens.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesCapacitySched.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/FairSchedulerAppsBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodesPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/ClusterMetricsInfo.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodeLabels.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesApps.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/AppsBlock.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/MetricsOverviewTable.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/DefaultSchedulerPage.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebApp.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java * 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesFairScheduler.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/UserMetricsInfo.java When ACL's are enabled, if RM switches then application can not be viewed from web. --- Key: YARN-2894 URL: https://issues.apache.org/jira/browse/YARN-2894 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Rohith Assignee: Rohith Fix For: 2.7.0 Attachments: YARN-2894.1.patch, YARN-2894.patch Binding aclManager to RMWebApp would cause problem if RM is switched. There could be some validation check may fail. I think , we should not bind aclManager for RMWebApp, instead we should get from RM instance. In RMWebApp, {code} if (rm != null) { bind(ResourceManager.class).toInstance(rm); bind(RMContext.class).toInstance(rm.getRMContext()); bind(ApplicationACLsManager.class).toInstance( rm.getApplicationACLsManager()); bind(QueueACLsManager.class).toInstance(rm.getQueueACLsManager()); } {code} and in AppBlock#render below check may fail(Need
[jira] [Commented] (YARN-2136) RMStateStore can explicitly handle store/update events when fenced
[ https://issues.apache.org/jira/browse/YARN-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233134#comment-14233134 ] Hudson commented on YARN-2136: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #24 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/24/]) YARN-2136. Changed RMStateStore to ignore store operations when fenced. Contributed by Varun Saxena (jianhe: rev 52bcefca8bb13d3757009f1f08203e7dca3b1e16) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/TestZKRMStateStore.java * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/ZKRMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStoreEventType.java RMStateStore can explicitly handle store/update events when fenced -- Key: YARN-2136 URL: https://issues.apache.org/jira/browse/YARN-2136 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.7.0 Reporter: Jian He Assignee: Varun Saxena Fix For: 2.7.0 Attachments: YARN-2136.002.patch, YARN-2136.003.patch, YARN-2136.004.patch, YARN-2136.005.patch, YARN-2136.patch RMStateStore can choose to handle/ignore store/update events upfront, instead of invoking more ZK operations, if the state store is in the fenced state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
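A small sketch of that short-circuit idea (hypothetical names; the real RMStateStore dispatches typed events through its state machine rather than Runnables):
{code}
// Hypothetical sketch: drop store/update events up front when fenced,
// instead of issuing ZooKeeper operations that are bound to fail.
class FencedAwareStoreSketch {
  enum StoreState { ACTIVE, FENCED }

  private volatile StoreState state = StoreState.ACTIVE;

  void handleStoreEvent(Runnable zkWrite) {
    if (state == StoreState.FENCED) {
      return; // another RM took over; ignore rather than touch ZK
    }
    zkWrite.run(); // perform the actual ZooKeeper write
  }

  void fence() {
    state = StoreState.FENCED;
  }
}
{code}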
[jira] [Commented] (YARN-2472) yarn-daemons.sh should just call yarn directly
[ https://issues.apache.org/jira/browse/YARN-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233131#comment-14233131 ] Hudson commented on YARN-2472: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #24 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/24/]) YARN-2472. yarn-daemons.sh should just call yarn directly (Masatake Iwasaki via aw) (aw: rev 26319ba0db9907c6254f65cd5b07f72c114d7e85) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/bin/yarn-daemons.sh * hadoop-yarn-project/hadoop-yarn/bin/start-yarn.sh * hadoop-yarn-project/hadoop-yarn/bin/stop-yarn.sh yarn-daemons.sh should just call yarn directly -- Key: YARN-2472 URL: https://issues.apache.org/jira/browse/YARN-2472 Project: Hadoop YARN Issue Type: Improvement Reporter: Allen Wittenauer Assignee: Masatake Iwasaki Fix For: 3.0.0 Attachments: YARN-2472-1.patch There is little-to-no need for it to go through yarn-daemon.sh anymore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2301) Improve yarn container command
[ https://issues.apache.org/jira/browse/YARN-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naganarasimha G R updated YARN-2301: Attachment: YARN-2301.20141203-1.patch rebasing and updating the patch Improve yarn container command -- Key: YARN-2301 URL: https://issues.apache.org/jira/browse/YARN-2301 Project: Hadoop YARN Issue Type: Improvement Reporter: Jian He Assignee: Naganarasimha G R Labels: usability Attachments: YARN-2301.01.patch, YARN-2301.03.patch, YARN-2301.20141120-1.patch, YARN-2301.20141203-1.patch, YARN-2303.patch While running the yarn container -list <Application Attempt ID> command, some observations: 1) the scheme (e.g. http/https) before LOG-URL is missing 2) the start-time is printed as milliseconds (e.g. 1405540544844). Better to print it in a time format. 3) finish-time is 0 if the container is not yet finished. Maybe print N/A 4) May have an option to run as yarn container -list <appId> OR yarn application -list-containers <appId> also. As the attempt Id is not shown on the console, this makes it easier for the user to just copy the appId and run it, and may also be useful for container-preserving AM restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
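For observations 2) and 3), a self-contained sketch of the requested formatting (the date pattern is an assumption, not what the eventual patch uses):
{code}
import java.text.SimpleDateFormat;
import java.util.Date;

public class ContainerTimeFormat {
  public static void main(String[] args) {
    long startTime = 1405540544844L; // epoch millis from the description
    long finishTime = 0L;            // 0 == container not finished yet

    SimpleDateFormat fmt = new SimpleDateFormat("EEE MMM dd HH:mm:ss Z yyyy");
    System.out.println("Start-Time  : " + fmt.format(new Date(startTime)));
    System.out.println("Finish-Time : "
        + (finishTime == 0 ? "N/A" : fmt.format(new Date(finishTime))));
  }
}
{code}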
[jira] [Commented] (YARN-2874) Dead lock in DelegationTokenRenewer which blocks RM to execute any further apps
[ https://issues.apache.org/jira/browse/YARN-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233192#comment-14233192 ] Naganarasimha G R commented on YARN-2874: - Hi [~ozawa] [~kasha], Thanks for the review and feedback. I put some effort into writing test code to reproduce this issue, but as more and more sleeps and wait/notify were required and it was not consistently going into deadlock, I thought it was not worth the effort as the deadlock scenario was easily detectable. bq. RenewalTimerTask is a method which has a side effect, so the state can be invalid after the patch. We need to update the long error handling before merging it. I was not so clear about this statement, as I was not able to work out which state gets invalidated because of the fix; further, you ([~ozawa]) had mentioned ??Rethinking of this, this is not related to this JIRA.??, so please inform me if anything more needs to be updated for this issue. Regarding Sid's comment in MAPREDUCE-5384: if it is required to be handled, IIUC I need to revert my patch and redo it as below (correct me if wrong, and also let me know if it is required to be fixed in this way) {quote}
{noformat}
@Override
public void run() {
  if (cancelled) {
    return;
  }
  Token<?> token = dttr.token;
  try {
    synchronized (this) {
      if (cancelled) {
        return;
      }
      requestNewHdfsDelegationTokenIfNeeded(dttr);
      // if the token is not replaced by a new token, renew the token
      if (appTokens.get(dttr.applicationId).contains(dttr)) {
        renewToken(dttr);
        setTimerForTokenRenewal(dttr); // set the next one
      } else {
        LOG.info("The token was removed already. Token = [" + dttr + "]");
      }
    }
  } catch (Exception e) {
    LOG.error("Exception renewing token " + token + ". Not rescheduled", e);
    removeFailedDelegationToken(dttr);
  }
}
{noformat}
{quote} Dead lock in DelegationTokenRenewer which blocks RM to execute any further apps - Key: YARN-2874 URL: https://issues.apache.org/jira/browse/YARN-2874 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0, 2.5.1 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Blocker Attachments: YARN-2874.20141118-1.patch, YARN-2874.20141118-2.patch When token renewal fails and the application finishes this dead lock can occur Jstack dump : {quote} Found one Java-level deadlock: = DelegationTokenRenewer #181865: waiting to lock monitor 0x00900918 (object 0xc18a9998, a java.util.Collections$SynchronizedSet), which is held by DelayedTokenCanceller DelayedTokenCanceller: waiting to lock monitor 0x04141718 (object 0xc7eae720, a org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask), which is held by Timer-4 Timer-4: waiting to lock monitor 0x00900918 (object 0xc18a9998, a java.util.Collections$SynchronizedSet), which is held by DelayedTokenCanceller Java stack information for the threads listed above: === DelegationTokenRenewer #181865: at java.util.Collections$SynchronizedCollection.add(Collections.java:1636) - waiting to lock 0xc18a9998 (a java.util.Collections$SynchronizedSet) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.addTokenToList(DelegationTokenRenewer.java:322) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:398) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$500(DelegationTokenRenewer.java:70) at
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:657) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:638) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) DelayedTokenCanceller: at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask.cancel(DelegationTokenRenewer.java:443) - waiting to lock 0xc7eae720 (a org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask) at
[jira] [Commented] (YARN-2892) Unable to get AMRMToken in unmanaged AM when using a secure cluster
[ https://issues.apache.org/jira/browse/YARN-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233199#comment-14233199 ] Junping Du commented on YARN-2892: -- +1. Patch looks good. Will commit it shortly. Unable to get AMRMToken in unmanaged AM when using a secure cluster --- Key: YARN-2892 URL: https://issues.apache.org/jira/browse/YARN-2892 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Sevada Abraamyan Assignee: Sevada Abraamyan Attachments: YARN-2892.patch, YARN-2892.patch, YARN-2892.patch An AMRMToken is retrieved from the ApplicationReport by the YarnClient. When the RM creates the ApplicationReport and sends it back to the client it makes a simple security check whether it should include the AMRMToken in the report (See createAndGetApplicationReport in RMAppImpl). This security check verifies that the user who submitted the original application is the same user who is requesting the ApplicationReport. If they are indeed the same user then it includes the AMRMToken, otherwise it does not include it. The problem arises from the fact that when an application is submitted, the RM saves the short username of the user who created the application (See submitApplication in ClientRMService). Afterwards when the ApplicationReport is requested, the system tries to match the full username of the requester against the previously stored short username. In a secure cluster using Kerberos this check fails because the principal is stripped from the username when we request a short username. So for example the short username might be Foo whereas the full username is f...@company.com Note: A very similar problem has been previously reported ([Yarn-2232|https://issues.apache.org/jira/browse/YARN-2232]) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
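The short-vs-full username mismatch above can be summarized with UserGroupInformation (a sketch of the fix direction, not the committed patch): getUserName() returns the full Kerberos principal, while getShortUserName() applies the auth_to_local rules, so the comparison has to be done short name against short name.
{code}
import org.apache.hadoop.security.UserGroupInformation;

class UserNameCheck {
  // Sketch: compare like with like. With Kerberos, caller.getUserName()
  // would be foo@COMPANY.COM while the RM stored only the short name foo,
  // so comparing full vs. short never matches.
  static boolean isSameUser(UserGroupInformation caller, String storedShortName) {
    return caller.getShortUserName().equals(storedShortName);
  }
}
{code}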
[jira] [Updated] (YARN-2892) Unable to get AMRMToken in unmanaged AM when using a secure cluster
[ https://issues.apache.org/jira/browse/YARN-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated YARN-2892: - Hadoop Flags: Reviewed Unable to get AMRMToken in unmanaged AM when using a secure cluster --- Key: YARN-2892 URL: https://issues.apache.org/jira/browse/YARN-2892 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Sevada Abraamyan Assignee: Sevada Abraamyan Attachments: YARN-2892.patch, YARN-2892.patch, YARN-2892.patch An AMRMToken is retrieved from the ApplicationReport by the YarnClient. When the RM creates the ApplicationReport and sends it back to the client it makes a simple security check whether it should include the AMRMToken in the report (See createAndGetApplicationReport in RMAppImpl). This security check verifies that the user who submitted the original application is the same user who is requesting the ApplicationReport. If they are indeed the same user then it includes the AMRMToken, otherwise it does not include it. The problem arises from the fact that when an application is submitted, the RM saves the short username of the user who created the application (See submitApplication in ClientRMService). Afterwards when the ApplicationReport is requested, the system tries to match the full username of the requester against the previously stored short username. In a secure cluster using Kerberos this check fails because the principal is stripped from the username when we request a short username. So for example the short username might be Foo whereas the full username is f...@company.com Note: A very similar problem has been previously reported ([Yarn-2232|https://issues.apache.org/jira/browse/YARN-2232]) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2056) Disable preemption at Queue level
[ https://issues.apache.org/jira/browse/YARN-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233205#comment-14233205 ] Jason Lowe commented on YARN-2056: -- Last call for comments, as I'm planning to commit by the end of this week. Disable preemption at Queue level - Key: YARN-2056 URL: https://issues.apache.org/jira/browse/YARN-2056 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Mayank Bansal Assignee: Eric Payne Attachments: YARN-2056.201408202039.txt, YARN-2056.201408260128.txt, YARN-2056.201408310117.txt, YARN-2056.201409022208.txt, YARN-2056.201409181916.txt, YARN-2056.201409210049.txt, YARN-2056.201409232329.txt, YARN-2056.201409242210.txt, YARN-2056.201410132225.txt, YARN-2056.201410141330.txt, YARN-2056.201410232244.txt, YARN-2056.201410311746.txt, YARN-2056.201411041635.txt, YARN-2056.201411072153.txt, YARN-2056.201411122305.txt, YARN-2056.201411132215.txt, YARN-2056.201411142002.txt We need to be able to disable preemption at individual queue level -- This message was sent by Atlassian JIRA (v6.3.4#6332)
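For reference, the per-queue switch under discussion would be set through the capacity-scheduler configuration; the property name below follows the capacity-scheduler naming pattern and is an assumption about the final committed form, not a confirmed key.
{code}
import org.apache.hadoop.conf.Configuration;

public class DisablePreemptionExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // leave preemption enabled globally, opt a single queue out of it
    // (assumed property name, per the naming pattern of this JIRA)
    conf.setBoolean(
        "yarn.scheduler.capacity.root.queueA.disable_preemption", true);
    System.out.println(conf.getBoolean(
        "yarn.scheduler.capacity.root.queueA.disable_preemption", false));
  }
}
{code}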
[jira] [Commented] (YARN-2728) Support for disabling the Centralized NodeLabel validation in Distributed Node Label Configuration setup
[ https://issues.apache.org/jira/browse/YARN-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233233#comment-14233233 ] Naganarasimha G R commented on YARN-2728: - Hi [~wangda], In view of the earlier review comments ([comment1|https://issues.apache.org/jira/browse/YARN-2495?focusedCommentId=14169984page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14169984], [comment2|https://issues.apache.org/jira/browse/YARN-2495?focusedCommentId=14169984page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14169984]) in YARN-2495, {quote} Now if user want to leverage change of capacity scheduler, user MUST specify 1) labels can be accessed by the queue and 2) proportion of resource can be accessed by a queue of each label. Back to the central node label validation discussion, without this, we cannot get capacity scheduler work for now. (user cannot specify capacity for a unknown node-label for a queue, etc.). {quote} I feel we can keep the design the same and have a configuration flag based on which we decide to do the following: # Disable (/throw an exception) in CommonNodeLabelsManager.addToCluserNodeLabels and removeFromClusterNodeLabels (so that cluster node labels are not taken from REST or CLI) # Support a protected method in CommonNodeLabelsManager which updates the label mgr with new labels (as cluster node labels) and invoke it from CommonNodeLabelsManager.addLabelsToNode By doing this, we will have the flexibility to enable or disable this centralized valid-cluster-node-labels functionality in both centralized and distributed Node Labels configuration. Support for disabling the Centralized NodeLabel validation in Distributed Node Label Configuration setup Key: YARN-2728 URL: https://issues.apache.org/jira/browse/YARN-2728 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, resourcemanager Reporter: Naganarasimha G R Assignee: Naganarasimha G R Currently without a Central List of Valid Labels, Capacity scheduler will not be able to work (user cannot specify capacity for an unknown node-label for a queue, etc.). But without disabling the central label validation, the Distributed Node Label configuration feature is not complete, so we need to support this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
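A sketch of the flag-gated behaviour proposed in the comment above (the flag and class are hypothetical; the method spelling follows the comment):
{code}
import java.util.Set;

class LabelValidationGate {
  private final boolean centralizedValidation; // e.g. read from yarn-site.xml

  LabelValidationGate(boolean centralizedValidation) {
    this.centralizedValidation = centralizedValidation;
  }

  void addToCluserNodeLabels(Set<String> labels) {
    if (!centralizedValidation) {
      // distributed node-label setup: labels must not come via REST or CLI
      throw new UnsupportedOperationException(
          "Centralized node-label modification is disabled");
    }
    // otherwise persist the labels as the valid cluster set
  }
}
{code}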
[jira] [Commented] (YARN-2837) Timeline server needs to recover the timeline DT when restarting
[ https://issues.apache.org/jira/browse/YARN-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233234#comment-14233234 ] Zhijie Shen commented on YARN-2837: --- bq. Maybe we want to make the version control logic a unified interface in future? I think it's a good suggestion, but how about doing the code refactoring separately? In addition to the timeline server, other components have state stores built on top of leveldb, with similar version-related code. We can do a one-pass refactoring to make all leveldb store impls share the common code. Let's file a Jira for it. Timeline server needs to recover the timeline DT when restarting Key: YARN-2837 URL: https://issues.apache.org/jira/browse/YARN-2837 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Fix For: 2.7.0 Attachments: YARN-2837.1.patch, YARN-2837.2.patch, YARN-2837.3.patch Timeline server needs to recover the stateful information when restarting as RM/NM/JHS does now. So far the stateful information only includes the timeline DT. Without recovery, the timeline DT of the existing YARN apps is no longer valid, and cannot be renewed any more after the timeline server is restarted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233244#comment-14233244 ] Craig Welch commented on YARN-2637: --- First, the easy parts :-) bq. typo for manualy fixed bq. usedAMResources is not used by sub-class, so suggest to replace it with private done bq. Exception messages here should be more meaningful than c1, or c2. yup - fixed bq. The log level here should be info or warn level rather than debug level. In most cases, LOG.debug() should be under block of LOG.isDebugEnabled(). So, I had made this debug rather than something higher because I'm not sure we always care, and it doesn't represent a failure case - this is normal/expected case, and other similar cases for not starting the app don't log at all. But, I can see that it will be helpful to know this, and I don't think that it will result in excessive logging - so I went ahead and made it an info level, sound good? BTW, the isXYZenabled idiom is to save the cost of evaluating the argument construction for the log message as these can be very expensive, but for cheap cases like this (a string literal) it's not necessary as the only cost is going to be the same evaluation for logging which will happen during the call Now for the more complicated one: bq. Looks like maxAMResourcePerQueuePercent is a allowed percent for AM resource in each queue. So we may should calculate amLimit per queue rather than aggregate all applications together. So, yes and no - the current behavior actually takes the maxAM... which is set globally and it apportions it out based on the queue's baseline share of the cluster - so if the maxam was say, 10%, and a given queue had 50% of the cluster, it would have an effective maxampercent value of 5% (it's translated into how many apps can I have running based on the minallocation of the cluster rather than actual am usage - which is the problem which prompted the fix - but the important thing to get here is the way the overall maxampercent is apportioned out to the queues) There is also the option to override on a per queue basis, so that, in the above scenario, if you didn't like the queue getting the 5% based on the overall process, but you were happy with how other queues were working using the config, you could just override for the given queue. When I tried to translate this into something which was actually paying attention to the real usage of the ams, two approaches seemed reasonable: 1. Just have a global used am resource value, use the global am percent everywhere (not apportioned) - this way the total cluster level effect is what we want - in this case, the subdivision of the amresource percent value is replaced with a total summing of the used resource amongst the queues. You can still override for a given queue if you want this queue to be able to go higher, which has the effective result of allowing one queue to go higher than the others, this could starve other queues (bad) but that was already possible with the other approach, albeit in a different way (when the cluster came to be filled with AM's from one particular queue.). 2. 
We could subdivide the global maxampercent based on the queue share of the baseline (as before) and then have a per-queue amresource percent (and amused) which are evaluated - this would not be a difficult change from the current approach, but I think it is problematic for the reason below. The main reason I took approach number one over two is that I was concerned that with a complex queue structure where there was a reasonable level of subdivision in a smallish cluster you could end up with a queue which can effectively never start anything because the final value is too small to ever be able to start one of the larger AM's we have these days. By sharing it globally this is less likely to happen because that unused am resource allocated out to other queues which have a larger share of the cluster is not potentially sitting idle while leaf queue a.b.c has a derived maxampercent of say 2%, which translates into 512mb, and so can never start an application master which needs 1G (even though, globally, there's more than enough ampercent to do so). It's the "this queue can never start an AM over x size" case that concerns me. There are other possible ways to handle this with option 2, but I'm concerned that they would add complexity to the behavior and change the behavior more than is needed to correct the defect. [~djp] Make sense? Thoughts? I may take a go at option 2 so we can evaluate it, but I'm concerned about the small cluster/too much subdivision scenario being problematic. maximum-am-resource-percent could be violated when resource of AM is minimumAllocation Key:
[jira] [Commented] (YARN-2837) Timeline server needs to recover the timeline DT when restarting
[ https://issues.apache.org/jira/browse/YARN-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233294#comment-14233294 ] Li Lu commented on YARN-2837: - Agree with [~zjshen]'s suggestion. Let's do that in a separate Jira. I'd +1 this patch, and maybe some committers would like to take a look at it? Timeline server needs to recover the timeline DT when restarting Key: YARN-2837 URL: https://issues.apache.org/jira/browse/YARN-2837 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker Fix For: 2.7.0 Attachments: YARN-2837.1.patch, YARN-2837.2.patch, YARN-2837.3.patch Timeline server needs to recover the stateful information when restarting as RM/NM/JHS does now. So far the stateful information only includes the timeline DT. Without recovery, the timeline DT of the existing YARN apps is no longer valid, and cannot be renewed any more after the timeline server is restarted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
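A hypothetical shape for the "unified version control" interface floated in the comments above, capturing what each leveldb-backed state store (RM, NM, timeline) re-implements today; all names here are illustrative.
{code}
interface VersionedLevelDbStore {
  int currentVersion();            // schema version this code writes
  int loadedVersion();             // schema version found on disk
  void storeVersion(int version);

  default void checkVersion() {
    if (loadedVersion() > currentVersion()) {
      throw new IllegalStateException(
          "Incompatible state-store schema version " + loadedVersion());
    }
    if (loadedVersion() < currentVersion()) {
      storeVersion(currentVersion()); // compatible upgrade: stamp new version
    }
  }
}
{code}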
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233323#comment-14233323 ] Wangda Tan commented on YARN-2637: -- [~cwelch], I think option #2 makes more sense to me, since each allocation will check the queue's capacity limit only. IIUC, option #1 could lead to some queues being entirely occupied by AMs, which is why we introduced the max-am-resource parameter. For option #2, we can allow the user to run at least one AM in spite of the max AM resource, to avoid the problem mentioned. In a real-world cluster, the capacity of a queue should be >= the maximum size of container we can launch. Do you agree? Thanks, Wangda maximum-am-resource-percent could be violated when resource of AM is minimumAllocation Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.2.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch Currently, the number of AMs in a leaf queue will be calculated in the following way:
{code}
max_am_resource = queue_max_capacity * maximum_am_resource_percent
#max_am_number = max_am_resource / minimum_allocation
#max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
{code}
And when a new application is submitted to the RM, it will check whether the app can be activated in the following way:
{code}
for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext(); ) {
  FiCaSchedulerApp application = i.next();
  // Check queue limit
  if (getNumActiveApplications() >= getMaximumActiveApplications()) {
    break;
  }
  // Check user limit
  User user = getUser(application.getUser());
  if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
    user.activateApplication();
    activeApplications.add(application);
    i.remove();
    LOG.info("Application " + application.getApplicationId()
        + " from user: " + application.getUser()
        + " activated in queue: " + getQueueName());
  }
}
{code}
An example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum resource that AMs can use is 200M. Assuming minimum_allocation=1M, the #am that can be launched is 200, and if the user uses 5M for each AM (> minimum_allocation), all apps can still be activated, and they will occupy all the resource of the queue instead of only a max_am_resource_percent of it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2139) [Umbrella] Support for Disk as a Resource in YARN
[ https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233341#comment-14233341 ] Bikas Saha commented on YARN-2139: -- So to be clear, currently vdisks counts the number of physical drives present on the box. Something to keep in mind would be whether this also entails a change in the NM policy of providing a directory on every local dir (which typically maps to every disk) to every task. And tasks are free to choose one or more of those dirs (disks) to write to. This puts the spinning disk head under contention and affects performance of all writers on that disk because seeks are expensive. The rule of thumb tends to be to allocate as many tasks to a machine as the number of disks (maybe 2x) so as to keep this seek cost low. Should we consider evaluating a change in this policy that gives 1 local dir to a container with 1 vdisk? This way a machine with 6 disks (and 6 vdisks) would have 6 tasks running, each with their own dedicated disk. Offhand it's hard to say how this would compare with all 6 disks allocated to all 6 tasks and letting cgroups enforce sharing. If multiple tasks end up choosing the same disk for their writes, then they may not end up getting the allocation that they thought they would get. [Umbrella] Support for Disk as a Resource in YARN -- Key: YARN-2139 URL: https://issues.apache.org/jira/browse/YARN-2139 Project: Hadoop YARN Issue Type: New Feature Reporter: Wei Yan Attachments: Disk_IO_Isolation_Scheduling_3.pdf, Disk_IO_Scheduling_Design_1.pdf, Disk_IO_Scheduling_Design_2.pdf, YARN-2139-prototype-2.patch, YARN-2139-prototype.patch YARN should consider disk as another resource for (1) scheduling tasks on nodes, (2) isolation at runtime, (3) spindle locality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2437) start-yarn.sh/stop-yarn should give info
[ https://issues.apache.org/jira/browse/YARN-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-2437: -- Assignee: Varun Saxena start-yarn.sh/stop-yarn should give info Key: YARN-2437 URL: https://issues.apache.org/jira/browse/YARN-2437 Project: Hadoop YARN Issue Type: Improvement Components: scripts Reporter: Allen Wittenauer Assignee: Varun Saxena Labels: newbie With the merger and cleanup of the daemon launch code, yarn-daemons.sh no longer prints Starting information. This should be made more of an analog of start-dfs.sh/stop-dfs.sh. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2903) Timeline server span receiver for htrace traces
[ https://issues.apache.org/jira/browse/YARN-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Billie Rinaldi updated YARN-2903: - Attachment: timelinespanreceiver design 1.pdf Timeline server span receiver for htrace traces --- Key: YARN-2903 URL: https://issues.apache.org/jira/browse/YARN-2903 Project: Hadoop YARN Issue Type: Task Components: timelineserver Reporter: Billie Rinaldi Assignee: Billie Rinaldi Attachments: timelinespanreceiver design 1.pdf HDFS is tracing using htrace now, as are other applications including HBase and Accumulo. It would be a nice feature if we enabled writing traces to the timeline server. I envision an htrace SpanReceiver implementation that uses the TimelineClient to store tracing data. The htrace API may end up being a more convenient way to instrument applications to store timeline data in the timeline server. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2847) Linux native container executor segfaults if default banned user detected
[ https://issues.apache.org/jira/browse/YARN-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chang li updated YARN-2847: --- Attachment: yarn2847.patch Linux native container executor segfaults if default banned user detected - Key: YARN-2847 URL: https://issues.apache.org/jira/browse/YARN-2847 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: chang li Attachments: yarn2847.patch, yarn2847notest.patch The check_user function in container-executor.c can cause a segmentation fault if banned.users is not provided but the user is detected as one of the default users. In that scenario it will call free_values on a NULL pointer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2847) Linux native container executor segfaults if default banned user detected
[ https://issues.apache.org/jira/browse/YARN-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chang li updated YARN-2847: --- Attachment: yarn2847.patch latest patch, previous one has a minor comment error Linux native container executor segfaults if default banned user detected - Key: YARN-2847 URL: https://issues.apache.org/jira/browse/YARN-2847 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: chang li Attachments: yarn2847.patch, yarn2847.patch, yarn2847notest.patch The check_user function in container-executor.c can cause a segmentation fault if banned.users is not provided but the user is detected as one of the default users. In that scenario it will call free_values on a NULL pointer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2428) LCE default banned user list should have yarn
[ https://issues.apache.org/jira/browse/YARN-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena updated YARN-2428: --- Attachment: YARN-2428.patch LCE default banned user list should have yarn - Key: YARN-2428 URL: https://issues.apache.org/jira/browse/YARN-2428 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Allen Wittenauer Assignee: Varun Saxena Priority: Trivial Labels: newbie Attachments: YARN-2428.patch When task-controller was retrofitted to YARN, the default banned user list didn't add yarn. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2428) LCE default banned user list should have yarn
[ https://issues.apache.org/jira/browse/YARN-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Saxena reassigned YARN-2428: -- Assignee: Varun Saxena LCE default banned user list should have yarn - Key: YARN-2428 URL: https://issues.apache.org/jira/browse/YARN-2428 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Reporter: Allen Wittenauer Assignee: Varun Saxena Priority: Trivial Labels: newbie Attachments: YARN-2428.patch When task-controller was retrofitted to YARN, the default banned user list didn't add yarn. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2301) Improve yarn container command
[ https://issues.apache.org/jira/browse/YARN-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233486#comment-14233486 ] Hadoop QA commented on YARN-2301: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12684920/YARN-2301.20141203-1.patch against trunk revision 03ab24a. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.client.TestResourceTrackerOnHA org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5987//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5987//console This message is automatically generated. Improve yarn container command -- Key: YARN-2301 URL: https://issues.apache.org/jira/browse/YARN-2301 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Jian He Assignee: Naganarasimha G R Labels: usability Attachments: YARN-2301.01.patch, YARN-2301.03.patch, YARN-2301.20141120-1.patch, YARN-2301.20141203-1.patch, YARN-2303.patch While running yarn container -list Application Attempt ID command, some observations: 1) the scheme (e.g. http/https ) before LOG-URL is missing 2) the start-time is printed as milli seconds (e.g. 1405540544844). Better to print as time format. 3) finish-time is 0 if container is not yet finished. May be N/A 4) May have an option to run as yarn container -list appId OR yarn application -list-containers appId also. As attempt Id is not shown on console, this is easier for user to just copy the appId and run it, may also be useful for container-preserving AM restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2919) Potential race between renew and cancel in DelegationTokenRenewer
Karthik Kambatla created YARN-2919: -- Summary: Potential race between renew and cancel in DelegationTokenRenewer Key: YARN-2919 URL: https://issues.apache.org/jira/browse/YARN-2919 Project: Hadoop YARN Issue Type: Bug Components: security Affects Versions: 2.6.0 Reporter: Karthik Kambatla Priority: Critical YARN-2874 fixes a deadlock in DelegationTokenRenewer, but there is still a race because of which a renewal in flight isn't interrupted by a cancel. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2637) maximum-am-resource-percent could be violated when resource of AM is minimumAllocation
[ https://issues.apache.org/jira/browse/YARN-2637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233530#comment-14233530 ] Craig Welch commented on YARN-2637: --- Hmmm, [~leftnoteasy], option 1 does have the possible issue you describe, as well as the issue I mentioned above of possibly starving all other queues if one queue has the AM percent set higher than the others. The approach of only enforcing the limit if at least one application is running was the approach I was thinking of if we went with 2 - the other being to not add the new app in when doing the check (so it's only retroactive to what has started), but I like the former better as it will reduce the overage as much as possible. Obviously, either approach has the potential to allow things to exceed the maxampercent if there are a large number of queues, but there are tradeoffs either way, and it's probably a smaller risk... I'll see about a patch for approach 2. maximum-am-resource-percent could be violated when resource of AM is minimumAllocation Key: YARN-2637 URL: https://issues.apache.org/jira/browse/YARN-2637 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0 Reporter: Wangda Tan Assignee: Craig Welch Priority: Critical Attachments: YARN-2637.0.patch, YARN-2637.1.patch, YARN-2637.12.patch, YARN-2637.13.patch, YARN-2637.2.patch, YARN-2637.6.patch, YARN-2637.7.patch, YARN-2637.9.patch Currently, the number of AMs in a leaf queue will be calculated in the following way:
{code}
max_am_resource = queue_max_capacity * maximum_am_resource_percent
#max_am_number = max_am_resource / minimum_allocation
#max_am_number_for_each_user = #max_am_number * userlimit * userlimit_factor
{code}
And when a new application is submitted to the RM, it will check whether the app can be activated in the following way:
{code}
for (Iterator<FiCaSchedulerApp> i = pendingApplications.iterator(); i.hasNext(); ) {
  FiCaSchedulerApp application = i.next();
  // Check queue limit
  if (getNumActiveApplications() >= getMaximumActiveApplications()) {
    break;
  }
  // Check user limit
  User user = getUser(application.getUser());
  if (user.getActiveApplications() < getMaximumActiveApplicationsPerUser()) {
    user.activateApplication();
    activeApplications.add(application);
    i.remove();
    LOG.info("Application " + application.getApplicationId()
        + " from user: " + application.getUser()
        + " activated in queue: " + getQueueName());
  }
}
{code}
An example: if a queue has capacity = 1G and max_am_resource_percent = 0.2, the maximum resource that AMs can use is 200M. Assuming minimum_allocation=1M, the #am that can be launched is 200, and if the user uses 5M for each AM (> minimum_allocation), all apps can still be activated, and they will occupy all the resource of the queue instead of only a max_am_resource_percent of it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
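A rough sketch of option #2 with the "always allow at least one AM" safeguard discussed above (field and method names are illustrative, not the patch's):
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

class AmLimitSketch {
  private Resource amLimit;        // queue_max_capacity * max_am_resource_percent
  private Resource usedAmResource; // sum of the running AMs' actual resource
  private int activeApplications;

  boolean canActivate(Resource amResourceRequest) {
    if (activeApplications == 0) {
      return true; // never starve the queue entirely, however small its limit
    }
    Resource wouldUse = Resources.add(usedAmResource, amResourceRequest);
    return Resources.fitsIn(wouldUse, amLimit);
  }
}
{code}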
[jira] [Commented] (YARN-2874) Dead lock in DelegationTokenRenewer which blocks RM to execute any further apps
[ https://issues.apache.org/jira/browse/YARN-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233543#comment-14233543 ] Tsuyoshi OZAWA commented on YARN-2874: -- [~Naganarasimha], never mind, your patch looks good to me. +1 Dead lock in DelegationTokenRenewer which blocks RM to execute any further apps - Key: YARN-2874 URL: https://issues.apache.org/jira/browse/YARN-2874 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0, 2.5.1 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Blocker Attachments: YARN-2874.20141118-1.patch, YARN-2874.20141118-2.patch When token renewal fails and the application finishes this dead lock can occur Jstack dump : {quote} Found one Java-level deadlock: = DelegationTokenRenewer #181865: waiting to lock monitor 0x00900918 (object 0xc18a9998, a java.util.Collections$SynchronizedSet), which is held by DelayedTokenCanceller DelayedTokenCanceller: waiting to lock monitor 0x04141718 (object 0xc7eae720, a org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask), which is held by Timer-4 Timer-4: waiting to lock monitor 0x00900918 (object 0xc18a9998, a java.util.Collections$SynchronizedSet), which is held by DelayedTokenCanceller Java stack information for the threads listed above: === DelegationTokenRenewer #181865: at java.util.Collections$SynchronizedCollection.add(Collections.java:1636) - waiting to lock 0xc18a9998 (a java.util.Collections$SynchronizedSet) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.addTokenToList(DelegationTokenRenewer.java:322) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:398) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$500(DelegationTokenRenewer.java:70) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:657) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:638) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) DelayedTokenCanceller: at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask.cancel(DelegationTokenRenewer.java:443) - waiting to lock 0xc7eae720 (a org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.removeApplicationFromRenewal(DelegationTokenRenewer.java:558) - locked 0xc18a9998 (a java.util.Collections$SynchronizedSet) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$300(DelegationTokenRenewer.java:70) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelayedTokenRemovalRunnable.run(DelegationTokenRenewer.java:599) at java.lang.Thread.run(Thread.java:745) Timer-4: at java.util.Collections$SynchronizedCollection.remove(Collections.java:1639) - waiting to lock 0xc18a9998 (a java.util.Collections$SynchronizedSet) at 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.removeFailedDelegationToken(DelegationTokenRenewer.java:503) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$100(DelegationTokenRenewer.java:70) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask.run(DelegationTokenRenewer.java:437) - locked 0xc7eae720 (a org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask) at java.util.TimerThread.mainLoop(Timer.java:555) at java.util.TimerThread.run(Timer.java:505) Found 1 deadlock. {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
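The cycle in the jstack above reduces to two monitors taken in opposite orders; a minimal stand-alone reproduction (illustrative, not YARN code):
{code}
public class RenewerDeadlockSketch {
  static final Object TASK_MONITOR = new Object(); // RenewalTimerTask
  static final Object SET_MONITOR = new Object();  // synchronized token set

  public static void main(String[] args) {
    new Thread(() -> {                  // models Timer-4
      synchronized (TASK_MONITOR) {
        pause();
        synchronized (SET_MONITOR) { }  // removeFailedDelegationToken
      }
    }).start();
    new Thread(() -> {                  // models DelayedTokenCanceller
      synchronized (SET_MONITOR) {
        pause();
        synchronized (TASK_MONITOR) { } // RenewalTimerTask.cancel
      }
    }).start();
  }

  static void pause() {
    try { Thread.sleep(100); } catch (InterruptedException ignored) { }
  }
}
{code}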
[jira] [Commented] (YARN-2874) Dead lock in DelegationTokenRenewer which blocks RM to execute any further apps
[ https://issues.apache.org/jira/browse/YARN-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233558#comment-14233558 ] Karthik Kambatla commented on YARN-2874: Checking this in. Dead lock in DelegationTokenRenewer which blocks RM to execute any further apps - Key: YARN-2874 URL: https://issues.apache.org/jira/browse/YARN-2874 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0, 2.5.1 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Blocker Attachments: YARN-2874.20141118-1.patch, YARN-2874.20141118-2.patch When token renewal fails and the application finishes this dead lock can occur Jstack dump : {quote} Found one Java-level deadlock: = DelegationTokenRenewer #181865: waiting to lock monitor 0x00900918 (object 0xc18a9998, a java.util.Collections$SynchronizedSet), which is held by DelayedTokenCanceller DelayedTokenCanceller: waiting to lock monitor 0x04141718 (object 0xc7eae720, a org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask), which is held by Timer-4 Timer-4: waiting to lock monitor 0x00900918 (object 0xc18a9998, a java.util.Collections$SynchronizedSet), which is held by DelayedTokenCanceller Java stack information for the threads listed above: === DelegationTokenRenewer #181865: at java.util.Collections$SynchronizedCollection.add(Collections.java:1636) - waiting to lock 0xc18a9998 (a java.util.Collections$SynchronizedSet) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.addTokenToList(DelegationTokenRenewer.java:322) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:398) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$500(DelegationTokenRenewer.java:70) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:657) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:638) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) DelayedTokenCanceller: at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask.cancel(DelegationTokenRenewer.java:443) - waiting to lock 0xc7eae720 (a org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.removeApplicationFromRenewal(DelegationTokenRenewer.java:558) - locked 0xc18a9998 (a java.util.Collections$SynchronizedSet) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$300(DelegationTokenRenewer.java:70) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelayedTokenRemovalRunnable.run(DelegationTokenRenewer.java:599) at java.lang.Thread.run(Thread.java:745) Timer-4: at java.util.Collections$SynchronizedCollection.remove(Collections.java:1639) - waiting to lock 0xc18a9998 (a java.util.Collections$SynchronizedSet) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.removeFailedDelegationToken(DelegationTokenRenewer.java:503) at 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$100(DelegationTokenRenewer.java:70) at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask.run(DelegationTokenRenewer.java:437) - locked 0xc7eae720 (a org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask) at java.util.TimerThread.mainLoop(Timer.java:555) at java.util.TimerThread.run(Timer.java:505) Found 1 deadlock. {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2891) Failed Container Executor does not provide a clear error message
[ https://issues.apache.org/jira/browse/YARN-2891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233577#comment-14233577 ] Hudson commented on YARN-2891: -- FAILURE: Integrated in Hadoop-trunk-Commit #6645 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6645/]) YARN-2891. Failed Container Executor does not provide a clear error message. Contributed by Dustin Cote. (harsh) (harsh: rev a31e0164912236630c485e5aeb908b43e3a67c61) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c Failed Container Executor does not provide a clear error message Key: YARN-2891 URL: https://issues.apache.org/jira/browse/YARN-2891 Project: Hadoop YARN Issue Type: Improvement Components: nodemanager Affects Versions: 2.5.1 Environment: any Reporter: Dustin Cote Assignee: Dustin Cote Priority: Minor Fix For: 2.7.0 Attachments: YARN-2891-1.patch When checking access to directories, the container executor does not provide clear information on which directory actually could not be accessed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2874) Dead lock in DelegationTokenRenewer which blocks RM to execute any further apps
[ https://issues.apache.org/jira/browse/YARN-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233575#comment-14233575 ] Hudson commented on YARN-2874: -- FAILURE: Integrated in Hadoop-trunk-Commit #6645 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6645/]) YARN-2874. Dead lock in DelegationTokenRenewer which blocks RM to execute any further apps. (Naganarasimha G R via kasha) (kasha: rev 799353e2c7db5af6e40e3521439b5c8a3c5c6a51) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/DelegationTokenRenewer.java Dead lock in DelegationTokenRenewer which blocks RM to execute any further apps - Key: YARN-2874 URL: https://issues.apache.org/jira/browse/YARN-2874 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.6.0, 2.5.1 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Priority: Blocker Fix For: 2.7.0 Attachments: YARN-2874.20141118-1.patch, YARN-2874.20141118-2.patch When token renewal fails and the application finishes, this deadlock can occur. Jstack dump:
{quote}
Found one Java-level deadlock:
=============================
"DelegationTokenRenewer #181865":
  waiting to lock monitor 0x00900918 (object 0xc18a9998, a java.util.Collections$SynchronizedSet),
  which is held by "DelayedTokenCanceller"
"DelayedTokenCanceller":
  waiting to lock monitor 0x04141718 (object 0xc7eae720, a org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask),
  which is held by "Timer-4"
"Timer-4":
  waiting to lock monitor 0x00900918 (object 0xc18a9998, a java.util.Collections$SynchronizedSet),
  which is held by "DelayedTokenCanceller"

Java stack information for the threads listed above:
===================================================
"DelegationTokenRenewer #181865":
  at java.util.Collections$SynchronizedCollection.add(Collections.java:1636)
  - waiting to lock <0xc18a9998> (a java.util.Collections$SynchronizedSet)
  at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.addTokenToList(DelegationTokenRenewer.java:322)
  at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:398)
  at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$500(DelegationTokenRenewer.java:70)
  at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:657)
  at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:638)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
"DelayedTokenCanceller":
  at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask.cancel(DelegationTokenRenewer.java:443)
  - waiting to lock <0xc7eae720> (a org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask)
  at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.removeApplicationFromRenewal(DelegationTokenRenewer.java:558)
  - locked <0xc18a9998> (a java.util.Collections$SynchronizedSet)
  at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$300(DelegationTokenRenewer.java:70)
  at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelayedTokenRemovalRunnable.run(DelegationTokenRenewer.java:599)
  at java.lang.Thread.run(Thread.java:745)
"Timer-4":
  at java.util.Collections$SynchronizedCollection.remove(Collections.java:1639)
  - waiting to lock <0xc18a9998> (a java.util.Collections$SynchronizedSet)
  at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.removeFailedDelegationToken(DelegationTokenRenewer.java:503)
  at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$100(DelegationTokenRenewer.java:70)
  at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask.run(DelegationTokenRenewer.java:437)
  - locked <0xc7eae720> (a org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$RenewalTimerTask)
  at java.util.TimerThread.mainLoop(Timer.java:555)
  at java.util.TimerThread.run(Timer.java:505)

Found 1 deadlock.
{quote}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-1156) Enhance NodeManager AllocatedGB and AvailableGB metrics for aggregation of decimal values
[ https://issues.apache.org/jira/browse/YARN-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233586#comment-14233586 ] Tsuyoshi OZAWA commented on YARN-1156: -- Thanks for committing and reviewing, Junping! Enhance NodeManager AllocatedGB and AvailableGB metrics for aggregation of decimal values - Key: YARN-1156 URL: https://issues.apache.org/jira/browse/YARN-1156 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.1.0-beta Reporter: Akira AJISAKA Assignee: Tsuyoshi OZAWA Labels: metrics Fix For: 2.7.0 Attachments: YARN-1156.1.patch, YARN-1156.2.patch, YARN-1156.3.patch, YARN-1156.4.patch, YARN-1156.5.patch The AllocatedGB and AvailableGB metrics are currently integer-typed. If a container is allocated 500 MB four times, AllocatedGB is incremented four times by {{(int)500/1024}}, which is 0. That is, 2000 MB of memory is actually allocated, but the metric shows 0 GB. Let's use float type for these metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
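A minimal sketch of the truncation the description is pointing at (plain Java arithmetic; the variable names are illustrative, not the actual NodeManagerMetrics fields):
{code}
public class AllocatedGBTruncation {
  public static void main(String[] args) {
    int allocatedGBInt = 0;      // current behavior: integer gauge
    float allocatedGBFloat = 0f; // proposed behavior: float gauge

    // Four allocations of 500 MB each (2000 MB total).
    for (int i = 0; i < 4; i++) {
      allocatedGBInt += (int) (500 / 1024); // 500/1024 == 0 in integer math
      allocatedGBFloat += 500 / 1024f;      // ~0.488 per allocation
    }
    System.out.println(allocatedGBInt);   // 0 -- metric reports nothing allocated
    System.out.println(allocatedGBFloat); // ~1.95 GB, the real picture
  }
}
{code}
With a float gauge the four allocations report roughly 1.95 GB instead of 0.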
[jira] [Commented] (YARN-2139) [Umbrella] Support for Disk as a Resource in YARN
[ https://issues.apache.org/jira/browse/YARN-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233605#comment-14233605 ] Karthik Kambatla commented on YARN-2139: bq. currently vdisks is counting the number of physical drives present on the box. We see vdisks as a multiple of the number of physical disks on the box. Again, it is just one of the ways, and we can add more ways to share disk resources in the future. bq. Should we consider evaluating a change in this policy that gives a container 1 local dir to a container with 1 vdisk. This way for a machine with 6 disks (and 6 vdisks) would have 6 tasks running, each with their own dedicated disk. Good point. We were thinking of giving the AM the option to choose the amount of disk IO parallelism at the time of launching the container, as part of the spindle locality work. I see AMs wanting to either (1) pick a single local directory for guaranteed performance or (2) stripe accesses across multiple disks for potentially higher throughput based on other work on the node. Initially, we could provide a global config for all containers - vdisks to span fewest or most disks. [Umbrella] Support for Disk as a Resource in YARN -- Key: YARN-2139 URL: https://issues.apache.org/jira/browse/YARN-2139 Project: Hadoop YARN Issue Type: New Feature Reporter: Wei Yan Attachments: Disk_IO_Isolation_Scheduling_3.pdf, Disk_IO_Scheduling_Design_1.pdf, Disk_IO_Scheduling_Design_2.pdf, YARN-2139-prototype-2.patch, YARN-2139-prototype.patch YARN should consider disk as another resource for (1) scheduling tasks on nodes, (2) isolation at runtime, (3) spindle locality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2847) Linux native container executor segfaults if default banned user detected
[ https://issues.apache.org/jira/browse/YARN-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233615#comment-14233615 ] Wei Yan commented on YARN-2847: --- Thanks for the fix, [~lichangleo]. There are some unnecessary changes in the latest patch (the blank lines). And do we really need a test case for this fix, given that it requires the mapred user? Linux native container executor segfaults if default banned user detected - Key: YARN-2847 URL: https://issues.apache.org/jira/browse/YARN-2847 Project: Hadoop YARN Issue Type: Bug Components: nodemanager Affects Versions: 2.5.0 Reporter: Jason Lowe Assignee: chang li Attachments: yarn2847.patch, yarn2847.patch, yarn2847notest.patch The check_user function in container-executor.c can cause a segmentation fault if banned.users is not provided but the user is detected as one of the default users. In that scenario it will call free_values on a NULL pointer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2914) Potential race condition in ClientSCMMetrics#getInstance()
[ https://issues.apache.org/jira/browse/YARN-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233622#comment-14233622 ] Tsuyoshi OZAWA commented on YARN-2914: -- [~varun_saxena], Thanks for your contribution. I think we need to init the singleton object with configuration. On the other hand, getInstance() doesn't take configuration as an argument. This semantic gap prevents us from calling initSingleton inside getInstance. Comments: * We should also take a lock of Singleton.INSTANCE in the method initSingleton. Potential race condition in ClientSCMMetrics#getInstance() -- Key: YARN-2914 URL: https://issues.apache.org/jira/browse/YARN-2914 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Ted Yu Assignee: Varun Saxena Priority: Minor Fix For: 2.7.0 Attachments: YARN-2914.patch {code} public static ClientSCMMetrics getInstance() { ClientSCMMetrics topMetrics = Singleton.INSTANCE.impl; if (topMetrics == null) { throw new IllegalStateException( {code} getInstance() doesn't hold lock on Singleton.this This may result in IllegalStateException being thrown prematurely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2664) Improve RM webapp to expose info about reservations.
[ https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233630#comment-14233630 ] Carlo Curino commented on YARN-2664: [~mazzu] thanks for the update. Regarding the release audit we should: # Make sure never to add the Apache license to any file you did not write yourself (please confirm) # Add entries in the LICENSE.txt and NOTICE.txt files to declare that we are using (d3, nvd3, underscore). Do you need all three? I am uploading what I think are the needed bits ([~jghoman], can you double check this? I followed your advice, but I'd like a double check) I will be looking at the code more closely next. Improve RM webapp to expose info about reservations. Key: YARN-2664 URL: https://issues.apache.org/jira/browse/YARN-2664 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Matteo Mazzucchelli Attachments: PlannerPage_screenshot.pdf, YARN-2664.1.patch, YARN-2664.2.patch, YARN-2664.3.patch, YARN-2664.4.patch, YARN-2664.5.patch, YARN-2664.6.patch, YARN-2664.patch YARN-1051 provides a new functionality in the RM to ask for reservation on resources. Exposing this through the webapp GUI is important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2914) Potential race condition in ClientSCMMetrics#getInstance()
[ https://issues.apache.org/jira/browse/YARN-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233632#comment-14233632 ] Tsuyoshi OZAWA commented on YARN-2914: -- I found that the configuration which Singleton#init receives is never used. We can call init inside getInstance by passing null to initSingleton or changing the signature of initSingleton not to receive an object of configuration. Do you mind updating? Potential race condition in ClientSCMMetrics#getInstance() -- Key: YARN-2914 URL: https://issues.apache.org/jira/browse/YARN-2914 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Ted Yu Assignee: Varun Saxena Priority: Minor Fix For: 2.7.0 Attachments: YARN-2914.patch {code} public static ClientSCMMetrics getInstance() { ClientSCMMetrics topMetrics = Singleton.INSTANCE.impl; if (topMetrics == null) { throw new IllegalStateException( {code} getInstance() doesn't hold lock on Singleton.this This may result in IllegalStateException being thrown prematurely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2664) Improve RM webapp to expose info about reservations.
[ https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-2664: --- Attachment: legal.patch Improve RM webapp to expose info about reservations. Key: YARN-2664 URL: https://issues.apache.org/jira/browse/YARN-2664 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Matteo Mazzucchelli Attachments: PlannerPage_screenshot.pdf, YARN-2664.1.patch, YARN-2664.2.patch, YARN-2664.3.patch, YARN-2664.4.patch, YARN-2664.5.patch, YARN-2664.6.patch, YARN-2664.patch, legal.patch YARN-1051 provides a new functionality in the RM to ask for reservation on resources. Exposing this through the webapp GUI is important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2495) Allow admin specify labels from each NM (Distributed configuration)
[ https://issues.apache.org/jira/browse/YARN-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233644#comment-14233644 ] Wangda Tan commented on YARN-2495: -- [~Naganarasimha], Since the size of the patch has grown, it will be hard for new people to review. I suggest moving the conf-based node label provider implementation to a separate ticket under YARN-2492, and updating the title of this ticket accordingly. Thanks, Wangda Allow admin specify labels from each NM (Distributed configuration) --- Key: YARN-2495 URL: https://issues.apache.org/jira/browse/YARN-2495 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Naganarasimha G R Attachments: YARN-2495.20141023-1.patch, YARN-2495.20141024-1.patch, YARN-2495.20141030-1.patch, YARN-2495.20141031-1.patch, YARN-2495.20141119-1.patch, YARN-2495.20141126-1.patch, YARN-2495_20141022.1.patch The target of this JIRA is to allow admins to specify labels on each NM. This covers: - Users can set labels on each NM (by setting yarn-site.xml or using a script, as suggested by [~aw]) - The NM will send labels to the RM via the ResourceTracker API - The RM will set labels in the NodeLabelManager when the NM registers/updates labels -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2664) Improve RM webapp to expose info about reservations.
[ https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233658#comment-14233658 ] Carlo Curino commented on YARN-2664: [~mazzu], the patch looks good. I will give you a bunch of code-level comments to polish it a little further: # in YarnWebParams.java: can we name those parameters (JSON_USER, JSON_RES_NAME, JSON_FROM, JSON_TO) something more descriptive, like PLAN_*? # is graph.js your code? If so, format it a little more if you can (there are some very long lines). Also, there is no need to declare that it is related to YARN-2664 in the header. # in DataPage.createJSON: shall we also add a null check for getAllReservations()? Or are we sure it is never null? # in NavBlock it would be good to add the Planner link only if reservations are enabled (look in YarnConfiguration for the switch). I am now trying to set this up on a small cluster to see how it looks/functions. Improve RM webapp to expose info about reservations. Key: YARN-2664 URL: https://issues.apache.org/jira/browse/YARN-2664 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Matteo Mazzucchelli Attachments: PlannerPage_screenshot.pdf, YARN-2664.1.patch, YARN-2664.2.patch, YARN-2664.3.patch, YARN-2664.4.patch, YARN-2664.5.patch, YARN-2664.6.patch, YARN-2664.patch, legal.patch YARN-1051 provides a new functionality in the RM to ask for reservation on resources. Exposing this through the webapp GUI is important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2664) Improve RM webapp to expose info about reservations.
[ https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233657#comment-14233657 ] Matteo Mazzucchelli commented on YARN-2664: --- bq. Make sure never to add the Apache license to any file you did not write yourself (please confirm) I added the Apache license only to graph.js, a file that I wrote. \\ bq. Add entries in the LICENSE.txt and NOTICE.txt files to declare that we are using (d3, nvd3, underscore). Do you need all three? Yes. d3 is the basic library, nvd3 is an extension with some improvements, and underscore provides useful functions. Improve RM webapp to expose info about reservations. Key: YARN-2664 URL: https://issues.apache.org/jira/browse/YARN-2664 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Matteo Mazzucchelli Attachments: PlannerPage_screenshot.pdf, YARN-2664.1.patch, YARN-2664.2.patch, YARN-2664.3.patch, YARN-2664.4.patch, YARN-2664.5.patch, YARN-2664.6.patch, YARN-2664.patch, legal.patch YARN-1051 provides a new functionality in the RM to ask for reservation on resources. Exposing this through the webapp GUI is important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2664) Improve RM webapp to expose info about reservations.
[ https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233660#comment-14233660 ] Carlo Curino commented on YARN-2664: Cool. Please include the changes I have in the legal.patch into your next patch, and address my other comments if you can. Improve RM webapp to expose info about reservations. Key: YARN-2664 URL: https://issues.apache.org/jira/browse/YARN-2664 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Matteo Mazzucchelli Attachments: PlannerPage_screenshot.pdf, YARN-2664.1.patch, YARN-2664.2.patch, YARN-2664.3.patch, YARN-2664.4.patch, YARN-2664.5.patch, YARN-2664.6.patch, YARN-2664.patch, legal.patch YARN-1051 provides a new functionality in the RM to ask for reservation on resources. Exposing this through the webapp GUI is important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2920) CapacityScheduler should be notified when labels on nodes changed
Wangda Tan created YARN-2920: Summary: CapacityScheduler should be notified when labels on nodes changed Key: YARN-2920 URL: https://issues.apache.org/jira/browse/YARN-2920 Project: Hadoop YARN Issue Type: Bug Reporter: Wangda Tan Assignee: Wangda Tan Currently, label changes on nodes are only handled by RMNodeLabelsManager, but that is not enough: - The scheduler should be able to take actions on running containers (like kill/preempt/do-nothing). - Used/available capacity in the scheduler should be updated for future planning. We need to add a new event to pass such updates to the scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2920) CapacityScheduler should be notified when labels on nodes changed
[ https://issues.apache.org/jira/browse/YARN-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2920: - Issue Type: Sub-task (was: Bug) Parent: YARN-2492 CapacityScheduler should be notified when labels on nodes changed - Key: YARN-2920 URL: https://issues.apache.org/jira/browse/YARN-2920 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan Assignee: Wangda Tan Currently, label changes on nodes are only handled by RMNodeLabelsManager, but that is not enough: - The scheduler should be able to take actions on running containers (like kill/preempt/do-nothing). - Used/available capacity in the scheduler should be updated for future planning. We need to add a new event to pass such updates to the scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233672#comment-14233672 ] Tsuyoshi OZAWA commented on YARN-2800: -- [~wangda] thanks for your update! Minor nits:
{code}
+  public static final String NODE_LABELS_NOT_ENABLED_ERR = "Node labels not"
+      + " enabled, you cannot make any changes on node labels, you can set "
+      + YarnConfiguration.NODE_LABELS_ENABLED
+      + " to true to enable this feature, please reference to user guide.";
{code}
I think we should simplify the error message. How about fixing it like this?
{code}
"Label-based scheduling is disabled. Please check "
    + YarnConfiguration.NODE_LABELS_ENABLED;
{code}
Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature Key: YARN-2800 URL: https://issues.apache.org/jira/browse/YARN-2800 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch, YARN-2800-20141118-1.patch, YARN-2800-20141118-2.patch, YARN-2800-20141119-1.patch In the past, we had a MemoryNodeLabelsStore, mostly for users to try this feature without configuring where to store node labels on the file system. It seems convenient for users to try this, but it actually causes a bad user experience: users may add/remove labels and edit capacity-scheduler.xml, and after an RM restart the labels will be gone (we store them in memory). The RM also cannot start if some queue uses labels that don't exist in the cluster. As discussed, we should have an explicit way to let the user specify whether he/she wants this feature or not. If node labels are disabled, any operation trying to modify/use node labels will throw an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2914) Potential race condition in ClientSCMMetrics#getInstance()
[ https://issues.apache.org/jira/browse/YARN-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233677#comment-14233677 ] Chris Trezzo commented on YARN-2914: Thanks [~ted_yu] and [~varun_saxena] for the find and patch! Talking with [~sjlee0], we were thinking that it might be simplest to just get rid of the init method and the enum altogether. We can make it a more straightforward singleton pattern with a line like the following: {noformat} private static final CSM = create(); {noformat} The getInstance() method would then just return CSM. It will also be necessary to make the ClientSCMMetrics constructor private. What do you guys think? As another note, SharedCacheUploaderMetrics also has this bug, so we can either change that class as part of this patch or file a separate JIRA. Potential race condition in ClientSCMMetrics#getInstance() -- Key: YARN-2914 URL: https://issues.apache.org/jira/browse/YARN-2914 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Ted Yu Assignee: Varun Saxena Priority: Minor Fix For: 2.7.0 Attachments: YARN-2914.patch {code} public static ClientSCMMetrics getInstance() { ClientSCMMetrics topMetrics = Singleton.INSTANCE.impl; if (topMetrics == null) { throw new IllegalStateException( {code} getInstance() doesn't hold lock on Singleton.this This may result in IllegalStateException being thrown prematurely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2914) Potential race condition in ClientSCMMetrics#getInstance()
[ https://issues.apache.org/jira/browse/YARN-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233679#comment-14233679 ] Sangjin Lee commented on YARN-2914: --- Just to clarify, {code} private static final ClientSCMMetrics instance = create(); {code} Potential race condition in ClientSCMMetrics#getInstance() -- Key: YARN-2914 URL: https://issues.apache.org/jira/browse/YARN-2914 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Ted Yu Assignee: Varun Saxena Priority: Minor Fix For: 2.7.0 Attachments: YARN-2914.patch {code} public static ClientSCMMetrics getInstance() { ClientSCMMetrics topMetrics = Singleton.INSTANCE.impl; if (topMetrics == null) { throw new IllegalStateException( {code} getInstance() doesn't hold lock on Singleton.this This may result in IllegalStateException being thrown prematurely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
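Spelled out, the eager-initialization pattern [~ctrezzo] and [~sjlee0] are describing would look roughly like this (a sketch under those suggestions, not the attached patch; the class name is illustrative):
{code}
// Eager initialization: the JVM runs the static initializer exactly once,
// before any thread can call getInstance(), so the null check, the lock,
// and the IllegalStateException all become unnecessary.
public class ClientSCMMetricsSketch {
  private static final ClientSCMMetricsSketch INSTANCE = create();

  // Private constructor prevents callers from creating extra instances.
  private ClientSCMMetricsSketch() { }

  private static ClientSCMMetricsSketch create() {
    ClientSCMMetricsSketch metrics = new ClientSCMMetricsSketch();
    // ... register the metrics source here, as the existing create() does ...
    return metrics;
  }

  public static ClientSCMMetricsSketch getInstance() {
    return INSTANCE; // never null, never racy
  }
}
{code}
Class loading guarantees safe publication of INSTANCE, which is why no explicit synchronization is needed.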
[jira] [Created] (YARN-2921) MockRM#waitForState methods can be too slow and flaky
Karthik Kambatla created YARN-2921: -- Summary: MockRM#waitForState methods can be too slow and flaky Key: YARN-2921 URL: https://issues.apache.org/jira/browse/YARN-2921 Project: Hadoop YARN Issue Type: Improvement Components: test Affects Versions: 2.6.0 Reporter: Karthik Kambatla MockRM#waitForState methods currently sleep for too long (2 seconds and 1 second). This leads to slow tests and sometimes failures if the App/AppAttempt moves to another state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2914) Potential race condition in ClientSCMMetrics#getInstance()
[ https://issues.apache.org/jira/browse/YARN-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233685#comment-14233685 ] Tsuyoshi OZAWA commented on YARN-2914: -- [~ctrezzo], Thanks for your suggestion! Your idea makes sense to me. I prototyped it and it seems to work well. +1 for the design. [~varun_saxena], could you update the patch based on Chris's idea? Potential race condition in ClientSCMMetrics#getInstance() -- Key: YARN-2914 URL: https://issues.apache.org/jira/browse/YARN-2914 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Ted Yu Assignee: Varun Saxena Priority: Minor Fix For: 2.7.0 Attachments: YARN-2914.patch {code} public static ClientSCMMetrics getInstance() { ClientSCMMetrics topMetrics = Singleton.INSTANCE.impl; if (topMetrics == null) { throw new IllegalStateException( {code} getInstance() doesn't hold lock on Singleton.this This may result in IllegalStateException being thrown prematurely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2900) Application (Attempt and Container) Not Found in AHS results in Internal Server Error (500)
[ https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-2900: Attachment: YARN-2900.patch [~zjshen], [~jeagles]: Attaching final patch with the fix and unit tests to verify it. Can you review? Application (Attempt and Container) Not Found in AHS results in Internal Server Error (500) --- Key: YARN-2900 URL: https://issues.apache.org/jira/browse/YARN-2900 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, YARN-2900.patch
Caused by: java.lang.NullPointerException
  at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
  at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
  at org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
  at org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
  at org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
  ... 59 more
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
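For context on the fix direction the title implies: when the history store has no record for the requested id, the NPE above surfaces as a 500, and mapping the missing entity to a 404 is the usual shape. A hedged sketch, not the attached patch; HistoryLookup is a hypothetical stand-in for the ApplicationHistoryManager call in the trace:
{code}
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.webapp.NotFoundException; // YARN maps this to HTTP 404

// Sketch: turn a missing application into a 404 instead of an NPE-driven 500.
class AppLookupSketch {
  interface HistoryLookup {
    ApplicationReport getApplication(ApplicationId appId);
  }

  private final HistoryLookup history;

  AppLookupSketch(HistoryLookup history) {
    this.history = history;
  }

  ApplicationReport getApp(ApplicationId appId) {
    ApplicationReport report = history.getApplication(appId); // may be null today
    if (report == null) {
      throw new NotFoundException("app with id: " + appId + " not found");
    }
    return report;
  }
}
{code}
The same guard applies to the attempt and container lookups named in the title.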
[jira] [Updated] (YARN-2914) Potential race condition in SharedCacheUploaderMetrics/ClientSCMMetrics#getInstance()
[ https://issues.apache.org/jira/browse/YARN-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2914: - Summary: Potential race condition in SharedCacheUploaderMetrics/ClientSCMMetrics#getInstance() (was: Potential race condition in ClientSCMMetrics#getInstance()) Potential race condition in SharedCacheUploaderMetrics/ClientSCMMetrics#getInstance() - Key: YARN-2914 URL: https://issues.apache.org/jira/browse/YARN-2914 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Ted Yu Assignee: Varun Saxena Priority: Minor Fix For: 2.7.0 Attachments: YARN-2914.patch {code} public static ClientSCMMetrics getInstance() { ClientSCMMetrics topMetrics = Singleton.INSTANCE.impl; if (topMetrics == null) { throw new IllegalStateException( {code} getInstance() doesn't hold lock on Singleton.this This may result in IllegalStateException being thrown prematurely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2914) Potential race condition in SharedCacheUploaderMetrics/ClientSCMMetrics#getInstance()
[ https://issues.apache.org/jira/browse/YARN-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2914: - Description: {code} public static ClientSCMMetrics getInstance() { ClientSCMMetrics topMetrics = Singleton.INSTANCE.impl; if (topMetrics == null) { throw new IllegalStateException( {code} getInstance() doesn't hold lock on Singleton.this This may result in IllegalStateException being thrown prematurely. [~ctrezzo] reported that SharedCacheUploaderMetrics has also same kind of race condition. was: {code} public static ClientSCMMetrics getInstance() { ClientSCMMetrics topMetrics = Singleton.INSTANCE.impl; if (topMetrics == null) { throw new IllegalStateException( {code} getInstance() doesn't hold lock on Singleton.this This may result in IllegalStateException being thrown prematurely. Potential race condition in SharedCacheUploaderMetrics/ClientSCMMetrics#getInstance() - Key: YARN-2914 URL: https://issues.apache.org/jira/browse/YARN-2914 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Ted Yu Assignee: Varun Saxena Priority: Minor Fix For: 2.7.0 Attachments: YARN-2914.patch {code} public static ClientSCMMetrics getInstance() { ClientSCMMetrics topMetrics = Singleton.INSTANCE.impl; if (topMetrics == null) { throw new IllegalStateException( {code} getInstance() doesn't hold lock on Singleton.this This may result in IllegalStateException being thrown prematurely. [~ctrezzo] reported that SharedCacheUploaderMetrics has also same kind of race condition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2914) Potential race condition in SharedCacheUploaderMetrics/ClientSCMMetrics#getInstance()
[ https://issues.apache.org/jira/browse/YARN-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233689#comment-14233689 ] Tsuyoshi OZAWA commented on YARN-2914: -- Updated description. Potential race condition in SharedCacheUploaderMetrics/ClientSCMMetrics#getInstance() - Key: YARN-2914 URL: https://issues.apache.org/jira/browse/YARN-2914 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Ted Yu Assignee: Varun Saxena Priority: Minor Fix For: 2.7.0 Attachments: YARN-2914.patch {code} public static ClientSCMMetrics getInstance() { ClientSCMMetrics topMetrics = Singleton.INSTANCE.impl; if (topMetrics == null) { throw new IllegalStateException( {code} getInstance() doesn't hold lock on Singleton.this This may result in IllegalStateException being thrown prematurely. [~ctrezzo] reported that SharedCacheUploaderMetrics has also same kind of race condition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2664) Improve RM webapp to expose info about reservations.
[ https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carlo Curino updated YARN-2664: --- Attachment: screenshot_reservation_UI.pdf Improve RM webapp to expose info about reservations. Key: YARN-2664 URL: https://issues.apache.org/jira/browse/YARN-2664 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Matteo Mazzucchelli Attachments: PlannerPage_screenshot.pdf, YARN-2664.1.patch, YARN-2664.2.patch, YARN-2664.3.patch, YARN-2664.4.patch, YARN-2664.5.patch, YARN-2664.6.patch, YARN-2664.patch, legal.patch, screenshot_reservation_UI.pdf YARN-1051 provides a new functionality in the RM to ask for reservation on resources. Exposing this through the webapp GUI is important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2664) Improve RM webapp to expose info about reservations.
[ https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233690#comment-14233690 ] Carlo Curino commented on YARN-2664: I am running this on a 35 machine cluster, with a (modified) gridmix generating reservations and submitting jobs. It looks really nice (I am attaching a screenshot), well done [~mazzu]. One simple nice addition would be to show, beside the absolute memory assigned to the reservations, something that gives an idea of the overall plan utilization. For example, you can add next to the absolute values on the Y axis, some reference to the plan percentage (even just for the highest value). Improve RM webapp to expose info about reservations. Key: YARN-2664 URL: https://issues.apache.org/jira/browse/YARN-2664 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Matteo Mazzucchelli Attachments: PlannerPage_screenshot.pdf, YARN-2664.1.patch, YARN-2664.2.patch, YARN-2664.3.patch, YARN-2664.4.patch, YARN-2664.5.patch, YARN-2664.6.patch, YARN-2664.patch, legal.patch, screenshot_reservation_UI.pdf YARN-1051 provides a new functionality in the RM to ask for reservation on resources. Exposing this through the webapp GUI is important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2301) Improve yarn container command
[ https://issues.apache.org/jira/browse/YARN-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233692#comment-14233692 ] Hadoop QA commented on YARN-2301: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12684920/YARN-2301.20141203-1.patch against trunk revision a31e016. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5989//console This message is automatically generated. Improve yarn container command -- Key: YARN-2301 URL: https://issues.apache.org/jira/browse/YARN-2301 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Jian He Assignee: Naganarasimha G R Labels: usability Attachments: YARN-2301.01.patch, YARN-2301.03.patch, YARN-2301.20141120-1.patch, YARN-2301.20141203-1.patch, YARN-2303.patch While running yarn container -list Application Attempt ID command, some observations: 1) the scheme (e.g. http/https ) before LOG-URL is missing 2) the start-time is printed as milli seconds (e.g. 1405540544844). Better to print as time format. 3) finish-time is 0 if container is not yet finished. May be N/A 4) May have an option to run as yarn container -list appId OR yarn application -list-containers appId also. As attempt Id is not shown on console, this is easier for user to just copy the appId and run it, may also be useful for container-preserving AM restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (YARN-2921) MockRM#waitForState methods can be too slow and flaky
[ https://issues.apache.org/jira/browse/YARN-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA reassigned YARN-2921: Assignee: Tsuyoshi OZAWA MockRM#waitForState methods can be too slow and flaky - Key: YARN-2921 URL: https://issues.apache.org/jira/browse/YARN-2921 Project: Hadoop YARN Issue Type: Improvement Components: test Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Tsuyoshi OZAWA MockRM#waitForState methods currently sleep for too long (2 seconds and 1 second). This leads to slow tests and sometimes failures if the App/AppAttempt moves to another state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2301) Improve yarn container command
[ https://issues.apache.org/jira/browse/YARN-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233694#comment-14233694 ] Jian He commented on YARN-2301: --- looks good overall, - we do not need to expose the setter in the RMContext interface {{public void setYarnConfiguration(Configuration yarnConfiguration);}} - the changes in TestApplicationClientProtocolOnHA may not be needed. Improve yarn container command -- Key: YARN-2301 URL: https://issues.apache.org/jira/browse/YARN-2301 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Jian He Assignee: Naganarasimha G R Labels: usability Attachments: YARN-2301.01.patch, YARN-2301.03.patch, YARN-2301.20141120-1.patch, YARN-2301.20141203-1.patch, YARN-2303.patch While running the yarn container -list Application Attempt ID command, some observations: 1) the scheme (e.g. http/https ) before LOG-URL is missing 2) the start-time is printed in milliseconds (e.g. 1405540544844). Better to print it in a time format. 3) finish-time is 0 if the container is not yet finished. May be N/A 4) May have an option to run as yarn container -list appId OR yarn application -list-containers appId also. As the attempt Id is not shown on the console, this makes it easier for the user to just copy the appId and run it; it may also be useful for container-preserving AM restart. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2664) Improve RM webapp to expose info about reservations.
[ https://issues.apache.org/jira/browse/YARN-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233698#comment-14233698 ] Jakob Homan commented on YARN-2664: --- There's a few extra lines of whitespace, but the actual content looks good to me, per what I understand the current requirements to be. Improve RM webapp to expose info about reservations. Key: YARN-2664 URL: https://issues.apache.org/jira/browse/YARN-2664 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Carlo Curino Assignee: Matteo Mazzucchelli Attachments: PlannerPage_screenshot.pdf, YARN-2664.1.patch, YARN-2664.2.patch, YARN-2664.3.patch, YARN-2664.4.patch, YARN-2664.5.patch, YARN-2664.6.patch, YARN-2664.patch, legal.patch, screenshot_reservation_UI.pdf YARN-1051 provides a new functionality in the RM to ask for reservation on resources. Exposing this through the webapp GUI is important. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2900) Application (Attempt and Container) Not Found in AHS results in Internal Server Error (500)
[ https://issues.apache.org/jira/browse/YARN-2900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233702#comment-14233702 ] Zhijie Shen commented on YARN-2900: --- Will take a look Application (Attempt and Container) Not Found in AHS results in Internal Server Error (500) --- Key: YARN-2900 URL: https://issues.apache.org/jira/browse/YARN-2900 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Jonathan Eagles Assignee: Mit Desai Attachments: YARN-2900.patch, YARN-2900.patch, YARN-2900.patch, YARN-2900.patch
Caused by: java.lang.NullPointerException
  at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToApplicationReport(ApplicationHistoryManagerImpl.java:128)
  at org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getApplication(ApplicationHistoryManagerImpl.java:118)
  at org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:222)
  at org.apache.hadoop.yarn.server.webapp.WebServices$2.run(WebServices.java:219)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1679)
  at org.apache.hadoop.yarn.server.webapp.WebServices.getApp(WebServices.java:218)
  ... 59 more
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2921) MockRM#waitForState methods can be too slow and flaky
[ https://issues.apache.org/jira/browse/YARN-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated YARN-2921: - Attachment: YARN-2921.001.patch Attaching a first patch. 1. Making polling interval smaller(100msec). 2. Adding waitStateExecutor for polling the state. 3. Using CountDownLatch. MockRM#waitForState methods can be too slow and flaky - Key: YARN-2921 URL: https://issues.apache.org/jira/browse/YARN-2921 Project: Hadoop YARN Issue Type: Improvement Components: test Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Tsuyoshi OZAWA Attachments: YARN-2921.001.patch MockRM#waitForState methods currently sleep for too long (2 seconds and 1 second). This leads to slow tests and sometimes failures if the App/AppAttempt moves to another state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2880) Add a test in TestRMRestart to make sure node labels will be recovered if it is enabled
[ https://issues.apache.org/jira/browse/YARN-2880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233729#comment-14233729 ] Hadoop QA commented on YARN-2880: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12684800/YARN-2880.2.patch against trunk revision a31e016. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5988//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5988//console This message is automatically generated. Add a test in TestRMRestart to make sure node labels will be recovered if it is enabled --- Key: YARN-2880 URL: https://issues.apache.org/jira/browse/YARN-2880 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Wangda Tan Assignee: Rohith Attachments: 0001-YARN-2880.patch, YARN-2880.1.patch, YARN-2880.1.patch, YARN-2880.2.patch As suggested by [~ozawa], [link|https://issues.apache.org/jira/browse/YARN-2800?focusedCommentId=14217569page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14217569]. We should have a such test to make sure there will be no regression -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2922) Concurrent Modification Exception in LeafQueue when collecting applications
Jason Tufo created YARN-2922: Summary: Concurrent Modification Exception in LeafQueue when collecting applications Key: YARN-2922 URL: https://issues.apache.org/jira/browse/YARN-2922 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.5.1 Reporter: Jason Tufo
java.util.ConcurrentModificationException
  at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115)
  at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169)
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.collectSchedulerApplications(LeafQueue.java:1618)
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getAppsInQueue(CapacityScheduler.java:1119)
  at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueInfo(ClientRMService.java:798)
  at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:234)
  at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
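The top two frames show an iterator over a tree-based collection failing while another thread mutates it (a TreeSet is backed by a TreeMap, hence the TreeMap frames). A minimal sketch of the pattern and the usual fix, with illustrative names rather than LeafQueue's real fields:
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Iterating a TreeSet/TreeMap while another thread adds or removes entries
// throws ConcurrentModificationException, as in the trace above. Guarding
// both the iteration and the mutations with the same lock (here the queue
// object's monitor) is the usual fix.
public class QueueAppsSketch {
  private final Set<String> activeApplications = new TreeSet<>();

  // Called from the client RPC path (getQueueInfo in the trace).
  public synchronized List<String> collectSchedulerApplications() {
    return new ArrayList<>(activeApplications); // copy taken under the lock
  }

  // Called from the scheduler event path.
  public synchronized void addApplication(String appAttemptId) {
    activeApplications.add(appAttemptId);
  }

  public synchronized void removeApplication(String appAttemptId) {
    activeApplications.remove(appAttemptId);
  }
}
{code}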
[jira] [Commented] (YARN-2921) MockRM#waitForState methods can be too slow and flaky
[ https://issues.apache.org/jira/browse/YARN-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233741#comment-14233741 ] Karthik Kambatla commented on YARN-2921: Are 2 and 3 required here? MockRM#waitForState methods can be too slow and flaky - Key: YARN-2921 URL: https://issues.apache.org/jira/browse/YARN-2921 Project: Hadoop YARN Issue Type: Improvement Components: test Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Tsuyoshi OZAWA Attachments: YARN-2921.001.patch MockRM#waitForState methods currently sleep for too long (2 seconds and 1 second). This leads to slow tests and sometimes failures if the App/AppAttempt moves to another state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2921) MockRM#waitForState methods can be too slow and flaky
[ https://issues.apache.org/jira/browse/YARN-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233745#comment-14233745 ] Tsuyoshi OZAWA commented on YARN-2921: -- If we use the CountDownLatch, we can understand the timeout value more easily: {code} latch.await(40, TimeUnit.SECONDS) {code} If I shouldn't do this here, I'll revert it. MockRM#waitForState methods can be too slow and flaky - Key: YARN-2921 URL: https://issues.apache.org/jira/browse/YARN-2921 Project: Hadoop YARN Issue Type: Improvement Components: test Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Tsuyoshi OZAWA Attachments: YARN-2921.001.patch MockRM#waitForState methods currently sleep for too long (2 seconds and 1 second). This leads to slow tests and sometimes failures if the App/AppAttempt moves to another state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2921) MockRM#waitForState methods can be too slow and flaky
[ https://issues.apache.org/jira/browse/YARN-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233748#comment-14233748 ] Tsuyoshi OZAWA commented on YARN-2921: -- About the MockAM#waitForState, should we fix it on this JIRA? MockRM#waitForState methods can be too slow and flaky - Key: YARN-2921 URL: https://issues.apache.org/jira/browse/YARN-2921 Project: Hadoop YARN Issue Type: Improvement Components: test Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Tsuyoshi OZAWA Attachments: YARN-2921.001.patch MockRM#waitForState methods currently sleep for too long (2 seconds and 1 second). This leads to slow tests and sometimes failures if the App/AppAttempt moves to another state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2921) MockRM#waitForState methods can be too slow and flaky
[ https://issues.apache.org/jira/browse/YARN-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233751#comment-14233751 ] Karthik Kambatla commented on YARN-2921: Using the latch would make sense if the threads have any other advantage; otherwise, I would leave both out and control time waited through number of iterations of the for loop. Fixing MockAM#waitFor here would be good too. We should make sure the cumulative wait time stays at least as long as what it is now to avoid any test failures. MockRM#waitForState methods can be too slow and flaky - Key: YARN-2921 URL: https://issues.apache.org/jira/browse/YARN-2921 Project: Hadoop YARN Issue Type: Improvement Components: test Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Tsuyoshi OZAWA Attachments: YARN-2921.001.patch MockRM#waitForState methods currently sleep for too long (2 seconds and 1 second). This leads to slow tests and sometimes failures if the App/AppAttempt moves to another state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2921) MockRM#waitForState methods can be too slow and flaky
[ https://issues.apache.org/jira/browse/YARN-2921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233758#comment-14233758 ] Karthik Kambatla commented on YARN-2921: Other than the smaller sleep, we should also handle the case where the App or AppAttempt enters the required state and then moves to a latter state. e.g. App moving to RUNNING state when we are waiting for it to get ACCEPTED. MockRM#waitForState methods can be too slow and flaky - Key: YARN-2921 URL: https://issues.apache.org/jira/browse/YARN-2921 Project: Hadoop YARN Issue Type: Improvement Components: test Affects Versions: 2.6.0 Reporter: Karthik Kambatla Assignee: Tsuyoshi OZAWA Attachments: YARN-2921.001.patch MockRM#waitForState methods currently sleep for too long (2 seconds and 1 second). This leads to slow tests and sometimes failures if the App/AppAttempt moves to another state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
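Pulling the thread's points together, a leaner waitForState could poll at a short interval, keep the cumulative wait at today's total, and accept a set of states so an app that races past ACCEPTED into RUNNING still passes. A sketch under those assumptions, not the attached patch:
{code}
import java.util.EnumSet;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMApp;
import org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppState;
import org.junit.Assert;

// Sketch: 100 ms polling with a bounded total wait, accepting any of a
// set of states instead of exactly one.
public class WaitForStateSketch {
  private static final int POLL_INTERVAL_MS = 100;
  private static final int MAX_WAIT_MS = 40 * 1000; // keep today's total budget

  public static void waitForState(RMApp app, EnumSet<RMAppState> acceptable)
      throws InterruptedException {
    int waited = 0;
    while (!acceptable.contains(app.getState()) && waited < MAX_WAIT_MS) {
      Thread.sleep(POLL_INTERVAL_MS);
      waited += POLL_INTERVAL_MS;
    }
    Assert.assertTrue("App state " + app.getState() + " not in " + acceptable,
        acceptable.contains(app.getState()));
  }
}
{code}
Passing, say, EnumSet.of(RMAppState.ACCEPTED, RMAppState.RUNNING) covers the pass-through case [~kasha] describes.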
[jira] [Updated] (YARN-2920) CapacityScheduler should be notified when labels on nodes changed
[ https://issues.apache.org/jira/browse/YARN-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2920: - Target Version/s: 2.7.0 CapacityScheduler should be notified when labels on nodes changed - Key: YARN-2920 URL: https://issues.apache.org/jira/browse/YARN-2920 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2920.1.patch Currently, label changes on nodes are only handled by RMNodeLabelsManager, but that is not enough: - The scheduler should be able to take actions on running containers (like kill/preempt/do-nothing). - Used/available capacity in the scheduler should be updated for future planning. We need to add a new event to pass such updates to the scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2920) CapacityScheduler should be notified when labels on nodes changed
[ https://issues.apache.org/jira/browse/YARN-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2920: - Attachment: YARN-2920.1.patch Attached ver.1 patch for this. CapacityScheduler should be notified when labels on nodes changed - Key: YARN-2920 URL: https://issues.apache.org/jira/browse/YARN-2920 Project: Hadoop YARN Issue Type: Sub-task Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2920.1.patch Currently, label changes on nodes are only handled by RMNodeLabelsManager, but that is not enough: - The scheduler should be able to take actions on running containers (like kill/preempt/do-nothing). - Used/available capacity in the scheduler should be updated for future planning. We need to add a new event to pass such updates to the scheduler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2800) Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature
[ https://issues.apache.org/jira/browse/YARN-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2800: - Attachment: YARN-2800-20141203-1.patch [~ozawa], thanks for your comment, makes sense to me; updated. [~jianhe], could you take a look please? Remove MemoryNodeLabelsStore and add a way to enable/disable node labels feature Key: YARN-2800 URL: https://issues.apache.org/jira/browse/YARN-2800 Project: Hadoop YARN Issue Type: Sub-task Components: client, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2800-20141102-1.patch, YARN-2800-20141102-2.patch, YARN-2800-20141118-1.patch, YARN-2800-20141118-2.patch, YARN-2800-20141119-1.patch, YARN-2800-20141203-1.patch In the past, we had a MemoryNodeLabelsStore, mostly for users to try this feature without configuring where to store node labels on the file system. It seems convenient for users to try this, but it actually causes a bad user experience: users may add/remove labels and edit capacity-scheduler.xml, and after an RM restart the labels will be gone (we store them in memory). The RM also cannot start if some queue uses labels that don't exist in the cluster. As discussed, we should have an explicit way to let the user specify whether he/she wants this feature or not. If node labels are disabled, any operation trying to modify/use node labels will throw an exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2762) RMAdminCLI node-labels-related args should be trimmed and checked before sending to RM
[ https://issues.apache.org/jira/browse/YARN-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233812#comment-14233812 ] Wangda Tan commented on YARN-2762: -- Hi [~rohithsharma], Thanks for working on this. The trimming itself looks good to me, but I have some comments about error-message handling. I think we should make the error messages more consistent; my suggestion is: - If no labels are specified when adding/removing labels, the message is No cluster node-labels are specified - If no node-to-labels mapping is specified when replacing labels, the message is No node-to-labels mappings are specified And we should make the two kinds of error message pre-defined final fields of RMAdminCLI. Thoughts? RMAdminCLI node-labels-related args should be trimmed and checked before sending to RM -- Key: YARN-2762 URL: https://issues.apache.org/jira/browse/YARN-2762 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Rohith Assignee: Rohith Priority: Minor Attachments: YARN-2762.1.patch, YARN-2762.2.patch, YARN-2762.patch All NodeLabel args validations are done at the server side. The same can be done in RMAdminCLI so that unnecessary RPC calls can be avoided. And for input such as x,y,,z,, there is no need to add an empty string; it can be skipped instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
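A sketch of the client-side trimming and empty-token skipping under discussion (an illustrative helper, not the actual RMAdminCLI code):
{code}
import java.util.ArrayList;
import java.util.List;

// Turns "x,y,,z," into ["x", "y", "z"]: trims each token and skips empty
// ones before making the RPC, so the RM never sees blank or padded labels.
public class LabelArgsSketch {
  public static List<String> parseLabels(String arg) {
    List<String> labels = new ArrayList<>();
    for (String token : arg.split(",")) {
      String label = token.trim();
      if (!label.isEmpty()) {
        labels.add(label);
      }
    }
    return labels;
  }
}
{code}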
[jira] [Updated] (YARN-2869) CapacityScheduler should trim sub queue names when parse configuration
[ https://issues.apache.org/jira/browse/YARN-2869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wangda Tan updated YARN-2869: - Attachment: YARN-2869-3.patch Attached ver.3 patch to re-kick Jenkins; not sure which change causes the javadocs WARNING. CapacityScheduler should trim sub queue names when parse configuration -- Key: YARN-2869 URL: https://issues.apache.org/jira/browse/YARN-2869 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler, resourcemanager Reporter: Wangda Tan Assignee: Wangda Tan Attachments: YARN-2869-1.patch, YARN-2869-2.patch, YARN-2869-3.patch Currently, the capacity scheduler doesn't trim sub queue names when parsing queue names. For example, the configuration
{code}
<configuration>
  <property>
    <name>...root.queues</name>
    <value>a, b , c</value>
  </property>
  <property>
    <name>...root.b.capacity</name>
    <value>100</value>
  </property>
  ...
</configuration>
{code}
will fail with the error:
{code}
java.lang.IllegalArgumentException: Illegal capacity of -1.0 for queue root. a
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration.getCapacity(CapacitySchedulerConfiguration.java:332)
  at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.getCapacityFromConf(LeafQueue.java:196)
{code}
It will try to find queues named "a", " b ", and " c", which is apparently wrong; we should trim these sub queue names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
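The fix is mechanically simple: trim each sub-queue name as the comma-separated list is parsed (a sketch with a hypothetical method name, not the CapacitySchedulerConfiguration code itself):
{code}
// "a, b , c" currently yields queue names "a", " b ", " c"; trimming at
// parse time makes lookups such as root.b.capacity resolve correctly.
public class QueueNameParsingSketch {
  public static String[] getQueues(String rawValue) {
    String[] names = rawValue.split(",");
    for (int i = 0; i < names.length; i++) {
      names[i] = names[i].trim();
    }
    return names;
  }
}
{code}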
[jira] [Updated] (YARN-2189) Admin service for cache manager
[ https://issues.apache.org/jira/browse/YARN-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated YARN-2189: --- Attachment: YARN-2189-trunk-v6.patch Thanks for the comments [~kasha]. Attached is V6 which addresses most of the comments. Diff between V5 and V6: https://github.com/ctrezzo/hadoop/commit/e8d47fb3e8cea03c4f3545571f1b2c9593f0574e One thing that I didn't change is making SCMAdminProtocolService#checkAcls use RMServerUtils#verifyAccess. I started to do this, but then realized this would require the SharedCacheManager package to depend on the ResourceManager package. I started to move the verifyAccess method to yarn-server-common, but then realized that it uses the RMAuditLogger. I could create a slightly more generic verifyAccess method in yarn-server-common and make both servers use that if you want. Let me know. Thanks! Admin service for cache manager --- Key: YARN-2189 URL: https://issues.apache.org/jira/browse/YARN-2189 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2189-trunk-v1.patch, YARN-2189-trunk-v2.patch, YARN-2189-trunk-v3.patch, YARN-2189-trunk-v4.patch, YARN-2189-trunk-v5.patch, YARN-2189-trunk-v6.patch Implement the admin service for the shared cache manager. This service is responsible for handling administrative commands such as manually running a cleaner task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
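For concreteness, a hypothetical shape for the more generic helper discussed above: it would live in yarn-server-common and take a logging callback, so it depends on neither the ResourceManager package nor RMAuditLogger. All names and signatures here are illustrative, not the actual patch:
{code}
import java.util.function.BiConsumer;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.authorize.AccessControlList;

public final class ServerAclUtils {
  private ServerAclUtils() {}

  // Returns true if the caller is allowed by the admin ACL. The audit
  // callback receives (shortUserName, allowed) so each server can plug in
  // its own audit logger (RMAuditLogger, an SCM equivalent, etc.).
  public static boolean verifyAccess(AccessControlList adminAcl,
      UserGroupInformation caller,
      BiConsumer<String, Boolean> auditCallback) {
    boolean allowed = adminAcl.isUserAllowed(caller);
    auditCallback.accept(caller.getShortUserName(), allowed);
    return allowed;
  }
}
{code}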
[jira] [Assigned] (YARN-2922) Concurrent Modification Exception in LeafQueue when collecting applications
[ https://issues.apache.org/jira/browse/YARN-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith reassigned YARN-2922: Assignee: Rohith Concurrent Modification Exception in LeafQueue when collecting applications --- Key: YARN-2922 URL: https://issues.apache.org/jira/browse/YARN-2922 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 2.5.1 Reporter: Jason Tufo Assignee: Rohith
java.util.ConcurrentModificationException
        at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115)
        at java.util.TreeMap$KeyIterator.next(TreeMap.java:1169)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.collectSchedulerApplications(LeafQueue.java:1618)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getAppsInQueue(CapacityScheduler.java:1119)
        at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getQueueInfo(ClientRMService.java:798)
        at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getQueueInfo(ApplicationClientProtocolPBServiceImpl.java:234)
        at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:333)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
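The trace shows a ClientRMService RPC handler iterating LeafQueue's application set (a TreeMap) while a scheduler thread mutates it. A minimal sketch of the usual remedy, taking a snapshot under the same lock that guards mutation, is below; the field and method names echo the stack trace but are illustrative, not the actual LeafQueue code:
{code}
import java.util.ArrayList;
import java.util.Collection;
import java.util.TreeMap;

public class AppTracker {
  private final TreeMap<String, Object> activeApplications = new TreeMap<>();

  // Mutation and read paths synchronize on the same monitor, so an RPC
  // handler can no longer observe the TreeMap mid-update.
  public synchronized void addApplication(String appId, Object app) {
    activeApplications.put(appId, app);
  }

  public synchronized Collection<String> collectSchedulerApplications() {
    // Copy while holding the lock; callers iterate the snapshot safely.
    return new ArrayList<>(activeApplications.keySet());
  }
}
{code}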
[jira] [Commented] (YARN-2189) Admin service for cache manager
[ https://issues.apache.org/jira/browse/YARN-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233871#comment-14233871 ] Karthik Kambatla commented on YARN-2189: bq. I could create a slightly more generic verifyAccess method in yarn-server-common and make both servers use that if you want. Let me know. If it is not too much trouble, that would be nice. Other than that, there is one unused import:
{code}
import org.apache.hadoop.yarn.server.api.ResourceManagerAdministrationProtocol;
{code}
Admin service for cache manager --- Key: YARN-2189 URL: https://issues.apache.org/jira/browse/YARN-2189 Project: Hadoop YARN Issue Type: Sub-task Reporter: Chris Trezzo Assignee: Chris Trezzo Attachments: YARN-2189-trunk-v1.patch, YARN-2189-trunk-v2.patch, YARN-2189-trunk-v3.patch, YARN-2189-trunk-v4.patch, YARN-2189-trunk-v5.patch, YARN-2189-trunk-v6.patch Implement the admin service for the shared cache manager. This service is responsible for handling administrative commands such as manually running a cleaner task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2918) RM startup fails if accessible-node-labels are configured for a queue without cluster labels
[ https://issues.apache.org/jira/browse/YARN-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14233877#comment-14233877 ] Rohith commented on YARN-2918: -- IIUC, adding cluster node labels is supported only through rmadmin or the REST API, but accessible node labels are configured via an xml file. As an admin, one might want to pre-configure both cluster node labels and accessible node labels in xml instead of triggering an rmadmin command. [~leftnoteasy] Would you help me understand why the behavior is like this? A discussion about it may have happened in another jira that I am not aware of. RM startup fails if accessible-node-labels are configured for a queue without cluster labels --- Key: YARN-2918 URL: https://issues.apache.org/jira/browse/YARN-2918 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Rohith Assignee: Rohith I configured accessible-node-labels for a queue, but RM startup fails with the exception below. I see the current steps to configure node labels are to first add them via rmadmin and then configure them for queues, but it would be good if cluster and queue node labels were configured consistently.
{noformat}
2014-12-03 20:11:50,126 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
org.apache.hadoop.service.ServiceStateException: java.io.IOException: NodeLabelManager doesn't include label = x, please check.
        at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:556)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:982)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:249)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1203)
Caused by: java.io.IOException: NodeLabelManager doesn't include label = x, please check.
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.checkIfLabelInClusterNodeLabels(SchedulerUtils.java:287)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.init(AbstractCSQueue.java:109)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.init(LeafQueue.java:120)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:567)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.parseQueue(CapacityScheduler.java:587)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:462)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:294)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:324)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
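To make the expected ordering concrete: the label must exist in the cluster before any queue references it. The command and property below illustrate that order for the label x from the log; the queue name root.a is hypothetical:
{code}
# 1. Register the label with the cluster first (the rmadmin path):
yarn rmadmin -addToClusterNodeLabels x

# 2. Only then reference it from capacity-scheduler.xml:
#    <property>
#      <name>yarn.scheduler.capacity.root.a.accessible-node-labels</name>
#      <value>x</value>
#    </property>
{code}
Starting the RM with step 2 but not step 1 is exactly what produces the "NodeLabelManager doesn't include label = x" failure above.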