[jira] [Commented] (YARN-1402) Related Web UI, CLI changes on exposing client API to check log aggregation status

2015-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495798#comment-14495798
 ] 

Hadoop QA commented on YARN-1402:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12725487/YARN-1402.2.patch
  against trunk revision fddd552.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7344//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7344//console

This message is automatically generated.

> Related Web UI, CLI changes on exposing client API to check log aggregation 
> status
> --
>
> Key: YARN-1402
> URL: https://issues.apache.org/jira/browse/YARN-1402
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-1402.1.patch, YARN-1402.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1402) Related Web UI, CLI changes on exposing client API to check log aggregation status

2015-04-15 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495809#comment-14495809
 ] 

Xuan Gong commented on YARN-1402:
-

The -1 on core tests is not related to this patch

> Related Web UI, CLI changes on exposing client API to check log aggregation 
> status
> --
>
> Key: YARN-1402
> URL: https://issues.apache.org/jira/browse/YARN-1402
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Xuan Gong
>Assignee: Xuan Gong
> Attachments: YARN-1402.1.patch, YARN-1402.2.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3404) View the queue name to YARN Application page

2015-04-15 Thread Ryu Kobayashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryu Kobayashi updated YARN-3404:

Attachment: YARN-3404.4.patch

[~jianhe] I see. I have created a patch that uses 
{{WebAppUtils.getResolvedRMWebAppURLWithScheme}}.
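
A rough sketch, for illustration only, of how the resolved RM web app URL could be used to link a queue name back to the RM web UI; the class below and the {{/cluster/scheduler}} path are assumptions, only {{WebAppUtils.getResolvedRMWebAppURLWithScheme}} comes from the comment above.

{code:java}
// Illustrative sketch only, not the YARN-3404 patch: build a link from a queue
// name to the RM scheduler page using the resolved RM web app URL.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.webapp.util.WebAppUtils;

public class QueueLinkSketch {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    // Resolves the RM web address and prefixes it with the configured scheme
    // (http:// or https://), which is what the patch relies on.
    String rmWebApp = WebAppUtils.getResolvedRMWebAppURLWithScheme(conf);
    String queueName = "default";  // example queue name
    // "/cluster/scheduler" is an assumption about the RM web UI layout.
    System.out.println("Queue '" + queueName + "' shown at "
        + rmWebApp + "/cluster/scheduler");
  }
}
{code}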

> View the queue name to YARN Application page
> 
>
> Key: YARN-3404
> URL: https://issues.apache.org/jira/browse/YARN-3404
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Minor
> Attachments: YARN-3404.1.patch, YARN-3404.2.patch, YARN-3404.3.patch, 
> YARN-3404.4.patch, screenshot.png
>
>
> We want to display the name of the queue that is used on the YARN Application 
> page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3404) View the queue name to YARN Application page

2015-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495846#comment-14495846
 ] 

Hadoop QA commented on YARN-3404:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12725514/YARN-3404.4.patch
  against trunk revision fddd552.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7345//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7345//console

This message is automatically generated.

> View the queue name to YARN Application page
> 
>
> Key: YARN-3404
> URL: https://issues.apache.org/jira/browse/YARN-3404
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Minor
> Attachments: YARN-3404.1.patch, YARN-3404.2.patch, YARN-3404.3.patch, 
> YARN-3404.4.patch, screenshot.png
>
>
> We want to display the name of the queue that is used on the YARN Application 
> page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3436) Fix URIs in documention of YARN web service REST APIs

2015-04-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496014#comment-14496014
 ] 

Hudson commented on YARN-3436:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #164 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/164/])
YARN-3436. Fix URIs in documantion of YARN web service REST APIs. Contributed 
by Bibin A Chundatt. (ozawa: rev 05007b45e58bd9052f503cfb8c17bcfd22a686e3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/WebServicesIntro.md
* hadoop-yarn-project/CHANGES.txt


> Fix URIs in documention of YARN web service REST APIs
> -
>
> Key: YARN-3436
> URL: https://issues.apache.org/jira/browse/YARN-3436
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-3436.001.patch
>
>
> /docs/current/hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html
> {quote}
> Response Examples
> JSON response with single resource
> HTTP Request: GET 
> http://rmhost.domain:8088/ws/v1/cluster/{color:red}app{color}/application_1324057493980_0001
> Response Status Line: HTTP/1.1 200 OK
> {quote}
> The URL should be ws/v1/cluster/{color:red}apps{color}.
> Two examples on the same page are wrong.
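
For reference, a hedged plain-Java example of issuing the corrected request; the host name and application id are just the placeholders from the documentation snippet quoted above.

{code:java}
// Illustrative only: a plain GET against the corrected /apps path from the
// documentation example; rmhost.domain and the application id are placeholders.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class AppsRestExample {
  public static void main(String[] args) throws Exception {
    URL url = new URL(
        "http://rmhost.domain:8088/ws/v1/cluster/apps/application_1324057493980_0001");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "application/json");
    System.out.println("Response Status Line: HTTP/1.1 " + conn.getResponseCode()
        + " " + conn.getResponseMessage());
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream()))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);  // JSON response with the single app resource
      }
    }
  }
}
{code}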



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3361) CapacityScheduler side changes to support non-exclusive node labels

2015-04-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496016#comment-14496016
 ] 

Hudson commented on YARN-3361:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #164 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/164/])
YARN-3361. CapacityScheduler side changes to support non-exclusive node labels. 
Contributed by Wangda Tan (jianhe: rev 0fefda645bca935b87b6bb8ca63e6f18340d59f5)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestNodeLabelContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/BuilderUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceUsage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/Application.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/SchedulingMode.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java


> CapacityScheduler side changes to support non-exclusive node labels

[jira] [Commented] (YARN-3266) RMContext inactiveNodes should have NodeId as map key

2015-04-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496017#comment-14496017
 ] 

Hudson commented on YARN-3266:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #164 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/164/])
YARN-3266. RMContext#inactiveNodes should have NodeId as map key. Contributed 
by Chengbing Liu (jianhe: rev b46ee1e7a31007985b88072d9af3d97c33a261a7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java


> RMContext inactiveNodes should have NodeId as map key
> -
>
> Key: YARN-3266
> URL: https://issues.apache.org/jira/browse/YARN-3266
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Fix For: 2.8.0
>
> Attachments: YARN-3266.01.patch, YARN-3266.02.patch, 
> YARN-3266.03.patch
>
>
> Under the default NM port configuration, which is 0, we have observed in the 
> current version that the "lost nodes" count is greater than the length of the 
> lost node list. This happens when we consecutively restart the same NM twice:
> * NM started at port 10001
> * NM restarted at port 10002
> * NM restarted at port 10003
> * NM:10001 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=1; 
> {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, 
> {{inactiveNodes}} has 1 element
> * NM:10002 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=2; 
> {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, 
> {{inactiveNodes}} still has 1 element
> Since we allow multiple NodeManagers on one host (as discussed in YARN-1888), 
> {{inactiveNodes}} should be of type {{ConcurrentMap<NodeId, RMNode>}}. If 
> this would break the current API, then the key string should include the NM's 
> port as well.
> Thoughts?
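
A minimal, self-contained sketch of the point above (not the actual RMContextImpl code; the String map values stand in for RMNode): keying {{inactiveNodes}} by {{NodeId}} instead of by host keeps one entry per lost NM instance.

{code:java}
// Keying the inactive-nodes map by NodeId (host:port) versus by host only.
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import org.apache.hadoop.yarn.api.records.NodeId;

public class InactiveNodesKeySketch {
  public static void main(String[] args) {
    // Keyed by host: restarts of the same NM on new ports overwrite each other,
    // so the map ends up smaller than the "lost nodes" metric.
    ConcurrentMap<String, String> byHost = new ConcurrentHashMap<>();
    byHost.put("host1", "NM@10001");
    byHost.put("host1", "NM@10002");  // overwrites the previous lost NM entry
    System.out.println("keyed by host: " + byHost.size());      // prints 1

    // Keyed by NodeId: each lost NM instance stays tracked separately.
    ConcurrentMap<NodeId, String> byNodeId = new ConcurrentHashMap<>();
    byNodeId.put(NodeId.newInstance("host1", 10001), "NM@10001");
    byNodeId.put(NodeId.newInstance("host1", 10002), "NM@10002");
    System.out.println("keyed by NodeId: " + byNodeId.size());  // prints 2
  }
}
{code}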



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3361) CapacityScheduler side changes to support non-exclusive node labels

2015-04-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496024#comment-14496024
 ] 

Hudson commented on YARN-3361:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #898 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/898/])
YARN-3361. CapacityScheduler side changes to support non-exclusive node labels. 
Contributed by Wangda Tan (jianhe: rev 0fefda645bca935b87b6bb8ca63e6f18340d59f5)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/Application.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/BuilderUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestNodeLabelContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceUsage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/SchedulingMode.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java


> CapacityScheduler side changes to support non-exclusive node labels

[jira] [Commented] (YARN-3436) Fix URIs in documention of YARN web service REST APIs

2015-04-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496023#comment-14496023
 ] 

Hudson commented on YARN-3436:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #898 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/898/])
YARN-3436. Fix URIs in documantion of YARN web service REST APIs. Contributed 
by Bibin A Chundatt. (ozawa: rev 05007b45e58bd9052f503cfb8c17bcfd22a686e3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/WebServicesIntro.md
* hadoop-yarn-project/CHANGES.txt


> Fix URIs in documention of YARN web service REST APIs
> -
>
> Key: YARN-3436
> URL: https://issues.apache.org/jira/browse/YARN-3436
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-3436.001.patch
>
>
> /docs/current/hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html
> {quote}
> Response Examples
> JSON response with single resource
> HTTP Request: GET 
> http://rmhost.domain:8088/ws/v1/cluster/{color:red}app{color}/application_1324057493980_0001
> Response Status Line: HTTP/1.1 200 OK
> {quote}
> The URL should be ws/v1/cluster/{color:red}apps{color}.
> Two examples on the same page are wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3266) RMContext inactiveNodes should have NodeId as map key

2015-04-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496025#comment-14496025
 ] 

Hudson commented on YARN-3266:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #898 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/898/])
YARN-3266. RMContext#inactiveNodes should have NodeId as map key. Contributed 
by Chengbing Liu (jianhe: rev b46ee1e7a31007985b88072d9af3d97c33a261a7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java


> RMContext inactiveNodes should have NodeId as map key
> -
>
> Key: YARN-3266
> URL: https://issues.apache.org/jira/browse/YARN-3266
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Fix For: 2.8.0
>
> Attachments: YARN-3266.01.patch, YARN-3266.02.patch, 
> YARN-3266.03.patch
>
>
> Under the default NM port configuration, which is 0, we have observed in the 
> current version that the "lost nodes" count is greater than the length of the 
> lost node list. This happens when we consecutively restart the same NM twice:
> * NM started at port 10001
> * NM restarted at port 10002
> * NM restarted at port 10003
> * NM:10001 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=1; 
> {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, 
> {{inactiveNodes}} has 1 element
> * NM:10002 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=2; 
> {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, 
> {{inactiveNodes}} still has 1 element
> Since we allow multiple NodeManagers on one host (as discussed in YARN-1888), 
> {{inactiveNodes}} should be of type {{ConcurrentMap<NodeId, RMNode>}}. If 
> this would break the current API, then the key string should include the NM's 
> port as well.
> Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3361) CapacityScheduler side changes to support non-exclusive node labels

2015-04-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496069#comment-14496069
 ] 

Hudson commented on YARN-3361:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2096 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2096/])
YARN-3361. CapacityScheduler side changes to support non-exclusive node labels. 
Contributed by Wangda Tan (jianhe: rev 0fefda645bca935b87b6bb8ca63e6f18340d59f5)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/BuilderUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/Application.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestNodeLabelContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/SchedulingMode.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceUsage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java


> CapacityScheduler side changes to support non-exclusive node labels

[jira] [Commented] (YARN-3436) Fix URIs in documention of YARN web service REST APIs

2015-04-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496068#comment-14496068
 ] 

Hudson commented on YARN-3436:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2096 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2096/])
YARN-3436. Fix URIs in documantion of YARN web service REST APIs. Contributed 
by Bibin A Chundatt. (ozawa: rev 05007b45e58bd9052f503cfb8c17bcfd22a686e3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/WebServicesIntro.md
* hadoop-yarn-project/CHANGES.txt


> Fix URIs in documention of YARN web service REST APIs
> -
>
> Key: YARN-3436
> URL: https://issues.apache.org/jira/browse/YARN-3436
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-3436.001.patch
>
>
> /docs/current/hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html
> {quote}
> Response Examples
> JSON response with single resource
> HTTP Request: GET 
> http://rmhost.domain:8088/ws/v1/cluster/{color:red}app{color}/application_1324057493980_0001
> Response Status Line: HTTP/1.1 200 OK
> {quote}
> The URL should be ws/v1/cluster/{color:red}apps{color}.
> Two examples on the same page are wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3266) RMContext inactiveNodes should have NodeId as map key

2015-04-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496070#comment-14496070
 ] 

Hudson commented on YARN-3266:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2096 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2096/])
YARN-3266. RMContext#inactiveNodes should have NodeId as map key. Contributed 
by Chengbing Liu (jianhe: rev b46ee1e7a31007985b88072d9af3d97c33a261a7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java


> RMContext inactiveNodes should have NodeId as map key
> -
>
> Key: YARN-3266
> URL: https://issues.apache.org/jira/browse/YARN-3266
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Fix For: 2.8.0
>
> Attachments: YARN-3266.01.patch, YARN-3266.02.patch, 
> YARN-3266.03.patch
>
>
> Under the default NM port configuration, which is 0, we have observed in the 
> current version that the "lost nodes" count is greater than the length of the 
> lost node list. This happens when we consecutively restart the same NM twice:
> * NM started at port 10001
> * NM restarted at port 10002
> * NM restarted at port 10003
> * NM:10001 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=1; 
> {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, 
> {{inactiveNodes}} has 1 element
> * NM:10002 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=2; 
> {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, 
> {{inactiveNodes}} still has 1 element
> Since we allow multiple NodeManagers on one host (as discussed in YARN-1888), 
> {{inactiveNodes}} should be of type {{ConcurrentMap<NodeId, RMNode>}}. If 
> this would break the current API, then the key string should include the NM's 
> port as well.
> Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3361) CapacityScheduler side changes to support non-exclusive node labels

2015-04-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496076#comment-14496076
 ] 

Hudson commented on YARN-3361:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #155 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/155/])
YARN-3361. CapacityScheduler side changes to support non-exclusive node labels. 
Contributed by Wangda Tan (jianhe: rev 0fefda645bca935b87b6bb8ca63e6f18340d59f5)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/SchedulingMode.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/BuilderUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/Application.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceUsage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestNodeLabelContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java


> CapacityScheduler side changes to support non-exclusive node labels

[jira] [Commented] (YARN-3436) Fix URIs in documention of YARN web service REST APIs

2015-04-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496075#comment-14496075
 ] 

Hudson commented on YARN-3436:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #155 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/155/])
YARN-3436. Fix URIs in documantion of YARN web service REST APIs. Contributed 
by Bibin A Chundatt. (ozawa: rev 05007b45e58bd9052f503cfb8c17bcfd22a686e3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/WebServicesIntro.md
* hadoop-yarn-project/CHANGES.txt


> Fix URIs in documention of YARN web service REST APIs
> -
>
> Key: YARN-3436
> URL: https://issues.apache.org/jira/browse/YARN-3436
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-3436.001.patch
>
>
> /docs/current/hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html
> {quote}
> Response Examples
> JSON response with single resource
> HTTP Request: GET 
> http://rmhost.domain:8088/ws/v1/cluster/{color:red}app{color}/application_1324057493980_0001
> Response Status Line: HTTP/1.1 200 OK
> {quote}
> The URL should be ws/v1/cluster/{color:red}apps{color}.
> Two examples on the same page are wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3266) RMContext inactiveNodes should have NodeId as map key

2015-04-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496077#comment-14496077
 ] 

Hudson commented on YARN-3266:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #155 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/155/])
YARN-3266. RMContext#inactiveNodes should have NodeId as map key. Contributed 
by Chengbing Liu (jianhe: rev b46ee1e7a31007985b88072d9af3d97c33a261a7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java


> RMContext inactiveNodes should have NodeId as map key
> -
>
> Key: YARN-3266
> URL: https://issues.apache.org/jira/browse/YARN-3266
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Fix For: 2.8.0
>
> Attachments: YARN-3266.01.patch, YARN-3266.02.patch, 
> YARN-3266.03.patch
>
>
> Under the default NM port configuration, which is 0, we have observed in the 
> current version that the "lost nodes" count is greater than the length of the 
> lost node list. This happens when we consecutively restart the same NM twice:
> * NM started at port 10001
> * NM restarted at port 10002
> * NM restarted at port 10003
> * NM:10001 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=1; 
> {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, 
> {{inactiveNodes}} has 1 element
> * NM:10002 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=2; 
> {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, 
> {{inactiveNodes}} still has 1 element
> Since we allow multiple NodeManagers on one host (as discussed in YARN-1888), 
> {{inactiveNodes}} should be of type {{ConcurrentMap<NodeId, RMNode>}}. If 
> this would break the current API, then the key string should include the NM's 
> port as well.
> Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3476) Nodemanager can fail to delete local logs if log aggregation fails

2015-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496099#comment-14496099
 ] 

Hadoop QA commented on YARN-3476:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12724974/0001-YARN-3476.patch
  against trunk revision fddd552.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7346//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7346//console

This message is automatically generated.

> Nodemanager can fail to delete local logs if log aggregation fails
> --
>
> Key: YARN-3476
> URL: https://issues.apache.org/jira/browse/YARN-3476
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation, nodemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Rohith
> Attachments: 0001-YARN-3476.patch
>
>
> If log aggregation encounters an error while trying to upload the file, the 
> underlying TFile can throw an {{IllegalStateException}}, which will bubble up 
> to the top of the thread and prevent the application logs from being 
> deleted.
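
A minimal sketch of the intended behaviour, assuming the fix is to catch the upload failure so the cleanup still runs; the method names below are illustrative, not the NodeManager's actual AppLogAggregatorImpl API.

{code:java}
// The point: an unchecked exception from the upload step must not skip
// deletion of the local application logs.
public class LogAggregationSketch {
  static void uploadLogs() {
    // Stand-in for the TFile-based upload that can throw IllegalStateException.
    throw new IllegalStateException("simulated TFile failure");
  }

  static void deleteLocalLogs() {
    System.out.println("local application logs deleted");
  }

  public static void main(String[] args) {
    try {
      uploadLogs();
    } catch (RuntimeException e) {
      System.err.println("log aggregation failed: " + e.getMessage());
    } finally {
      deleteLocalLogs();  // runs even when the upload blows up
    }
  }
}
{code}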



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2268) Disallow formatting the RMStateStore when there is an RM running

2015-04-15 Thread Rohith (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496103#comment-14496103
 ] 

Rohith commented on YARN-2268:
--

I propose the following way to disallow formatting the state store while an RM is running.
For both HA (Active and Standby) and non-HA setups, it is possible to get the RM state 
using the REST API getClusterInfo ('ws/v1/cluster/info'). This can be used to 
identify the RM state, and it is independent of any state store implementation.
In HA, the ACTIVE state is checked against all the RM-Ids sequentially. 
If no ACTIVE RM is found, the store is formatted; otherwise an 
*ActiveResourceManagerRunningException* is thrown.

Cons: Formatting the state store when HA is enabled is a *best effort* operation; there 
is a window where the RM state can change after one of the RMs has been checked.

Kindly share your thoughts on this approach.
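
A rough sketch of the proposed check, assuming the HA state can be read from the cluster info response (here via a naive string match rather than real JSON parsing), with a plain IllegalStateException standing in for the proposed ActiveResourceManagerRunningException; not a final implementation.

{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class ActiveRmCheckSketch {
  // Returns true if any of the given RM web addresses reports an ACTIVE HA state.
  static boolean anyActiveRm(String[] rmWebAddresses) {
    for (String address : rmWebAddresses) {
      try {
        URL url = new URL("http://" + address + "/ws/v1/cluster/info");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(2000);
        conn.setReadTimeout(2000);
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
            new InputStreamReader(conn.getInputStream()))) {
          String line;
          while ((line = in.readLine()) != null) {
            body.append(line);
          }
        }
        // Naive check; a real implementation would parse the haState field.
        if (body.toString().contains("ACTIVE")) {
          return true;  // an active RM is using the store
        }
      } catch (Exception e) {
        // RM not reachable; treat it as not active and check the next RM-Id.
      }
    }
    return false;
  }

  public static void main(String[] args) {
    String[] rms = {"rm1.example.com:8088", "rm2.example.com:8088"};
    if (anyActiveRm(rms)) {
      // Stand-in for the proposed ActiveResourceManagerRunningException.
      throw new IllegalStateException(
          "an active RM is running; refusing to format the RMStateStore");
    }
    System.out.println("no active RM found; formatting the state store is allowed");
  }
}
{code}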

> Disallow formatting the RMStateStore when there is an RM running
> 
>
> Key: YARN-2268
> URL: https://issues.apache.org/jira/browse/YARN-2268
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Karthik Kambatla
>Assignee: Rohith
>
> YARN-2131 adds a way to format the RMStateStore. However, it can be a problem 
> if we format the store while an RM is actively using it. It would be nice to 
> fail the format if there is an RM running and using this store. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3477) TimelineClientImpl swallows root cause of retry failures

2015-04-15 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-3477:
-
 Target Version/s: 2.7.1
Affects Version/s: (was: 3.0.0)
   2.7.0

> TimelineClientImpl swallows root cause of retry failures
> 
>
> Key: YARN-3477
> URL: https://issues.apache.org/jira/browse/YARN-3477
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: timelineserver
>Affects Versions: 2.7.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>
> If the timeline client fails more than the retry count, the original exception is 
> not thrown. Instead a runtime exception is raised saying "retries run out".
> # The failing exception should be rethrown, ideally via 
> NetUtils.wrapException, to include the URL of the failing endpoint.
> # Otherwise, the raised RTE should (a) state that URL and (b) set the 
> original fault as the inner cause.
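
A minimal sketch of the suggested behaviour, using a simple inline retry loop rather than TimelineClientImpl's real structure: when the retries run out, the thrown exception names the failing URL and carries the original fault as its cause.

{code:java}
import java.io.IOException;
import java.net.URI;

public class RetryFailureSketch {
  static void putEntities(URI timelineUri, int maxRetries) throws IOException {
    IOException lastFailure = null;
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        post(timelineUri);  // stand-in for the real HTTP call
        return;
      } catch (IOException e) {
        lastFailure = e;    // remember the root cause for the final error
      }
    }
    // Surface both the endpoint and the original fault to the caller.
    throw new IOException("Failed to reach timeline server at " + timelineUri
        + " after " + (maxRetries + 1) + " attempts", lastFailure);
  }

  static void post(URI uri) throws IOException {
    throw new IOException("simulated connection refused to " + uri);
  }

  public static void main(String[] args) throws Exception {
    putEntities(URI.create("http://timeline.example.com:8188/ws/v1/timeline"), 2);
  }
}
{code}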



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3489) RMServerUtils.validateResourceRequests should only obtain queue info once

2015-04-15 Thread Jason Lowe (JIRA)
Jason Lowe created YARN-3489:


 Summary: RMServerUtils.validateResourceRequests should only obtain 
queue info once
 Key: YARN-3489
 URL: https://issues.apache.org/jira/browse/YARN-3489
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.6.0
Reporter: Jason Lowe


Since the label support was added, we now get the queue info for each request 
being validated in SchedulerUtils.validateResourceRequest.  If 
validateResourceRequests needs to validate a lot of requests at a time (e.g.: 
large cluster with lots of varied locality in the requests) then it will get 
the queue info for each request.  Since we build the queue info each time, this generates 
a lot of unnecessary garbage, as the queue isn't changing between requests.  We 
should grab the queue info once and pass it down rather than building it again 
for each request.
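
A sketch of the proposed refactoring; the class, interface, and method shapes below are illustrative stand-ins, not the actual RMServerUtils/SchedulerUtils signatures.

{code:java}
import java.util.List;

public class ValidateRequestsSketch {
  interface Scheduler { QueueInfo getQueueInfo(String queueName); }
  static class QueueInfo { /* label expressions, capacities, ... */ }
  static class ResourceRequest { /* priority, locality, node label, ... */ }

  static void validateResourceRequests(List<ResourceRequest> asks,
                                       String queueName, Scheduler scheduler) {
    // Build the queue info a single time instead of once per request.
    QueueInfo queueInfo = scheduler.getQueueInfo(queueName);
    for (ResourceRequest ask : asks) {
      validateResourceRequest(ask, queueInfo);
    }
  }

  static void validateResourceRequest(ResourceRequest ask, QueueInfo queueInfo) {
    // Label and limit checks run against the pre-fetched queueInfo, so no
    // per-request garbage is generated for the queue info itself.
  }
}
{code}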



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3489) RMServerUtils.validateResourceRequests should only obtain queue info once

2015-04-15 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena reassigned YARN-3489:
--

Assignee: Varun Saxena

> RMServerUtils.validateResourceRequests should only obtain queue info once
> -
>
> Key: YARN-3489
> URL: https://issues.apache.org/jira/browse/YARN-3489
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Varun Saxena
>
> Since the label support was added, we now get the queue info for each request 
> being validated in SchedulerUtils.validateResourceRequest.  If 
> validateResourceRequests needs to validate a lot of requests at a time (e.g.: 
> large cluster with lots of varied locality in the requests) then it will get 
> the queue info for each request.  Since we build the queue info each time, this 
> generates a lot of unnecessary garbage, as the queue isn't changing between 
> requests.  We should grab the queue info once and pass it down rather than 
> building it again for each request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3471) Fix timeline client retry

2015-04-15 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated YARN-3471:
-
Affects Version/s: 2.8.0

> Fix timeline client retry
> -
>
> Key: YARN-3471
> URL: https://issues.apache.org/jira/browse/YARN-3471
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Affects Versions: 2.8.0
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-3471.1.patch, YARN-3471.2.patch
>
>
> I found that the client retry has some problems:
> 1. The new put methods will retry on all exceptions, but they should only do 
> so upon ConnectException.
> 2. We can reuse TimelineClientConnectionRetry to simplify the retry logic.
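
A minimal sketch of point 1, using an inline retry loop purely for illustration (the real client would reuse TimelineClientConnectionRetry as point 2 suggests): only ConnectException is retried, every other IOException propagates immediately.

{code:java}
import java.io.IOException;
import java.net.ConnectException;

public class ConnectRetrySketch {
  interface Op<T> { T run() throws IOException; }

  static <T> T retryOnConnect(Op<T> op, int maxRetries, long sleepMs)
      throws IOException, InterruptedException {
    ConnectException last = null;
    for (int i = 0; i <= maxRetries; i++) {
      try {
        return op.run();
      } catch (ConnectException ce) {
        // Only connection failures are retried; any other IOException from
        // op.run() propagates to the caller right away.
        last = ce;
        Thread.sleep(sleepMs);
      }
    }
    throw last;  // retries exhausted: rethrow the last connection failure
  }

  public static void main(String[] args) throws Exception {
    // Always fails to connect, so this retries and then rethrows the last cause.
    retryOnConnect(() -> {
      throw new ConnectException("simulated refused connection");
    }, 2, 100L);
  }
}
{code}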



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation

2015-04-15 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496239#comment-14496239
 ] 

Thomas Graves commented on YARN-3434:
-

So I had considered putting it in ResourceLimits, but ResourceLimits seems 
to be more of a queue-level thing to me (not a user-level one). For instance, 
ParentQueue passes this into LeafQueue, and ParentQueue cares nothing about user 
limits.  If you stored it there you would either need to track the user it was 
for or track it for all users. ResourceLimits gets updated when nodes are added and 
removed, and we don't need to compute a particular user limit when that happens. 
So it would then be out of date, or we would have to change it to be updated when 
that happens, but that to me is a fairly large change and not really needed.

The user limit calculations are lower down and are regularly recomputed per user, per 
application, and per current request, so putting this into the global limits, given 
how it is calculated and used, didn't make sense to me. All you would 
be using it for is passing it down to assignContainer, and then it would be out 
of date.  If someone else started looking at that value assuming it was up to 
date then it would be wrong (unless of course we started updating it as stated 
above).  And it would only be for a single user, not all users, unless again we 
changed it to recalculate for every user whenever something changed. That seems a 
bit excessive.

You are correct that needToUnreserve could go away.  I started out on 2.6 which 
didn't have our changes and I could have removed it when I added in 
amountNeededUnreserve.  If we were to store it in the global ResourceLimits then 
yes, the entire LimitsInfo can go away, including shouldContinue, as you would 
fall back to using the boolean return from each function.  But again, based on my 
above comments, I'm not sure ResourceLimits is the correct place to put this.

I just noticed that we are already keeping the userLimit in the User class; 
that would be another option.  But again, I think we need to make it clear 
what it is. This particular check is done per application and per user, based on 
the currently requested Resource.  The stored value wouldn't necessarily 
apply to all of the user's applications, since the resource request size could be 
different.

Thoughts, or is there something I'm missing about ResourceLimits?

> Interaction between reservations and userlimit can result in significant ULF 
> violation
> --
>
> Key: YARN-3434
> URL: https://issues.apache.org/jira/browse/YARN-3434
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
> Attachments: YARN-3434.patch
>
>
> ULF was set to 1.0
> User was able to consume 1.4X queue capacity.
> It looks like when this application launched, it reserved about 1000 
> containers, each 8G each, within about 5 seconds. I think this allowed the 
> logic in assignToUser() to allow the userlimit to be surpassed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3361) CapacityScheduler side changes to support non-exclusive node labels

2015-04-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496305#comment-14496305
 ] 

Hudson commented on YARN-3361:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #165 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/165/])
YARN-3361. CapacityScheduler side changes to support non-exclusive node labels. 
Contributed by Wangda Tan (jianhe: rev 0fefda645bca935b87b6bb8ca63e6f18340d59f5)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestNodeLabelContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/BuilderUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/Application.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceUsage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/SchedulingMode.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java


> CapacityScheduler side changes to support non-exclusive node labels

[jira] [Commented] (YARN-3436) Fix URIs in documentation of YARN web service REST APIs

2015-04-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496304#comment-14496304
 ] 

Hudson commented on YARN-3436:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #165 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/165/])
YARN-3436. Fix URIs in documentation of YARN web service REST APIs. Contributed 
by Bibin A Chundatt. (ozawa: rev 05007b45e58bd9052f503cfb8c17bcfd22a686e3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/WebServicesIntro.md
* hadoop-yarn-project/CHANGES.txt


> Fix URIs in documentation of YARN web service REST APIs
> -
>
> Key: YARN-3436
> URL: https://issues.apache.org/jira/browse/YARN-3436
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-3436.001.patch
>
>
> /docs/current/hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html
> {quote}
> Response Examples
> JSON response with single resource
> HTTP Request: GET 
> http://rmhost.domain:8088/ws/v1/cluster/{color:red}app{color}/application_1324057493980_0001
> Response Status Line: HTTP/1.1 200 OK
> {quote}
> URL should be ws/v1/cluster/{color:red}apps{color}.
> Two examples on the same page are wrong



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3266) RMContext inactiveNodes should have NodeId as map key

2015-04-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496306#comment-14496306
 ] 

Hudson commented on YARN-3266:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #165 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/165/])
YARN-3266. RMContext#inactiveNodes should have NodeId as map key. Contributed 
by Chengbing Liu (jianhe: rev b46ee1e7a31007985b88072d9af3d97c33a261a7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* hadoop-yarn-project/CHANGES.txt


> RMContext inactiveNodes should have NodeId as map key
> -
>
> Key: YARN-3266
> URL: https://issues.apache.org/jira/browse/YARN-3266
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Fix For: 2.8.0
>
> Attachments: YARN-3266.01.patch, YARN-3266.02.patch, 
> YARN-3266.03.patch
>
>
> Under the default NM port configuration, which is 0, we have observed in the 
> current version that the "lost nodes" count is greater than the length of the 
> lost node list. This will happen when we consecutively restart the same NM twice:
> * NM started at port 10001
> * NM restarted at port 10002
> * NM restarted at port 10003
> * NM:10001 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=1; 
> {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, 
> {{inactiveNodes}} has 1 element
> * NM:10002 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=2; 
> {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, 
> {{inactiveNodes}} still has 1 element
> Since we allow multiple NodeManagers on one host (as discussed in YARN-1888), 
> {{inactiveNodes}} should be of type {{ConcurrentMap<NodeId, RMNode>}}. If 
> this will break the current API, then the key string should include the NM's 
> port as well.
> Thoughts?
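A small illustration of the problem described above (this is not the RMContext API, just a demonstration of why keying the inactive-node map by host alone loses entries while a host:port key does not):

{code}
// Sketch only: a host-keyed map lets the second lost NM overwrite the first,
// so the inactive-node map under-counts; a NodeId-style key keeps both.
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class InactiveNodeKeyDemo {
  public static void main(String[] args) {
    ConcurrentMap<String, String> byHost = new ConcurrentHashMap<>();
    byHost.put("host1", "lost NM host1:10001");
    byHost.put("host1", "lost NM host1:10002");   // overwrites the first entry
    System.out.println(byHost.size());            // 1, but two NMs were lost

    ConcurrentMap<String, String> byNodeId = new ConcurrentHashMap<>();
    byNodeId.put("host1:10001", "lost NM host1:10001");
    byNodeId.put("host1:10002", "lost NM host1:10002");
    System.out.println(byNodeId.size());          // 2, matching the lost-node count
  }
}
{code}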



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3266) RMContext inactiveNodes should have NodeId as map key

2015-04-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496385#comment-14496385
 ] 

Hudson commented on YARN-3266:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2114 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2114/])
YARN-3266. RMContext#inactiveNodes should have NodeId as map key. Contributed 
by Chengbing Liu (jianhe: rev b46ee1e7a31007985b88072d9af3d97c33a261a7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMActiveServiceContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodes.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMNodeTransitions.java


> RMContext inactiveNodes should have NodeId as map key
> -
>
> Key: YARN-3266
> URL: https://issues.apache.org/jira/browse/YARN-3266
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
> Fix For: 2.8.0
>
> Attachments: YARN-3266.01.patch, YARN-3266.02.patch, 
> YARN-3266.03.patch
>
>
> Under the default NM port configuration, which is 0, we have observed in the 
> current version that the "lost nodes" count is greater than the length of the 
> lost node list. This will happen when we consecutively restart the same NM twice:
> * NM started at port 10001
> * NM restarted at port 10002
> * NM restarted at port 10003
> * NM:10001 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=1; 
> {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, 
> {{inactiveNodes}} has 1 element
> * NM:10002 timeout, {{ClusterMetrics#incrNumLostNMs()}}, # lost node=2; 
> {{rmNode.context.getInactiveRMNodes().put(rmNode.nodeId.getHost(), rmNode)}}, 
> {{inactiveNodes}} still has 1 element
> Since we allow multiple NodeManagers on one host (as discussed in YARN-1888), 
> {{inactiveNodes}} should be of type {{ConcurrentMap<NodeId, RMNode>}}. If 
> this will break the current API, then the key string should include the NM's 
> port as well.
> Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3436) Fix URIs in documentation of YARN web service REST APIs

2015-04-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496383#comment-14496383
 ] 

Hudson commented on YARN-3436:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2114 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2114/])
YARN-3436. Fix URIs in documentation of YARN web service REST APIs. Contributed 
by Bibin A Chundatt. (ozawa: rev 05007b45e58bd9052f503cfb8c17bcfd22a686e3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/WebServicesIntro.md
* hadoop-yarn-project/CHANGES.txt


> Fix URIs in documentation of YARN web service REST APIs
> -
>
> Key: YARN-3436
> URL: https://issues.apache.org/jira/browse/YARN-3436
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: documentation, resourcemanager
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-3436.001.patch
>
>
> /docs/current/hadoop-yarn/hadoop-yarn-site/WebServicesIntro.html
> {quote}
> Response Examples
> JSON response with single resource
> HTTP Request: GET 
> http://rmhost.domain:8088/ws/v1/cluster/{color:red}app{color}/application_1324057493980_0001
> Response Status Line: HTTP/1.1 200 OK
> {quote}
> URL should be ws/v1/cluster/{color:red}apps{color}.
> Two examples on the same page are wrong



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3361) CapacityScheduler side changes to support non-exclusive node labels

2015-04-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496384#comment-14496384
 ] 

Hudson commented on YARN-3361:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2114 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2114/])
YARN-3361. CapacityScheduler side changes to support non-exclusive node labels. 
Contributed by Wangda Tan (jianhe: rev 0fefda645bca935b87b6bb8ca63e6f18340d59f5)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestNodeLabelContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/utils/BuilderUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestChildQueueOrder.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacitySchedulerConfiguration.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/SchedulingMode.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/Application.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceUsage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* hadoop-yarn-project/CHANGES.txt


> CapacityScheduler side changes to support non-exclusive node labels

[jira] [Updated] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities

2015-04-15 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated YARN-3448:
--
Attachment: YARN-3448.8.patch

> Add Rolling Time To Lives Level DB Plugin Capabilities
> --
>
> Key: YARN-3448
> URL: https://issues.apache.org/jira/browse/YARN-3448
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: YARN-3448.1.patch, YARN-3448.2.patch, YARN-3448.3.patch, 
> YARN-3448.4.patch, YARN-3448.5.patch, YARN-3448.7.patch, YARN-3448.8.patch
>
>
> For large applications, the majority of the time in LeveldbTimelineStore is 
> spent deleting old entities a record at a time. An exclusive write lock is held 
> during the entire deletion phase which in practice can be hours. If we are to 
> relax some of the consistency constraints, other performance enhancing 
> techniques can be employed to maximize the throughput and minimize locking 
> time.
> Split the 5 sections of the leveldb database (domain, owner, start time, 
> entity, index) into 5 separate databases. This allows each database to 
> maximize the read cache effectiveness based on the unique usage patterns of 
> each database. With 5 separate databases each lookup is much faster. This can 
> also help with I/O to have the entity and index databases on separate disks.
> Rolling DBs for entity and index DBs. 99.9% of the data are in these two 
> sections, a 4:1 ratio (index to entity), at least for tez. We replace DB record 
> removal with file system removal if we create a rolling set of databases that 
> age out and can be efficiently removed. To do this we must place a constraint 
> to always place an entity's events into its correct rolling db instance 
> based on start time. This allows us to stitch the data back together while 
> reading and to do artificial paging.
> Relax the synchronous writes constraints. If we are willing to accept losing 
> some records that were not flushed by the operating system during a crash, we 
> can use async writes that can be much faster.
> Prefer sequential writes. Sequential writes can be several times faster than 
> random writes. Spend some small effort arranging the writes in such a way 
> that they trend towards sequential write performance over random write 
> performance.
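A hedged sketch of the rolling-instance selection described above: every entity's events go to the database whose time bucket covers the entity's start time, so an expired bucket can be dropped by deleting a whole database. The class name and rolling period are illustrative, not the actual plugin code.

{code}
// Sketch only: pick the rolling DB instance for an entity based on its start time.
public class RollingDbSelector {
  private final long rollingPeriodMs;

  public RollingDbSelector(long rollingPeriodMs) {
    this.rollingPeriodMs = rollingPeriodMs;
  }

  // Entities whose start times fall in the same period share one DB instance,
  // so an expired period can be removed with a single file-system delete
  // instead of record-at-a-time deletion.
  public long bucketFor(long entityStartTimeMs) {
    return (entityStartTimeMs / rollingPeriodMs) * rollingPeriodMs;
  }

  public String dbNameFor(long entityStartTimeMs) {
    return "entity-ldb." + bucketFor(entityStartTimeMs);
  }
}
{code}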



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-04-15 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496490#comment-14496490
 ] 

Zhijie Shen commented on YARN-3051:
---

Hence, regardless of the implementation detail, we logically use:

1. <entity type, entity id> to identify entities that are generated on the same 
cluster.
2. <cluster id, entity type, entity id> to identify entities globally across 
clusters.

In terms of compatibility, {{getTimelineEntity(entity type, entity id)}} can 
assume the cluster ID is either the default one or configured in yarn-site.xml.

Does it sound good?
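A minimal sketch of the two identifiers discussed above, with the cluster id falling back to a configured default when it is not supplied; the class and method names are illustrative, not the actual reader API.

{code}
// Sketch: a cluster-scoped key identifies an entity within one cluster,
// while prefixing the cluster id makes the key globally unique.
public class TimelineEntityKey {
  private final String clusterId;   // falls back to the configured default
  private final String entityType;
  private final String entityId;

  public TimelineEntityKey(String clusterId, String entityType, String entityId,
      String defaultClusterId) {
    this.clusterId = (clusterId != null) ? clusterId : defaultClusterId;
    this.entityType = entityType;
    this.entityId = entityId;
  }

  public String globalKey() {                 // unique across clusters
    return clusterId + "!" + entityType + "!" + entityId;
  }

  public String clusterScopedKey() {          // unique within one cluster
    return entityType + "!" + entityId;
  }
}
{code}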

> [Storage abstraction] Create backing storage read interface for ATS readers
> ---
>
> Key: YARN-3051
> URL: https://issues.apache.org/jira/browse/YARN-3051
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-3051_temp.patch
>
>
> Per design in YARN-2928, create backing storage read interface that can be 
> implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-04-15 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496503#comment-14496503
 ] 

Sangjin Lee commented on YARN-3051:
---

Yep. That's perfect.

> [Storage abstraction] Create backing storage read interface for ATS readers
> ---
>
> Key: YARN-3051
> URL: https://issues.apache.org/jira/browse/YARN-3051
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-3051_temp.patch
>
>
> Per design in YARN-2928, create backing storage read interface that can be 
> implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-04-15 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496509#comment-14496509
 ] 

Varun Saxena commented on YARN-3051:


As per the patch I am currently working on, if clusterid does not come in the 
query, it is taken from config, so that's consistent. Although I was assuming 
appid would be part of the PK.

> [Storage abstraction] Create backing storage read interface for ATS readers
> ---
>
> Key: YARN-3051
> URL: https://issues.apache.org/jira/browse/YARN-3051
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-3051_temp.patch
>
>
> Per design in YARN-2928, create backing storage read interface that can be 
> implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3390) Reuse TimelineCollectorManager for RM

2015-04-15 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496511#comment-14496511
 ] 

Sangjin Lee commented on YARN-3390:
---

{quote}
For putIfAbsent and remove, I don't use template method pattern, but let the 
subclass override the super class method and invoke it inside the override 
implementation, because I'm not sure if we will need pre process or post 
process, and if we only invoke the process when adding a new collector. If 
we're sure about template, I'm okay with the template pattern too.
{quote}
I'm fine with either approach. The main reason I thought of that is I wanted to 
be clear that the base implementation of putIfAbsent() and remove() is 
mandatory (i.e. not optional). Since we control all of it (base and 
subclasses), it might not be such a big deal either way.
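For illustration, the two alternatives being discussed, sketched side by side with hypothetical names (postAdd and the class names are illustrative, not the actual TimelineCollectorManager code):

{code}
// Option A: template method -- the base class fixes the sequence and the
// subclass can only fill in the optional hook; the base logic is always run.
abstract class BaseCollectorManagerA {
  public final void putIfAbsent(String id, Object collector) {
    // mandatory base logic always runs here
    postAdd(id, collector);              // optional hook for subclasses
  }
  protected void postAdd(String id, Object collector) { }
}

// Option B: plain override -- the subclass overrides and is expected to call
// super, so the mandatory base logic runs only by convention.
class BaseCollectorManagerB {
  public void putIfAbsent(String id, Object collector) {
    // mandatory base logic
  }
}

class RMCollectorManagerB extends BaseCollectorManagerB {
  @Override
  public void putIfAbsent(String id, Object collector) {
    super.putIfAbsent(id, collector);    // must not be forgotten
    // RM-specific post processing
  }
}
{code}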

> Reuse TimelineCollectorManager for RM
> -
>
> Key: YARN-3390
> URL: https://issues.apache.org/jira/browse/YARN-3390
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-3390.1.patch
>
>
> RMTimelineCollector should have the context info of each app whose entity  
> has been put



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3318) Create Initial OrderingPolicy Framework and FifoOrderingPolicy

2015-04-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496519#comment-14496519
 ] 

Hudson commented on YARN-3318:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7588 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7588/])
YARN-3318. Create Initial OrderingPolicy Framework and FifoOrderingPolicy. 
(Craig Welch via wangda) (wangda: rev 5004e753322084e42dfda4be1d2db66677f86a1e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/OrderingPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceUsage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/MockSchedulableEntity.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/SchedulableEntity.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/AbstractComparatorOrderingPolicy.java
* hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/FifoOrderingPolicy.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/TestFifoOrderingPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/policy/FifoComparator.java


> Create Initial OrderingPolicy Framework and FifoOrderingPolicy
> --
>
> Key: YARN-3318
> URL: https://issues.apache.org/jira/browse/YARN-3318
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Craig Welch
>Assignee: Craig Welch
> Fix For: 2.8.0
>
> Attachments: YARN-3318.13.patch, YARN-3318.14.patch, 
> YARN-3318.17.patch, YARN-3318.34.patch, YARN-3318.35.patch, 
> YARN-3318.36.patch, YARN-3318.39.patch, YARN-3318.45.patch, 
> YARN-3318.47.patch, YARN-3318.48.patch, YARN-3318.52.patch, 
> YARN-3318.53.patch, YARN-3318.56.patch, YARN-3318.57.patch, 
> YARN-3318.58.patch, YARN-3318.59.patch, YARN-3318.60.patch, YARN-3318.61.patch
>
>
> Create the initial framework required for using OrderingPolicies and an 
> initial FifoOrderingPolicy



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3448) Add Rolling Time To Lives Level DB Plugin Capabilities

2015-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496528#comment-14496528
 ] 

Hadoop QA commented on YARN-3448:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12725620/YARN-3448.8.patch
  against trunk revision fddd552.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 10 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7347//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/7347//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-applicationhistoryservice.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7347//console

This message is automatically generated.

> Add Rolling Time To Lives Level DB Plugin Capabilities
> --
>
> Key: YARN-3448
> URL: https://issues.apache.org/jira/browse/YARN-3448
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: YARN-3448.1.patch, YARN-3448.2.patch, YARN-3448.3.patch, 
> YARN-3448.4.patch, YARN-3448.5.patch, YARN-3448.7.patch, YARN-3448.8.patch
>
>
> For large applications, the majority of the time in LeveldbTimelineStore is 
> spent deleting old entities a record at a time. An exclusive write lock is held 
> during the entire deletion phase which in practice can be hours. If we are to 
> relax some of the consistency constraints, other performance enhancing 
> techniques can be employed to maximize the throughput and minimize locking 
> time.
> Split the 5 sections of the leveldb database (domain, owner, start time, 
> entity, index) into 5 separate databases. This allows each database to 
> maximize the read cache effectiveness based on the unique usage patterns of 
> each database. With 5 separate databases each lookup is much faster. This can 
> also help with I/O to have the entity and index databases on separate disks.
> Rolling DBs for entity and index DBs. 99.9% of the data are in these two 
> sections, a 4:1 ratio (index to entity), at least for tez. We replace DB record 
> removal with file system removal if we create a rolling set of databases that 
> age out and can be efficiently removed. To do this we must place a constraint 
> to always place an entity's events into its correct rolling db instance 
> based on start time. This allows us to stitch the data back together while 
> reading and to do artificial paging.
> Relax the synchronous writes constraints. If we are willing to accept losing 
> some records that were not flushed by the operating system during a crash, we 
> can use async writes that can be much faster.
> Prefer sequential writes. Sequential writes can be several times faster than 
> random writes. Spend some small effort arranging the writes in such a way 
> that they trend towards sequential write performance over random write 
> performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-04-15 Thread Varun Saxena (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Saxena updated YARN-3051:
---
Attachment: YARN-3051.wip.patch

> [Storage abstraction] Create backing storage read interface for ATS readers
> ---
>
> Key: YARN-3051
> URL: https://issues.apache.org/jira/browse/YARN-3051
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-3051.wip.patch, YARN-3051_temp.patch
>
>
> Per design in YARN-2928, create backing storage read interface that can be 
> implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-04-15 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496534#comment-14496534
 ] 

Varun Saxena commented on YARN-3051:


Updated a WIP patch. Will update the javadoc after everyone is on the same page 
about the approach and API. 
Working on unit tests.

> [Storage abstraction] Create backing storage read interface for ATS readers
> ---
>
> Key: YARN-3051
> URL: https://issues.apache.org/jira/browse/YARN-3051
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-3051.wip.patch, YARN-3051_temp.patch
>
>
> Per design in YARN-2928, create backing storage read interface that can be 
> implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-15 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496538#comment-14496538
 ] 

Junping Du commented on YARN-3411:
--

Thanks [~vrushalic] for delivering the proposal and poc patch, which is an 
excellent job!
Some quick comments from walking through the proposal:
bq. Entity Table - primary key components-putting the UserID first helps to 
distribute writes across the regions in the hbase cluster. Pros: avoids 
single region hotspotting. Cons: connections would be open to several region 
servers during writes from per node ATS.
Looks like we are trying to get rid of region server hotspotting issues. I agree 
that this design could help. However, it is still possible that a specific user 
could submit many more applications than anyone else; in that case, the region 
hotspot issue will still appear, won't it? I think the more general way to solve 
this problem is to salt the keys with a prefix. Thoughts?

bq. Entity Table - column families - config needs to be stored as key value, not 
as a blob to enable efficient key based querying based on config param name. 
storing it in a separate column family helps to avoid scanning over config  
while reading metrics and vice versa
+1. This leverages the strength of a columnar database. We should avoid storing 
any default value for a key. However, this sounds challenging if TimelineClient 
only has a configuration object.

bq. Entity Table - metrics are written to with an hbase cell timestamp set to 
top of the minute or top of the 5 minute interval or whatever is decided. This 
helps in timeseries storage and retrieval in case of querying at the entity 
level.
Can we also let TimelineCollector do some aggregation of metrics over a similar 
time interval, rather than sending every metric to HBase/Phoenix as it is 
received? This may help to ease some pressure on the backend.

bq. Flow by application id table
I still think we should figure out some way to store application attempt info. 
The typical use case here is: for some reason (like a bug or a hardware 
capability issue), some flow/application's AM could consistently fail more times 
than other flows/applications. Keeping this info can help us to track these 
issues, can't it?

bq. flow summary daily table (aggregation table managed by Phoenix) - could be 
triggered via co­processor with each put in flow table or a cron run once per 
day to aggregate for yesterday (with catchup functionality in case of backlog 
etc)
Doing this on each put in the flow table sounds a little expensive, especially 
when put activity is very frequent. Maybe we should do some batching here? In 
addition, I think we can leverage the per-node TimelineCollector to do some 
first-level aggregation, which can help to relieve the workload on the backend.
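A hedged sketch of the key-salting idea mentioned in the first point: a small hash-derived prefix spreads the otherwise-sequential keys of a heavy user across several regions. The bucket count and key layout are assumptions, not part of the proposed schema.

{code}
// Sketch only: prepend a one-byte salt derived from the row key so that writes
// from a single hot user are spread over several regions instead of one.
import java.util.Arrays;

public class SaltedRowKey {
  private static final int SALT_BUCKETS = 16;   // assumed number of salt buckets

  public static byte[] salt(byte[] rowKey) {
    int bucket = (Arrays.hashCode(rowKey) & 0x7fffffff) % SALT_BUCKETS;
    byte[] salted = new byte[rowKey.length + 1];
    salted[0] = (byte) bucket;                   // one-byte salt prefix
    System.arraycopy(rowKey, 0, salted, 1, rowKey.length);
    return salted;
  }
}
{code}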

> [Storage implementation] explore the native HBase write schema for storage
> --
>
> Key: YARN-3411
> URL: https://issues.apache.org/jira/browse/YARN-3411
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Vrushali C
>Priority: Critical
> Attachments: ATSv2BackendHBaseSchemaproposal.pdf, YARN-3411.poc.txt
>
>
> There is work that's in progress to implement the storage based on a Phoenix 
> schema (YARN-3134).
> In parallel, we would like to explore an implementation based on a native 
> HBase schema for the write path. Such a schema does not exclude using 
> Phoenix, especially for reads and offline queries.
> Once we have basic implementations of both options, we could evaluate them in 
> terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2605) [RM HA] Rest api endpoints doing redirect incorrectly

2015-04-15 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong updated YARN-2605:

Issue Type: Sub-task  (was: Bug)
Parent: YARN-149

> [RM HA] Rest api endpoints doing redirect incorrectly
> -
>
> Key: YARN-2605
> URL: https://issues.apache.org/jira/browse/YARN-2605
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: bc Wong
>Assignee: Anubhav Dhoot
>  Labels: newbie
>
> The standby RM's webui tries to do a redirect via meta-refresh. That is fine 
> for pages designed to be viewed by web browsers. But the API endpoints 
> shouldn't do that. Most programmatic HTTP clients do not do meta-refresh. I'd 
> suggest HTTP 303, or returning a well-defined error message (json or xml) 
> stating the standby status and including a link to the active RM.
> The standby RM is returning this today:
> {noformat}
> $ curl -i http://bcsec-1.ent.cloudera.com:8088/ws/v1/cluster/metrics
> HTTP/1.1 200 OK
> Cache-Control: no-cache
> Expires: Thu, 25 Sep 2014 18:34:53 GMT
> Date: Thu, 25 Sep 2014 18:34:53 GMT
> Pragma: no-cache
> Expires: Thu, 25 Sep 2014 18:34:53 GMT
> Date: Thu, 25 Sep 2014 18:34:53 GMT
> Pragma: no-cache
> Content-Type: text/plain; charset=UTF-8
> Refresh: 3; url=http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics
> Content-Length: 117
> Server: Jetty(6.1.26)
> This is standby RM. Redirecting to the current active RM: 
> http://bcsec-2.ent.cloudera.com:8088/ws/v1/cluster/metrics
> {noformat}
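For illustration, a hedged sketch of the suggested behaviour for API endpoints on the standby RM, written against the plain Servlet API rather than the actual RM web filter:

{code}
// Sketch only: instead of a meta-refresh page, answer REST calls on the
// standby RM with a 303 and a Location header pointing at the active RM.
import java.io.IOException;
import javax.servlet.http.HttpServletResponse;

public class StandbyRedirector {
  public static void redirectToActiveRm(HttpServletResponse resp,
      String activeRmUrl, String requestPath) throws IOException {
    resp.setStatus(HttpServletResponse.SC_SEE_OTHER);       // 303 See Other
    resp.setHeader("Location", activeRmUrl + requestPath);
    resp.setContentType("application/json");
    resp.getWriter().write(
        "{\"standby\":true,\"activeRM\":\"" + activeRmUrl + "\"}");
  }
}
{code}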



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2696) Queue sorting in CapacityScheduler should consider node label

2015-04-15 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2696:
-
Attachment: YARN-2696.2.patch

Attached ver.2 patch, which fixed the findbugs warning and test failures (the 
TestRMDelegationTokens failure is not related).

I've thought about Jian's comment:
bq. We can merge PartitionedQueueComparator and nonPartitionedQueueComparator 
into a single QueueComparator.
After thinking about this, I think we cannot: NonPartitionedQueueComparator is 
stateless, while PartitionedQueueComparator is stateful (someone can modify 
"partitionToLookAt" for PartitionedQueueComparator), whereas 
NonPartitionedQueueComparator should only and always sort by the default partition.
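A small sketch of the distinction being made; the class and field names are illustrative, not the actual CapacityScheduler comparators:

{code}
// Sketch only; these are not the real CapacityScheduler classes.
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

class QueueUsage {
  Map<String, Long> usedByPartition = new HashMap<>();
  long used(String partition) {
    return usedByPartition.getOrDefault(partition, 0L);
  }
}

// Stateless: always compares by the default (empty) partition.
class NonPartitionedQueueComparatorSketch implements Comparator<QueueUsage> {
  public int compare(QueueUsage a, QueueUsage b) {
    return Long.compare(a.used(""), b.used(""));
  }
}

// Stateful: the caller sets which partition to look at before each sort.
class PartitionedQueueComparatorSketch implements Comparator<QueueUsage> {
  private String partitionToLookAt = "";
  void setPartitionToLookAt(String partition) {
    this.partitionToLookAt = partition;
  }
  public int compare(QueueUsage a, QueueUsage b) {
    return Long.compare(a.used(partitionToLookAt), b.used(partitionToLookAt));
  }
}
{code}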


> Queue sorting in CapacityScheduler should consider node label
> -
>
> Key: YARN-2696
> URL: https://issues.apache.org/jira/browse/YARN-2696
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2696.1.patch, YARN-2696.2.patch
>
>
> In the past, when trying to allocate containers under a parent queue in 
> CapacityScheduler. The parent queue will choose child queues by the used 
> resource from smallest to largest. 
> Now we support node label in CapacityScheduler, we should also consider used 
> resource in child queues by node labels when allocating resource.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3490) Add an application decorator to ClientRMService

2015-04-15 Thread Jian Fang (JIRA)
Jian Fang created YARN-3490:
---

 Summary: Add an application decorator to ClientRMService
 Key: YARN-3490
 URL: https://issues.apache.org/jira/browse/YARN-3490
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: resourcemanager
Reporter: Jian Fang


Per the discussion on MAPREDUCE-6304, a hadoop cloud service provider wants to 
hook in some logic to control the allocation of an application on the resource 
manager side, because it is sometimes impractical to control the client side of 
a hadoop cluster in the cloud. A hadoop service provider and hadoop users usually 
have different privileges, control, and access on a hadoop cluster in the cloud. 

One good example is that application masters should not be allocated to spot 
instances on Amazon EC2. To achieve that, an application decorator could be 
provided to orchestrate the ApplicationSubmissionContext by specifying the AM 
label expression, for example. 

Hadoop could provide a dummy decorator that does nothing by default, but it 
should allow users to replace this decorator with their own decorators to meet 
their specific needs.
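A hedged sketch of what such a decorator hook could look like; the interface, the no-op default, and the spot-instance example are hypothetical, not an existing YARN API.

{code}
// Sketch only: a hypothetical decorator that ClientRMService could call before
// submission; the example pins the AM to nodes labeled "on_demand" (assumes
// such a node label exists).
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;

interface ApplicationSubmissionDecorator {
  ApplicationSubmissionContext decorate(ApplicationSubmissionContext ctx);
}

// Default decorator: leaves the submission untouched.
class NoOpDecorator implements ApplicationSubmissionDecorator {
  public ApplicationSubmissionContext decorate(ApplicationSubmissionContext ctx) {
    return ctx;
  }
}

// Provider-supplied decorator: keeps AMs off spot instances.
class OnDemandAmDecorator implements ApplicationSubmissionDecorator {
  public ApplicationSubmissionContext decorate(ApplicationSubmissionContext ctx) {
    ctx.setNodeLabelExpression("on_demand");
    return ctx;
  }
}
{code}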




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation

2015-04-15 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496617#comment-14496617
 ] 

Wangda Tan commented on YARN-3434:
--

[~tgraves],
I think your concerns may not be a problem: ResourceLimits will be replaced 
(instead of updated) when a node heartbeats. And the ResourceLimits object itself 
is there to decouple Parent and Child (e.g. ParentQueue to children, LeafQueue to 
apps); a Child doesn't need to understand how the Parent computes limits, it only 
needs to respect them. For example, an app doesn't need to understand how the 
queue computes queue capacity/user-limit/continuous-reservation-looking; it only 
needs to know what the "limit" is considering all factors, so it can decide to 
allocate/release-before-allocate/cannot-continue.

The usage of ResourceLimits in my mind for the user-limit case is:
- ParentQueue computes/sets limits
- LeafQueue stores limits (for why store, see 1.)
- LeafQueue recomputes/sets the user-limit when trying to allocate for each 
app/priority
- LeafQueue checks the user-limit as well as the limits when trying to 
allocate/reserve a container
- The user-limit saved in ResourceLimits is only used in the normal 
allocation/reservation path; if it's a reserved allocation, we will reset the 
user-limit to unlimited.

1. Why store limits in LeafQueue instead of passing them down?
This is required by headroom computing: an app's headroom is affected by changes 
in the queue's parent as well as its siblings. We cannot update every app's 
headroom when those change, but we need to recompute the headroom when the app 
heartbeats, so we have to store the latest ResourceLimits in the LeafQueue. See 
YARN-2008 for more information.

I'm not sure if the above makes my suggestion clearer. 
Please let me know your thoughts.
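To make the flow above concrete, a heavily simplified sketch; the field and method names are illustrative, not the real ResourceLimits or LeafQueue code.

{code}
// Sketch only: the parent computes an overall limit and hands it down; the leaf
// stores it, recomputes the user limit per allocation attempt, and does not
// enforce the user limit for reserved allocations.
class LimitsSketch {
  long queueLimit;   // set by the parent queue
  long userLimit;    // recomputed by the leaf queue per allocation attempt
}

class LeafQueueSketch {
  private LimitsSketch current;  // stored so headroom can be recomputed on heartbeat

  void updateClusterResource(LimitsSketch fromParent) {
    current = fromParent;        // replaced, not mutated, on node heartbeat
  }

  boolean tryAllocate(String user, long requested, boolean reservedAllocation) {
    current.userLimit = reservedAllocation
        ? Long.MAX_VALUE               // reserved allocation: user limit not enforced
        : computeUserLimit(user);      // normal path: recompute per attempt
    return requested <= Math.min(current.queueLimit, current.userLimit);
  }

  private long computeUserLimit(String user) {
    return 1024L;                      // placeholder for the real computation
  }
}
{code}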

> Interaction between reservations and userlimit can result in significant ULF 
> violation
> --
>
> Key: YARN-3434
> URL: https://issues.apache.org/jira/browse/YARN-3434
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
> Attachments: YARN-3434.patch
>
>
> ULF was set to 1.0
> User was able to consume 1.4X queue capacity.
> It looks like when this application launched, it reserved about 1000 
> containers, each 8G each, within about 5 seconds. I think this allowed the 
> logic in assignToUser() to allow the userlimit to be surpassed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation

2015-04-15 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496644#comment-14496644
 ] 

Wangda Tan commented on YARN-3434:
--

bq. All you would be using it for is passing it down to assignContainer, and 
then it would be out of date. If someone else started looking at that value 
assuming it was up to date then it would be wrong (unless of course we started 
updating it as stated above). But it would only be for a single user, not all 
users, unless again we changed to calculate it for every user whenever something 
changed. That seems a bit excessive.
To clarify: ResourceLimits is the bridge between parent and child; the parent 
tells the child "hey, this is the limit you can use", and LeafQueue does the same 
thing for the app. ParentQueue doesn't compute or pass down the user-limit to 
LeafQueue at all; LeafQueue does that and makes sure it gets updated for every 
allocation.

> Interaction between reservations and userlimit can result in significant ULF 
> violation
> --
>
> Key: YARN-3434
> URL: https://issues.apache.org/jira/browse/YARN-3434
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
> Attachments: YARN-3434.patch
>
>
> ULF was set to 1.0
> User was able to consume 1.4X queue capacity.
> It looks like when this application launched, it reserved about 1000 
> containers, each 8G each, within about 5 seconds. I think this allowed the 
> logic in assignToUser() to allow the userlimit to be surpassed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2306) leak of reservation metrics (fair scheduler)

2015-04-15 Thread Jian Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496648#comment-14496648
 ] 

Jian Fang commented on YARN-2306:
-

Could someone please tell me which JIRA has fixed this bug in trunk? I am 
working on hadoop 2.6.0 branch and need to see if I need to fix this issue or 
not. Thanks in advance.

> leak of reservation metrics (fair scheduler)
> 
>
> Key: YARN-2306
> URL: https://issues.apache.org/jira/browse/YARN-2306
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Hong Zhiguo
>Assignee: Hong Zhiguo
>Priority: Minor
> Attachments: YARN-2306-2.patch, YARN-2306.patch
>
>
> This only applies to fair scheduler. Capacity scheduler is OK.
> When an appAttempt or node is removed, the metrics for 
> reservation (reservedContainers, reservedMB, reservedVCores) are not reduced 
> back.
> These are important metrics for administrators. The wrong metrics may 
> confuse them. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3491) Improve the public resource localization to do both FSDownload submission to the thread pool and completed localization handling in one thread (PublicLocalizer).

2015-04-15 Thread zhihai xu (JIRA)
zhihai xu created YARN-3491:
---

 Summary: Improve the public resource localization to do both 
FSDownload submission to the thread pool and completed localization handling in 
one thread (PublicLocalizer).
 Key: YARN-3491
 URL: https://issues.apache.org/jira/browse/YARN-3491
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Affects Versions: 2.7.0
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical


Improve the public resource localization to do both FSDownload submission to 
the thread pool and completed localization handling in one thread 
(PublicLocalizer).
Currently FSDownload submission to the thread pool is done in 
PublicLocalizer#addResource which is running in Dispatcher thread and completed 
localization handling is done in PublicLocalizer#run which is running in 
PublicLocalizer thread.
Because the FSDownload submission to the thread pool in the following code is 
time consuming, the thread pool can't be fully utilized. Instead of doing public 
resource localization in parallel (multithreading), public resource localization 
is serialized most of the time.
{code}
synchronized (pending) {
  pending.put(queue.submit(new FSDownload(lfs, null, conf,
  publicDirDestPath, resource, 
request.getContext().getStatCache())),
  request);
}
{code}

Also, there are two more benefits with this change:
1. The Dispatcher thread won't be blocked by the above FSDownload submission. The 
Dispatcher thread handles most of the time-critical events at the Node Manager.
2. No synchronization is needed on the HashMap (pending), because pending will 
only be accessed in the PublicLocalizer thread.
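A hedged sketch of the proposed structure: the dispatcher thread only enqueues the request, and the PublicLocalizer thread performs both the (slow) submission and the completion handling. Class and field names are simplified, and the download itself is a placeholder rather than the real FSDownload.

{code}
// Sketch only: decouple the dispatcher thread from FSDownload submission.
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.*;

class PublicLocalizerSketch extends Thread {
  private final BlockingQueue<Runnable> requests = new LinkedBlockingQueue<>();
  private final ExecutorService downloadPool = Executors.newFixedThreadPool(4);
  // Accessed only from this thread, so no synchronization is needed.
  private final Map<Future<?>, String> pending = new HashMap<>();

  // Called from the dispatcher thread: just enqueue, never block on submit().
  void addResource(String resource) {
    requests.offer(() -> pending.put(
        downloadPool.submit(() -> download(resource)), resource));
  }

  @Override
  public void run() {
    try {
      while (!Thread.currentThread().isInterrupted()) {
        // One thread handles both new submissions and completed downloads.
        Runnable submission = requests.poll(100, TimeUnit.MILLISECONDS);
        if (submission != null) {
          submission.run();
        }
        pending.keySet().removeIf(Future::isDone);  // completion handling
      }
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
    }
  }

  private Void download(String resource) {
    return null;  // placeholder for the actual FSDownload work
  }
}
{code}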



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3491) Improve the public resource localization to do both FSDownload submission to the thread pool and completed localization handling in one thread (PublicLocalizer).

2015-04-15 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496663#comment-14496663
 ] 

Jason Lowe commented on YARN-3491:
--

Could you elaborate a bit on why the submit is time consuming?  Unless I'm 
mistaken, the FSDownload constructor is very cheap and queueing should be 
simply tacking an entry on a queue.

> Improve the public resource localization to do both FSDownload submission to 
> the thread pool and completed localization handling in one thread 
> (PublicLocalizer).
> -
>
> Key: YARN-3491
> URL: https://issues.apache.org/jira/browse/YARN-3491
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
>
> Improve the public resource localization to do both FSDownload submission to 
> the thread pool and completed localization handling in one thread 
> (PublicLocalizer).
> Currently FSDownload submission to the thread pool is done in 
> PublicLocalizer#addResource which is running in Dispatcher thread and 
> completed localization handling is done in PublicLocalizer#run which is 
> running in PublicLocalizer thread.
> Because the FSDownload submission to the thread pool in the following code is 
> time consuming, the thread pool can't be fully utilized. Instead of doing 
> public resource localization in parallel (multithreading), public resource 
> localization is serialized most of the time.
> {code}
> synchronized (pending) {
>   pending.put(queue.submit(new FSDownload(lfs, null, conf,
>   publicDirDestPath, resource, 
> request.getContext().getStatCache())),
>   request);
> }
> {code}
> Also, there are two more benefits with this change:
> 1. The Dispatcher thread won't be blocked by the above FSDownload submission. 
> The Dispatcher thread handles most of the time-critical events at the Node Manager.
> 2. No synchronization is needed on the HashMap (pending), because pending will 
> only be accessed in the PublicLocalizer thread.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3326) ReST support for getLabelsToNodes

2015-04-15 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496693#comment-14496693
 ] 

Tsuyoshi Ozawa commented on YARN-3326:
--

+1, committing this shortly. Hey [~Naganarasimha], could you open a new JIRA to 
update the documentation for this feature?

> ReST support for getLabelsToNodes 
> --
>
> Key: YARN-3326
> URL: https://issues.apache.org/jira/browse/YARN-3326
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-3326.20150310-1.patch, YARN-3326.20150407-1.patch, 
> YARN-3326.20150408-1.patch
>
>
> REST to support to retrieve LabelsToNodes Mapping



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3491) Improve the public resource localization to do both FSDownload submission to the thread pool and completed localization handling in one thread (PublicLocalizer).

2015-04-15 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496702#comment-14496702
 ] 

zhihai xu commented on YARN-3491:
-

I saw the serialization for public resource localization in the following logs:
The following log shows two private localization requests and many public 
localization requests from container_e30_1426628374875_110892_01_000475
{code}
2015-04-07 22:49:56,750 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: 
Container container_e30_1426628374875_110892_01_000475 transitioned from NEW to 
LOCALIZING
2015-04-07 22:49:56,751 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
 Resource 
hdfs://nameservice1/user/databot/.staging/job_1426628374875_110892/job.xml 
transitioned from INIT to DOWNLOADING
2015-04-07 22:49:56,751 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
 Resource 
hdfs://nameservice1/user/databot/.staging/job_1426628374875_110892/job.jar 
transitioned from INIT to DOWNLOADING
2015-04-07 22:49:56,751 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
 Resource hdfs://nameservice1/tmp/temp182237/tmp-1316042064/reflections.jar 
transitioned from INIT to DOWNLOADING
2015-04-07 22:49:56,751 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
 Resource 
hdfs://nameservice1/tmp/temp182237/tmp-327542609/service-media-sdk.jar 
transitioned from INIT to DOWNLOADING
2015-04-07 22:49:56,751 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
 Resource 
hdfs://nameservice1/tmp/temp182237/tmp1631960573/service-local-search-sdk.jar
 transitioned from INIT to DOWNLOADING
2015-04-07 22:49:56,751 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
 Resource hdfs://nameservice1/tmp/temp182237/tmp-1521315530/ace-geo.jar 
transitioned from INIT to DOWNLOADING
2015-04-07 22:49:56,751 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
 Resource 
hdfs://nameservice1/tmp/temp182237/tmp1347512155/cortex-server.jar 
transitioned from INIT to DOWNLOADING
{code}

The following log shows how the public resource localizations are processed.
{code}
2015-04-07 22:49:56,758 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Created localizer for container_e30_1426628374875_110892_01_000475

2015-04-07 22:49:56,758 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Downloading public rsrc:{ 
hdfs://nameservice1/tmp/temp182237/tmp-1316042064/reflections.jar, 
1428446867531, FILE, null }

2015-04-07 22:49:56,882 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Downloading public rsrc:{ 
hdfs://nameservice1/tmp/temp182237/tmp-327542609/service-media-sdk.jar, 
1428446864128, FILE, null }

2015-04-07 22:49:56,902 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
 Resource 
hdfs://nameservice1/tmp/temp182237/tmp-1316042064/reflections.jar(->/data2/yarn/nm/filecache/4877652/reflections.jar)
 transitioned from DOWNLOADING to LOCALIZED

2015-04-07 22:49:57,127 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Downloading public rsrc:{ 
hdfs://nameservice1/tmp/temp182237/tmp1631960573/service-local-search-sdk.jar,
 1428446858408, FILE, null }

2015-04-07 22:49:57,145 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
 Resource 
hdfs://nameservice1/tmp/temp182237/tmp-327542609/service-media-sdk.jar(->/data11/yarn/nm/filecache/4877653/service-media-sdk.jar)
 transitioned from DOWNLOADING to LOCALIZED

2015-04-07 22:49:57,251 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Downloading public rsrc:{ 
hdfs://nameservice1/tmp/temp182237/tmp-1521315530/ace-geo.jar, 
1428446862857, FILE, null }

2015-04-07 22:49:57,270 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
 Resource 
hdfs://nameservice1/tmp/temp182237/tmp1631960573/service-local-search-sdk.jar(->/data1/yarn/nm/filecache/4877654/service-local-search-sdk.jar)
 transitioned from DOWNLOADING to LOCALIZED

2015-04-07 22:49:57,383 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
 Downloading public rsrc:{ 
hdfs://nameservice1/tmp/temp182237/tmp1347512155/cortex-server.jar, 
1428446857069, FILE, null }
{code}

Based on the log, you can see the thread pool is not fully used; only one 
thread is active. The default thread pool size is 4 in this case.

[jira] [Updated] (YARN-3005) [JDK7] Use switch statement for String instead of if-else statement in RegistrySecurity.java

2015-04-15 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated YARN-3005:

Assignee: Kengo Seki

> [JDK7] Use switch statement for String instead of if-else statement in 
> RegistrySecurity.java
> 
>
> Key: YARN-3005
> URL: https://issues.apache.org/jira/browse/YARN-3005
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.7.0
>Reporter: Akira AJISAKA
>Assignee: Kengo Seki
>Priority: Trivial
>  Labels: newbie
> Fix For: 2.7.0
>
> Attachments: YARN-3005.001.patch, YARN-3005.002.patch
>
>
> Since we have moved to JDK7, we can refactor the below if-else statement for 
> String.
> {code}
> // TODO JDK7 SWITCH
> if (REGISTRY_CLIENT_AUTH_KERBEROS.equals(auth)) {
>   access = AccessPolicy.sasl;
> } else if (REGISTRY_CLIENT_AUTH_DIGEST.equals(auth)) {
>   access = AccessPolicy.digest;
> } else if (REGISTRY_CLIENT_AUTH_ANONYMOUS.equals(auth)) {
>   access = AccessPolicy.anon;
> } else {
>   throw new ServiceStateException(E_UNKNOWN_AUTHENTICATION_MECHANISM
>   + "\"" + auth + "\"");
> }
> {code}
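For reference, the JDK7 string-switch form of the snippet above could look like 
the following (a sketch; it assumes the REGISTRY_CLIENT_AUTH_* values are 
compile-time String constants, which a string switch requires):
{code}
// Sketch of the string-switch refactor; same constants and AccessPolicy
// values as the if-else chain above.
switch (auth) {
  case REGISTRY_CLIENT_AUTH_KERBEROS:
    access = AccessPolicy.sasl;
    break;
  case REGISTRY_CLIENT_AUTH_DIGEST:
    access = AccessPolicy.digest;
    break;
  case REGISTRY_CLIENT_AUTH_ANONYMOUS:
    access = AccessPolicy.anon;
    break;
  default:
    throw new ServiceStateException(E_UNKNOWN_AUTHENTICATION_MECHANISM
        + "\"" + auth + "\"");
}
{code}
One behavioral difference to keep in mind: switching on a null {{auth}} throws 
a NullPointerException, whereas the if-else chain falls through to the 
ServiceStateException branch.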



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3326) Support RESTful API for getLabelsToNodes

2015-04-15 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated YARN-3326:
-
Summary: Support RESTful API for getLabelsToNodes   (was: ReST support for 
getLabelsToNodes )

> Support RESTful API for getLabelsToNodes 
> -
>
> Key: YARN-3326
> URL: https://issues.apache.org/jira/browse/YARN-3326
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Attachments: YARN-3326.20150310-1.patch, YARN-3326.20150407-1.patch, 
> YARN-3326.20150408-1.patch
>
>
> REST support to retrieve the LabelsToNodes mapping



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3326) Support RESTful API for getLabelsToNodes

2015-04-15 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496712#comment-14496712
 ] 

Tsuyoshi Ozawa commented on YARN-3326:
--

Committed this to trunk and branch-2. Thanks [~Naganarasimha] for your 
contribution and thanks [~vvasudev] for your review!

> Support RESTful API for getLabelsToNodes 
> -
>
> Key: YARN-3326
> URL: https://issues.apache.org/jira/browse/YARN-3326
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-3326.20150310-1.patch, YARN-3326.20150407-1.patch, 
> YARN-3326.20150408-1.patch
>
>
> REST support to retrieve the LabelsToNodes mapping



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3005) [JDK7] Use switch statement for String instead of if-else statement in RegistrySecurity.java

2015-04-15 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496708#comment-14496708
 ] 

Akira AJISAKA commented on YARN-3005:
-

Assigned [~sekikn]. Thanks.

> [JDK7] Use switch statement for String instead of if-else statement in 
> RegistrySecurity.java
> 
>
> Key: YARN-3005
> URL: https://issues.apache.org/jira/browse/YARN-3005
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.7.0
>Reporter: Akira AJISAKA
>Assignee: Kengo Seki
>Priority: Trivial
>  Labels: newbie
> Fix For: 2.7.0
>
> Attachments: YARN-3005.001.patch, YARN-3005.002.patch
>
>
> Since we have moved to JDK7, we can refactor the below if-else statement for 
> String.
> {code}
> // TODO JDK7 SWITCH
> if (REGISTRY_CLIENT_AUTH_KERBEROS.equals(auth)) {
>   access = AccessPolicy.sasl;
> } else if (REGISTRY_CLIENT_AUTH_DIGEST.equals(auth)) {
>   access = AccessPolicy.digest;
> } else if (REGISTRY_CLIENT_AUTH_ANONYMOUS.equals(auth)) {
>   access = AccessPolicy.anon;
> } else {
>   throw new ServiceStateException(E_UNKNOWN_AUTHENTICATION_MECHANISM
>   + "\"" + auth + "\"");
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3394) WebApplication proxy documentation is incomplete

2015-04-15 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496725#comment-14496725
 ] 

Tsuyoshi Ozawa commented on YARN-3394:
--

Thanks Naganarasimha for your contribution and thanks Jian for your commit!

> WebApplication  proxy documentation is incomplete
> -
>
> Key: YARN-3394
> URL: https://issues.apache.org/jira/browse/YARN-3394
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Bibin A Chundatt
>Assignee: Naganarasimha G R
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: WebApplicationProxy.html, YARN-3394.20150324-1.patch
>
>
> The web proxy documentation 
> (hadoop-yarn/hadoop-yarn-site/WebApplicationProxy.html) is incomplete. It is 
> missing:
> 1. Configuration to start/stop the service as a separate server
> 2. Steps to start it as a daemon service
> 3. Secure mode for the web proxy



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2696) Queue sorting in CapacityScheduler should consider node label

2015-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496732#comment-14496732
 ] 

Hadoop QA commented on YARN-2696:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12725637/YARN-2696.2.patch
  against trunk revision 9e8309a.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7348//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7348//console

This message is automatically generated.

> Queue sorting in CapacityScheduler should consider node label
> -
>
> Key: YARN-2696
> URL: https://issues.apache.org/jira/browse/YARN-2696
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2696.1.patch, YARN-2696.2.patch
>
>
> In the past, when trying to allocate containers under a parent queue in 
> CapacityScheduler, the parent queue would choose child queues by their used 
> resource, from smallest to largest. 
> Now that we support node labels in CapacityScheduler, we should also consider 
> the used resource in child queues per node label when allocating resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3326) Support RESTful API for getLabelsToNodes

2015-04-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496731#comment-14496731
 ] 

Hudson commented on YARN-3326:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #7590 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7590/])
YARN-3326. Support RESTful API for getLabelsToNodes. Contributed by 
Naganarasimha G R. (ozawa: rev e48cedc663b8a26fd62140c8e2907f9b4edd9785)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/LabelsToNodesInfo.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesNodeLabels.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/NodeIDsInfo.java


> Support RESTful API for getLabelsToNodes 
> -
>
> Key: YARN-3326
> URL: https://issues.apache.org/jira/browse/YARN-3326
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-3326.20150310-1.patch, YARN-3326.20150407-1.patch, 
> YARN-3326.20150408-1.patch
>
>
> REST support to retrieve the LabelsToNodes mapping



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3326) Support RESTful API for getLabelsToNodes

2015-04-15 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496733#comment-14496733
 ] 

Naganarasimha G R commented on YARN-3326:
-

Thanks [~ozawa] for the review. I will check the scope of YARN-2801 and, if it 
doesn't cover this feature, will raise a new JIRA.

> Support RESTful API for getLabelsToNodes 
> -
>
> Key: YARN-3326
> URL: https://issues.apache.org/jira/browse/YARN-3326
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Affects Versions: 2.6.0
>Reporter: Naganarasimha G R
>Assignee: Naganarasimha G R
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-3326.20150310-1.patch, YARN-3326.20150407-1.patch, 
> YARN-3326.20150408-1.patch
>
>
> REST support to retrieve the LabelsToNodes mapping



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation

2015-04-15 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496735#comment-14496735
 ] 

Thomas Graves commented on YARN-3434:
-

I am not saying the child needs to know how the parent calculates the resource 
limit. I am saying that the user limit, and whether it needs to unreserve to 
make another reservation, has nothing to do with the parent queue (i.e., it 
doesn't apply to the parent queue). Remember, I don't need to store the user 
limit; I need to store whether it needs to unreserve and, if it does, how much.

When a node heartbeats, it goes through the regular assignments and updates the 
leafQueue clusterResources based on what the parent passes in. When a node is 
removed or added, it updates the resource limits (none of these apply to the 
calculation of whether it needs to unreserve or not).

Basically it comes down to: is this information useful outside of the small 
window between when it is calculated and when it is needed in assignContainer()? 
My thought is no, and you said it yourself in the last bullet above. Although we 
have been referring to the userLimit, and perhaps that is the problem: I don't 
need to store the userLimit, I need to store whether it needs to unreserve and, 
if so, how much. Therefore it fits better as a local transient variable rather 
than a globally stored one. If you store just the userLimit then you need to 
recalculate things, which I'm trying to avoid.

I understand why we are storing the current information in ResourceLimits: it 
has to do with headroom and parent limits and is recalculated at various points. 
But the current implementation in canAssignToUser doesn't use headroom at all, 
and whether we need to unreserve or not on the last call to assignContainers 
doesn't affect the headroom calculation.

Again, basically all we would be doing is placing an extra global variable (or 
variables) in the ResourceLimits class just to pass it down a couple of 
functions. That, to me, is a parameter. Now, if we had multiple things needing 
this or updating it, then to me it would fit better in ResourceLimits.



> Interaction between reservations and userlimit can result in significant ULF 
> violation
> --
>
> Key: YARN-3434
> URL: https://issues.apache.org/jira/browse/YARN-3434
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
> Attachments: YARN-3434.patch
>
>
> ULF was set to 1.0
> User was able to consume 1.4X queue capacity.
> It looks like when this application launched, it reserved about 1000 
> containers of 8G each, within about 5 seconds. I think this allowed the 
> logic in assignToUser() to let the user limit be surpassed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3462) Patches applied for YARN-2424 are inconsistent between trunk and branch-2

2015-04-15 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496757#comment-14496757
 ] 

Naganarasimha G R commented on YARN-3462:
-

Thanks for reviewing and committing, [~qwertymaniac] & [~sidharta-s]!

> Patches applied for YARN-2424 are inconsistent between trunk and branch-2
> -
>
> Key: YARN-3462
> URL: https://issues.apache.org/jira/browse/YARN-3462
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Sidharta Seethana
>Assignee: Naganarasimha G R
> Fix For: 2.7.1
>
> Attachments: YARN-3462.20150508-1.patch
>
>
> It looks like the changes for YARN-2424 are not the same for trunk (commit 
> 7e75226e68715c3eca9d346c8eaf2f265aa70d23) and branch-2 (commit 
> 5d965f2f3cf97a87603720948aacd4f7877d73c4) . Branch-2 has a missing warning 
> and documentation is a bit different as well. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3492) AM fails to come up because RM and NM can't connect to each other

2015-04-15 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created YARN-3492:
--

 Summary: AM fails to come up because RM and NM can't connect to 
each other
 Key: YARN-3492
 URL: https://issues.apache.org/jira/browse/YARN-3492
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
 Environment: pseudo-distributed cluster on a mac
Reporter: Karthik Kambatla
Priority: Blocker


Stood up a pseudo-distributed cluster with 2.7.0 RC0. Submitted a pi job. The 
container gets allocated, but doesn't get launched. The NM can't talk to the 
RM. Logs to follow. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3492) AM fails to come up because RM and NM can't connect to each other

2015-04-15 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3492:
---
Attachment: yarn-kasha-resourcemanager-kasha-mbp.local.log
yarn-kasha-nodemanager-kasha-mbp.local.log

> AM fails to come up because RM and NM can't connect to each other
> -
>
> Key: YARN-3492
> URL: https://issues.apache.org/jira/browse/YARN-3492
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.0
> Environment: pseudo-distributed cluster on a mac
>Reporter: Karthik Kambatla
>Priority: Blocker
> Attachments: yarn-kasha-nodemanager-kasha-mbp.local.log, 
> yarn-kasha-resourcemanager-kasha-mbp.local.log
>
>
> Stood up a pseudo-distributed cluster with 2.7.0 RC0. Submitted a pi job. The 
> container gets allocated, but doesn't get launched. The NM can't talk to the 
> RM. Logs to follow. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2696) Queue sorting in CapacityScheduler should consider node label

2015-04-15 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496881#comment-14496881
 ] 

Jian He commented on YARN-2696:
---

A few minor comments:
- Add a comment explaining why the no_label max resource is treated separately.
{code}
if (nodePartition == null
    || nodePartition.equals(RMNodeLabelsManager.NO_LABEL))
{code}
- getChildrenAllocationIterator -> sortAndGetChildrenAllocationIterator

> Queue sorting in CapacityScheduler should consider node label
> -
>
> Key: YARN-2696
> URL: https://issues.apache.org/jira/browse/YARN-2696
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2696.1.patch, YARN-2696.2.patch
>
>
> In the past, when trying to allocate containers under a parent queue in 
> CapacityScheduler, the parent queue would choose child queues by their used 
> resource, from smallest to largest. 
> Now that we support node labels in CapacityScheduler, we should also consider 
> the used resource in child queues per node label when allocating resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3354) Container should contains node-labels asked by original ResourceRequests

2015-04-15 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496888#comment-14496888
 ] 

Jian He commented on YARN-3354:
---

+1 

> Container should contains node-labels asked by original ResourceRequests
> 
>
> Key: YARN-3354
> URL: https://issues.apache.org/jira/browse/YARN-3354
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, capacityscheduler, nodemanager, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-3354.1.patch, YARN-3354.2.patch
>
>
> We proposed non-exclusive node labels in YARN-3214, which lets non-labeled 
> resource requests be allocated on labeled nodes that have idle resources.
> To make preemption work, we need to know an allocated container's original node 
> label: when labeled resource requests come back, we need to kill non-labeled 
> containers running on labeled nodes.
> This requires adding node-labels to Container; also, the NM needs to store this 
> information and send it back to the RM when the RM restarts, so the original 
> container can be recovered.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2696) Queue sorting in CapacityScheduler should consider node label

2015-04-15 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496892#comment-14496892
 ] 

Jian He commented on YARN-2696:
---

- Does this overlap with the {{Resources.equals(queueGuranteedResource, 
Resources.none()) ? 0}} check below?
{code}
  // make queueGuranteed >= minimum_allocation to avoid divided by 0.
  queueGuranteedResource =
      Resources.max(rc, totalPartitionResource, queueGuranteedResource,
          minimumAllocation);
{code}

> Queue sorting in CapacityScheduler should consider node label
> -
>
> Key: YARN-2696
> URL: https://issues.apache.org/jira/browse/YARN-2696
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2696.1.patch, YARN-2696.2.patch
>
>
> In the past, when trying to allocate containers under a parent queue in 
> CapacityScheduler, the parent queue would choose child queues by their used 
> resource, from smallest to largest. 
> Now that we support node labels in CapacityScheduler, we should also consider 
> the used resource in child queues per node label when allocating resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation

2015-04-15 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496910#comment-14496910
 ] 

Wangda Tan commented on YARN-3434:
--

[~tgraves],
Makes sense to me, especially the {{local transient variable rather than a 
globally stored one}} point. So I think, after the change, the flow to 
use/update ResourceLimits will be:
{code}
In LeafQueue:

Both:
  updateClusterResource |
|--> resource-limit 
  assignContainers  | update&store   (only for compute headroom)

Only:
  assignContainers
|
V
 check queue limit
|
V
 check user limit
|
V
 set how-much-should-unreserve to ResourceLimits and pass down
 {code}

Is that also what you have in mind?

> Interaction between reservations and userlimit can result in significant ULF 
> violation
> --
>
> Key: YARN-3434
> URL: https://issues.apache.org/jira/browse/YARN-3434
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
> Attachments: YARN-3434.patch
>
>
> ULF was set to 1.0
> User was able to consume 1.4X queue capacity.
> It looks like when this application launched, it reserved about 1000 
> containers of 8G each, within about 5 seconds. I think this allowed the 
> logic in assignToUser() to let the user limit be surpassed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3492) AM fails to come up because RM and NM can't connect to each other

2015-04-15 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496918#comment-14496918
 ] 

Tsuyoshi Ozawa commented on YARN-3492:
--

[~kasha], could you attach yarn-site.xml and mapred-site.xml for investigation?

> AM fails to come up because RM and NM can't connect to each other
> -
>
> Key: YARN-3492
> URL: https://issues.apache.org/jira/browse/YARN-3492
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.0
> Environment: pseudo-distributed cluster on a mac
>Reporter: Karthik Kambatla
>Priority: Blocker
> Attachments: yarn-kasha-nodemanager-kasha-mbp.local.log, 
> yarn-kasha-resourcemanager-kasha-mbp.local.log
>
>
> Stood up a pseudo-distributed cluster with 2.7.0 RC0. Submitted a pi job. The 
> container gets allocated, but doesn't get launched. The NM can't talk to the 
> RM. Logs to follow. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3404) View the queue name to YARN Application page

2015-04-15 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496920#comment-14496920
 ] 

Jian He commented on YARN-3404:
---

+1

> View the queue name to YARN Application page
> 
>
> Key: YARN-3404
> URL: https://issues.apache.org/jira/browse/YARN-3404
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Minor
> Attachments: YARN-3404.1.patch, YARN-3404.2.patch, YARN-3404.3.patch, 
> YARN-3404.4.patch, screenshot.png
>
>
> We want to display the name of the queue used by the application on the YARN 
> Application page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3492) AM fails to come up because RM and NM can't connect to each other

2015-04-15 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-3492:
---
Attachment: yarn-site.xml
mapred-site.xml

> AM fails to come up because RM and NM can't connect to each other
> -
>
> Key: YARN-3492
> URL: https://issues.apache.org/jira/browse/YARN-3492
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.7.0
> Environment: pseudo-distributed cluster on a mac
>Reporter: Karthik Kambatla
>Priority: Blocker
> Attachments: mapred-site.xml, 
> yarn-kasha-nodemanager-kasha-mbp.local.log, 
> yarn-kasha-resourcemanager-kasha-mbp.local.log, yarn-site.xml
>
>
> Stood up a pseudo-distributed cluster with 2.7.0 RC0. Submitted a pi job. The 
> container gets allocated, but doesn't get launched. The NM can't talk to the 
> RM. Logs to follow. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2696) Queue sorting in CapacityScheduler should consider node label

2015-04-15 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2696:
-
Attachment: YARN-2696.3.patch

Addressed all comments from [~jianhe] and fixed the test failure in 
TestFifoScheduler; uploaded ver.3 patch.

> Queue sorting in CapacityScheduler should consider node label
> -
>
> Key: YARN-2696
> URL: https://issues.apache.org/jira/browse/YARN-2696
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2696.1.patch, YARN-2696.2.patch, YARN-2696.3.patch
>
>
> In the past, when trying to allocate containers under a parent queue in 
> CapacityScheduler, the parent queue would choose child queues by their used 
> resource, from smallest to largest. 
> Now that we support node labels in CapacityScheduler, we should also consider 
> the used resource in child queues per node label when allocating resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3354) Container should contains node-labels asked by original ResourceRequests

2015-04-15 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496981#comment-14496981
 ] 

Wangda Tan commented on YARN-3354:
--

Test failure is not related to the patch.

> Container should contains node-labels asked by original ResourceRequests
> 
>
> Key: YARN-3354
> URL: https://issues.apache.org/jira/browse/YARN-3354
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, capacityscheduler, nodemanager, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-3354.1.patch, YARN-3354.2.patch
>
>
> We proposed non-exclusive node labels in YARN-3214, which lets non-labeled 
> resource requests be allocated on labeled nodes that have idle resources.
> To make preemption work, we need to know an allocated container's original node 
> label: when labeled resource requests come back, we need to kill non-labeled 
> containers running on labeled nodes.
> This requires adding node-labels to Container; also, the NM needs to store this 
> information and send it back to the RM when the RM restarts, so the original 
> container can be recovered.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3404) View the queue name to YARN Application page

2015-04-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496990#comment-14496990
 ] 

Hudson commented on YARN-3404:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7594 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7594/])
YARN-3404. Display queue name on application page. Contributed by Ryu Kobayashi 
(jianhe: rev b2e6cf607f1712d103520ca6b3ff21ecc07cd265)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/webapp/AppBlock.java


> View the queue name to YARN Application page
> 
>
> Key: YARN-3404
> URL: https://issues.apache.org/jira/browse/YARN-3404
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Ryu Kobayashi
>Assignee: Ryu Kobayashi
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-3404.1.patch, YARN-3404.2.patch, YARN-3404.3.patch, 
> YARN-3404.4.patch, screenshot.png
>
>
> We want to display the name of the queue used by the application on the YARN 
> Application page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3354) Container should contains node-labels asked by original ResourceRequests

2015-04-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497013#comment-14497013
 ] 

Hudson commented on YARN-3354:
--

FAILURE: Integrated in Hadoop-trunk-Commit #7595 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7595/])
YARN-3354. Add node label expression in ContainerTokenIdentifier to support RM 
recovery. Contributed by Wangda Tan (jianhe: rev 
1b89a3e173f8e905074ed6714a7be5c003c0e2c4)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/security/RMContainerTokenSecretManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/TestContainerManager.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainer.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/NMContainerStatus.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/java/org/apache/hadoop/yarn/server/api/protocolrecords/impl/pb/NMContainerStatusPBImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/src/main/proto/yarn_server_common_service_protos.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestWorkPreservingRMRestartForNodeLabel.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/security/ContainerTokenIdentifier.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/proto/server/yarn_security_token.proto
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java


> Container should contains node-labels asked by original ResourceRequests
> 
>
> Key: YARN-3354
> URL: https://issues.apache.org/jira/browse/YARN-3354
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: api, capacityscheduler, nodemanager, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Fix For: 2.8.0
>
> Attachments: YARN-3354.1.patch, YARN-3354.2.patch
>
>
> We proposed non-exclusive node labels in YARN-3214, which lets non-labeled 
> resource requests be allocated on labeled nodes that have idle resources.
> To make preemption work, we need to know an allocated container's original node 
> label: when labeled resource requests come back, we need to kill non-labeled 
> containers running on labeled nodes.
> This requires adding node-labels to Container; also, the NM needs to store this 
> information and send it back to the RM when the RM restarts, so the original 
> container can be recovered.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation

2015-04-15 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497055#comment-14497055
 ] 

Thomas Graves commented on YARN-3434:
-

I agree with the "Both" section. I'm not sure I completely follow the "Only" 
section. Are you suggesting we change the patch to modify ResourceLimits and 
pass it down rather than using the LimitsInfo class? If so, that won't work, at 
least not without adding the shouldContinue flag to it. Unless you mean keep 
the LimitsInfo class for local use in assignContainers and then pass 
ResourceLimits down to assignContainer with the value of amountNeededUnreserve 
as the limit. That wouldn't really change much except the object we pass down 
through the functions.

> Interaction between reservations and userlimit can result in significant ULF 
> violation
> --
>
> Key: YARN-3434
> URL: https://issues.apache.org/jira/browse/YARN-3434
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
> Attachments: YARN-3434.patch
>
>
> ULF was set to 1.0
> User was able to consume 1.4X queue capacity.
> It looks like when this application launched, it reserved about 1000 
> containers of 8G each, within about 5 seconds. I think this allowed the 
> logic in assignToUser() to let the user limit be surpassed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation

2015-04-15 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497065#comment-14497065
 ] 

Wangda Tan commented on YARN-3434:
--

bq. Are you suggesting we change the patch to modify ResourceLimits and pass 
it down rather than using the LimitsInfo class?
Yes, that's what I suggested.

bq. at least not without adding the shouldContinue flag to it
Kind of. What I'm thinking is that we can add "amountNeededUnreserve" to 
ResourceLimits. canAssignToThisQueue/User will return a boolean meaning 
shouldContinue, and set "amountNeededUnreserve" (instead of the limit; we don't 
need to change the limit). That is very similar to your original logic, and we 
don't need the extra LimitsInfo. After we get the updated ResourceLimits and 
pass it down, the problem should be resolved.

Did I miss anything?
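To make that concrete, here is a rough standalone sketch of the shape being 
discussed, using plain longs instead of the YARN Resource type. All names are 
hypothetical; this is not the actual patch:
{code}
// Hypothetical sketch of the proposal: the user-limit check returns
// "should we continue?" and records how much must be unreserved first on the
// limits object, which is then passed down to assignContainer().
class ResourceLimitsSketch {
  long limit;                     // existing notion: the computed limit
  long amountNeededUnreserve;     // new: how much must be unreserved first

  ResourceLimitsSketch(long limit) {
    this.limit = limit;
  }
}

class LeafQueueSketch {
  boolean canAssignToUser(long userConsumed, long userLimit, long reserved,
                          long asked, ResourceLimitsSketch limits) {
    long wouldConsume = userConsumed + asked;
    if (wouldConsume <= userLimit) {
      limits.amountNeededUnreserve = 0;          // fits without unreserving
      return true;
    }
    long overage = wouldConsume - userLimit;
    if (reserved >= overage) {
      limits.amountNeededUnreserve = overage;    // continue, but unreserve this much first
      return true;
    }
    return false;                                // cannot continue at all
  }
}
{code}
assignContainer() would then only need to read amountNeededUnreserve from the 
limits object it already receives, instead of a separate LimitsInfo.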

> Interaction between reservations and userlimit can result in significant ULF 
> violation
> --
>
> Key: YARN-3434
> URL: https://issues.apache.org/jira/browse/YARN-3434
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
> Attachments: YARN-3434.patch
>
>
> ULF was set to 1.0
> User was able to consume 1.4X queue capacity.
> It looks like when this application launched, it reserved about 1000 
> containers of 8G each, within about 5 seconds. I think this allowed the 
> logic in assignToUser() to let the user limit be surpassed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation

2015-04-15 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497076#comment-14497076
 ] 

Thomas Graves commented on YARN-3434:
-

So you are saying add amountNeededUnreserve to ResourceLimits and then set the 
global currentResourceLimits.amountNeededUnreserve inside of canAssignToUser? 
This is what I was not in favor of above, and there would be no need to pass it 
down as a parameter.

Or were you saying create a ResourceLimits and pass it as a parameter to 
canAssignToUser and canAssignToThisQueue and modify that instance? That 
instance would then be passed down through to assignContainer()?

I don't see how else you would set the ResourceLimits.

> Interaction between reservations and userlimit can result in significant ULF 
> violation
> --
>
> Key: YARN-3434
> URL: https://issues.apache.org/jira/browse/YARN-3434
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
> Attachments: YARN-3434.patch
>
>
> ULF was set to 1.0
> User was able to consume 1.4X queue capacity.
> It looks like when this application launched, it reserved about 1000 
> containers of 8G each, within about 5 seconds. I think this allowed the 
> logic in assignToUser() to let the user limit be surpassed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3491) Improve the public resource localization to do both FSDownload submission to the thread pool and completed localization handling in one thread (PublicLocalizer).

2015-04-15 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497085#comment-14497085
 ] 

zhihai xu commented on YARN-3491:
-

Hi [~jlowe], thanks for the comment. Queueing is fast, but it takes longer to 
hand the FSDownload to a worker thread.
If all threads in the thread pool are already in use, adding an entry to the 
queue (LinkedBlockingQueue#offer) is very fast.
Based on the following code in ThreadPoolExecutor#execute, corePoolSize is the 
thread pool size, which is 4 in this case.
workQueue.offer(command) is fast but addWorker is slow, and the task is only 
queued when all threads in the thread pool are already running.
{code}
public void execute(Runnable command) {
    if (command == null)
        throw new NullPointerException();
    /*
     * Proceed in 3 steps:
     *
     * 1. If fewer than corePoolSize threads are running, try to
     * start a new thread with the given command as its first
     * task.  The call to addWorker atomically checks runState and
     * workerCount, and so prevents false alarms that would add
     * threads when it shouldn't, by returning false.
     *
     * 2. If a task can be successfully queued, then we still need
     * to double-check whether we should have added a thread
     * (because existing ones died since last checking) or that
     * the pool shut down since entry into this method. So we
     * recheck state and if necessary roll back the enqueuing if
     * stopped, or start a new thread if there are none.
     *
     * 3. If we cannot queue task, then we try to add a new
     * thread.  If it fails, we know we are shut down or saturated
     * and so reject the task.
     */
    int c = ctl.get();
    if (workerCountOf(c) < corePoolSize) {
        if (addWorker(command, true))
            return;
        c = ctl.get();
    }
    if (isRunning(c) && workQueue.offer(command)) {
        int recheck = ctl.get();
        if (! isRunning(recheck) && remove(command))
            reject(command);
        else if (workerCountOf(recheck) == 0)
            addWorker(null, false);
    }
    else if (!addWorker(command, false))
        reject(command);
}
{code}

The issue is:
If the time to run one FSDownload (resource localization) is close to the time 
to run the submit (handing the FSDownload to a worker thread), this oscillation 
happens, only one worker thread ends up running, and the Dispatcher thread is 
blocked for a longer time.
The above logs demonstrate this situation: LocalizerRunner#addResource, used by 
the private localizer, takes less than one millisecond to process one 
REQUEST_RESOURCE_LOCALIZATION event, but PublicLocalizer#addResource, used by 
the public localizer, takes 124 milliseconds to process one 
REQUEST_RESOURCE_LOCALIZATION event.
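A rough sketch of the proposed restructuring (hypothetical names, not the 
actual patch): the dispatcher thread only enqueues the request, and a single 
PublicLocalizer-style thread both submits the download tasks to the pool and 
handles completed downloads, so the dispatcher is never blocked by submit():
{code}
import java.util.concurrent.*;

// Standalone sketch of the proposed structure; the anonymous Callable is a
// stand-in for FSDownload and the String a stand-in for a resource request.
public class PublicLocalizerSketch {
  private final BlockingQueue<String> requests = new LinkedBlockingQueue<String>();
  private final ExecutorService pool = Executors.newFixedThreadPool(4);
  private final ExecutorCompletionService<String> downloads =
      new ExecutorCompletionService<String>(pool);

  // Called from the dispatcher thread: cheap, never touches the pool.
  public void addResource(String resource) {
    requests.offer(resource);
  }

  // The single localizer thread: drain pending requests and submit them all,
  // then handle any completed download.
  public void run() throws InterruptedException, ExecutionException {
    while (!Thread.currentThread().isInterrupted()) {
      String next;
      while ((next = requests.poll()) != null) {
        final String resource = next;
        downloads.submit(new Callable<String>() {   // stand-in for FSDownload
          @Override
          public String call() throws Exception {
            Thread.sleep(50);                        // pretend to download
            return resource;
          }
        });
      }
      Future<String> done = downloads.poll(100, TimeUnit.MILLISECONDS);
      if (done != null) {
        System.out.println("Localized " + done.get());
      }
    }
  }
}
{code}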


> Improve the public resource localization to do both FSDownload submission to 
> the thread pool and completed localization handling in one thread 
> (PublicLocalizer).
> -
>
> Key: YARN-3491
> URL: https://issues.apache.org/jira/browse/YARN-3491
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
>
> Improve the public resource localization to do both FSDownload submission to 
> the thread pool and completed localization handling in one thread 
> (PublicLocalizer).
> Currently FSDownload submission to the thread pool is done in 
> PublicLocalizer#addResource which is running in Dispatcher thread and 
> completed localization handling is done in PublicLocalizer#run which is 
> running in PublicLocalizer thread.
> Because the FSDownload submission to the thread pool in the following code is 
> time consuming, the thread pool can't be fully utilized. Instead of doing 
> public resource localization in parallel (multithreading), public resource 
> localization is serialized most of the time.
> {code}
> synchronized (pending) {
>   pending.put(queue.submit(new FSDownload(lfs, null, conf,
>   publicDirDestPath, resource, 
> request.getContext().getStatCache())),
>   request);
> }
> {code}
> There are also two more benefits with this change:
> 1. The Dispatcher thread won't be blocked by the above FSDownload submission. 
> The Dispatcher thread handles most of the time-critical events at the Node Manager.
> 2. No synchronization is needed on the HashMap (pending), because pending 
> will only be accessed in the PublicLocalizer thread.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3434) Interaction between reservations and userlimit can result in significant ULF violation

2015-04-15 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497087#comment-14497087
 ] 

Wangda Tan commented on YARN-3434:
--

bq. Or were you saying create a ResourceLimits and pass it as a parameter to 
canAssignToUser and canAssignToThisQueue and modify that instance? That 
instance would then be passed down through to assignContainer()?
I prefer that option, which is in line with your previous comment about a 
"local transient variable rather than a globally stored one". Is that also 
what you prefer?

> Interaction between reservations and userlimit can result in significant ULF 
> violation
> --
>
> Key: YARN-3434
> URL: https://issues.apache.org/jira/browse/YARN-3434
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 2.6.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
> Attachments: YARN-3434.patch
>
>
> ULF was set to 1.0
> User was able to consume 1.4X queue capacity.
> It looks like when this application launched, it reserved about 1000 
> containers, each 8G each, within about 5 seconds. I think this allowed the 
> logic in assignToUser() to allow the userlimit to be surpassed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3493) RM fails to come up with error "Failed to load/recover state" when mem settings are changed

2015-04-15 Thread Sumana Sathish (JIRA)
Sumana Sathish created YARN-3493:


 Summary: RM fails to come up with error "Failed to load/recover 
state" when  mem settings are changed
 Key: YARN-3493
 URL: https://issues.apache.org/jira/browse/YARN-3493
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.7.0
Reporter: Sumana Sathish
Priority: Critical
 Fix For: 2.7.0


RM fails to come up for the following case:
1. Change yarn.nodemanager.resource.memory-mb and 
yarn.scheduler.maximum-allocation-mb to 4000 in yarn-site.xml
2. Start a randomtextwriter job with mapreduce.map.memory.mb=4000 in the 
background and wait for the job to reach the running state
3. Restore yarn-site.xml so that yarn.scheduler.maximum-allocation-mb is back 
to 2048 before the above job completes
4. Restart RM
5. RM fails to come up with the below error
{code:title= RM error for Mem settings changed}
 - RM app submission failed in validating AM resource request for application 
application_1429094976272_0008
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
resource request, requested memory < 0, or requested memory > max configured, 
requestedMemory=3072, maxMemory=2048
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208)
2015-04-15 13:19:18,623 ERROR resourcemanager.ResourceManager 
(ResourceManager.java:serviceStart(579)) - Failed to load/recover state
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
resource request, requested memory < 0, or requested memory > max configured, 
requestedMemory=3072, maxMemory=2048
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
at 
org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
at 
org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformatio

[jira] [Updated] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler

2015-04-15 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-3463:
--
Attachment: YARN-3463.64.patch

rebased to current trunk

> Integrate OrderingPolicy Framework with CapacityScheduler
> -
>
> Key: YARN-3463
> URL: https://issues.apache.org/jira/browse/YARN-3463
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Craig Welch
>Assignee: Craig Welch
> Attachments: YARN-3463.50.patch, YARN-3463.61.patch, 
> YARN-3463.64.patch
>
>
> Integrate the OrderingPolicy Framework with the CapacityScheduler



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3493) RM fails to come up with error "Failed to load/recover state" when mem settings are changed

2015-04-15 Thread Sumana Sathish (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sumana Sathish updated YARN-3493:
-
Attachment: yarn-yarn-resourcemanager.log.zip

> RM fails to come up with error "Failed to load/recover state" when  mem 
> settings are changed
> 
>
> Key: YARN-3493
> URL: https://issues.apache.org/jira/browse/YARN-3493
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.0
>Reporter: Sumana Sathish
>Priority: Critical
> Fix For: 2.7.0
>
> Attachments: yarn-yarn-resourcemanager.log.zip
>
>
> RM fails to come up for the following case:
> 1. Change yarn.nodemanager.resource.memory-mb and 
> yarn.scheduler.maximum-allocation-mb to 4000 in yarn-site.xml
> 2. Start a randomtextwriter job with mapreduce.map.memory.mb=4000 in the 
> background and wait for the job to reach the running state
> 3. Restore yarn-site.xml so that yarn.scheduler.maximum-allocation-mb is back 
> to 2048 before the above job completes
> 4. Restart RM
> 5. RM fails to come up with the below error
> {code:title= RM error for Mem settings changed}
>  - RM app submission failed in validating AM resource request for application 
> application_1429094976272_0008
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested memory < 0, or requested memory > max configured, 
> requestedMemory=3072, maxMemory=2048
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208)
> 2015-04-15 13:19:18,623 ERROR resourcemanager.ResourceManager 
> (ResourceManager.java:serviceStart(579)) - Failed to load/recover state
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested memory < 0, or requested memory > max configured, 
> requestedMemory=3072, maxMemory=2048
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.s

[jira] [Commented] (YARN-3493) RM fails to come up with error "Failed to load/recover state" when mem settings are changed

2015-04-15 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497122#comment-14497122
 ] 

Karthik Kambatla commented on YARN-3493:


[~jianhe] - YARN-2010 should have fixed this right? 

> RM fails to come up with error "Failed to load/recover state" when  mem 
> settings are changed
> 
>
> Key: YARN-3493
> URL: https://issues.apache.org/jira/browse/YARN-3493
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.0
>Reporter: Sumana Sathish
>Priority: Critical
> Fix For: 2.7.0
>
> Attachments: yarn-yarn-resourcemanager.log.zip
>
>
> RM fails to come up for the following case:
> 1. Change yarn.nodemanager.resource.memory-mb and 
> yarn.scheduler.maximum-allocation-mb to 4000 in yarn-site.xml
> 2. Start a randomtextwriter job with mapreduce.map.memory.mb=4000 in 
> background and wait for the job to reach running state
> 3. Restore yarn-site.xml to set yarn.scheduler.maximum-allocation-mb back to 2048 
> before the above job completes
> 4. Restart RM
> 5. RM fails to come up with the below error
> {code:title= RM error for Mem settings changed}
>  - RM app submission failed in validating AM resource request for application 
> application_1429094976272_0008
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested memory < 0, or requested memory > max configured, 
> requestedMemory=3072, maxMemory=2048
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208)
> 2015-04-15 13:19:18,623 ERROR resourcemanager.ResourceManager 
> (ResourceManager.java:serviceStart(579)) - Failed to load/recover state
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested memory < 0, or requested memory > max configured, 
> requestedMemory=3072, maxMemory=2048
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 

[jira] [Assigned] (YARN-3493) RM fails to come up with error "Failed to load/recover state" when mem settings are changed

2015-04-15 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He reassigned YARN-3493:
-

Assignee: Jian He

> RM fails to come up with error "Failed to load/recover state" when  mem 
> settings are changed
> 
>
> Key: YARN-3493
> URL: https://issues.apache.org/jira/browse/YARN-3493
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.0
>Reporter: Sumana Sathish
>Assignee: Jian He
>Priority: Critical
> Attachments: yarn-yarn-resourcemanager.log.zip
>
>
> RM fails to come up for the following case:
> 1. Change yarn.nodemanager.resource.memory-mb and 
> yarn.scheduler.maximum-allocation-mb to 4000 in yarn-site.xml
> 2. Start a randomtextwriter job with mapreduce.map.memory.mb=4000 in 
> background and wait for the job to reach running state
> 3. Restore yarn-site.xml to set yarn.scheduler.maximum-allocation-mb back to 2048 
> before the above job completes
> 4. Restart RM
> 5. RM fails to come up with the below error
> {code:title= RM error for Mem settings changed}
>  - RM app submission failed in validating AM resource request for application 
> application_1429094976272_0008
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested memory < 0, or requested memory > max configured, 
> requestedMemory=3072, maxMemory=2048
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208)
> 2015-04-15 13:19:18,623 ERROR resourcemanager.ResourceManager 
> (ResourceManager.java:serviceStart(579)) - Failed to load/recover state
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested memory < 0, or requested memory > max configured, 
> requestedMemory=3072, maxMemory=2048
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.

[jira] [Updated] (YARN-3493) RM fails to come up with error "Failed to load/recover state" when mem settings are changed

2015-04-15 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3493:
--
Fix Version/s: (was: 2.7.0)

> RM fails to come up with error "Failed to load/recover state" when  mem 
> settings are changed
> 
>
> Key: YARN-3493
> URL: https://issues.apache.org/jira/browse/YARN-3493
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.0
>Reporter: Sumana Sathish
>Assignee: Jian He
>Priority: Critical
> Attachments: yarn-yarn-resourcemanager.log.zip
>
>
> RM fails to come up for the following case:
> 1. Change yarn.nodemanager.resource.memory-mb and 
> yarn.scheduler.maximum-allocation-mb to 4000 in yarn-site.xml
> 2. Start a randomtextwriter job with mapreduce.map.memory.mb=4000 in 
> background and wait for the job to reach running state
> 3. Restore yarn-site.xml to set yarn.scheduler.maximum-allocation-mb back to 2048 
> before the above job completes
> 4. Restart RM
> 5. RM fails to come up with the below error
> {code:title= RM error for Mem settings changed}
>  - RM app submission failed in validating AM resource request for application 
> application_1429094976272_0008
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested memory < 0, or requested memory > max configured, 
> requestedMemory=3072, maxMemory=2048
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208)
> 2015-04-15 13:19:18,623 ERROR resourcemanager.ResourceManager 
> (ResourceManager.java:serviceStart(579)) - Failed to load/recover state
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested memory < 0, or requested memory > max configured, 
> requestedMemory=3072, maxMemory=2048
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceM

[jira] [Commented] (YARN-3493) RM fails to come up with error "Failed to load/recover state" when mem settings are changed

2015-04-15 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497129#comment-14497129
 ] 

Jian He commented on YARN-3493:
---

[~kasha], I think this happened on a different code path.

> RM fails to come up with error "Failed to load/recover state" when  mem 
> settings are changed
> 
>
> Key: YARN-3493
> URL: https://issues.apache.org/jira/browse/YARN-3493
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.0
>Reporter: Sumana Sathish
>Assignee: Jian He
>Priority: Critical
> Attachments: yarn-yarn-resourcemanager.log.zip
>
>
> RM fails to come up for the following case:
> 1. Change yarn.nodemanager.resource.memory-mb and 
> yarn.scheduler.maximum-allocation-mb to 4000 in yarn-site.xml
> 2. Start a randomtextwriter job with mapreduce.map.memory.mb=4000 in 
> background and wait for the job to reach running state
> 3. Restore yarn-site.xml to set yarn.scheduler.maximum-allocation-mb back to 2048 
> before the above job completes
> 4. Restart RM
> 5. RM fails to come up with the below error
> {code:title= RM error for Mem settings changed}
>  - RM app submission failed in validating AM resource request for application 
> application_1429094976272_0008
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested memory < 0, or requested memory > max configured, 
> requestedMemory=3072, maxMemory=2048
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208)
> 2015-04-15 13:19:18,623 ERROR resourcemanager.ResourceManager 
> (ResourceManager.java:serviceStart(579)) - Failed to load/recover state
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested memory < 0, or requested memory > max configured, 
> requestedMemory=3072, maxMemory=2048
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.

[jira] [Commented] (YARN-2696) Queue sorting in CapacityScheduler should consider node label

2015-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497139#comment-14497139
 ] 

Hadoop QA commented on YARN-2696:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12725687/YARN-2696.3.patch
  against trunk revision b2e6cf6.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7349//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7349//console

This message is automatically generated.

> Queue sorting in CapacityScheduler should consider node label
> -
>
> Key: YARN-2696
> URL: https://issues.apache.org/jira/browse/YARN-2696
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler, resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2696.1.patch, YARN-2696.2.patch, YARN-2696.3.patch
>
>
> In the past, when trying to allocate containers under a parent queue in 
> CapacityScheduler, the parent queue would choose child queues by their used 
> resource, from smallest to largest. 
> Now that we support node labels in CapacityScheduler, we should also consider 
> the resource used in child queues per node label when allocating resources.
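For illustration only (not the actual patch), here is a minimal sketch of the ordering idea; the ChildQueue interface and getUsedResource accessor are hypothetical stand-ins, not the real CapacityScheduler types:

{code}
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a CapacityScheduler child queue; the real classes differ.
interface ChildQueue {
  // used resource (e.g. memory in MB) accounted per node label / partition
  long getUsedResource(String nodeLabel);
}

public class LabelAwareQueueOrderingSketch {
  /** Order child queues by the resource they already use under the given
   *  label, smallest first, so under-served queues receive the next container. */
  static Comparator<ChildQueue> byUsedResource(final String nodeLabel) {
    return (a, b) -> Long.compare(a.getUsedResource(nodeLabel),
                                  b.getUsedResource(nodeLabel));
  }

  static void sortForAllocation(List<ChildQueue> children, String nodeLabel) {
    children.sort(byUsedResource(nodeLabel));
  }
}
{code}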



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler

2015-04-15 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-3463:
--
Attachment: YARN-3463.65.patch

Suppress orderingpolicy from appearing in web service responses; it is still 
shown on the web UI.
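One common way to achieve this kind of suppression (not necessarily what this patch does) is to mark the corresponding JAXB getter transient so REST responses omit the field while the web UI, which calls the getter directly, still sees it. The bean and field names below are made up:

{code}
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.bind.annotation.XmlTransient;

// Hypothetical queue-info bean, not the real CapacitySchedulerQueueInfo.
@XmlRootElement
public class QueueInfoSketch {
  private String queueName;
  private String orderingPolicyName;

  public String getQueueName() {
    return queueName;
  }

  // Excluded from JAXB/JSON web service responses; the web UI can still
  // read the value through this getter.
  @XmlTransient
  public String getOrderingPolicyName() {
    return orderingPolicyName;
  }
}
{code}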

> Integrate OrderingPolicy Framework with CapacityScheduler
> -
>
> Key: YARN-3463
> URL: https://issues.apache.org/jira/browse/YARN-3463
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Craig Welch
>Assignee: Craig Welch
> Attachments: YARN-3463.50.patch, YARN-3463.61.patch, 
> YARN-3463.64.patch, YARN-3463.65.patch
>
>
> Integrate the OrderingPolicy Framework with the CapacityScheduler



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-2498) Respect labels in preemption policy of capacity scheduler

2015-04-15 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan reassigned YARN-2498:


Assignee: Wangda Tan  (was: Mayank Bansal)

> Respect labels in preemption policy of capacity scheduler
> -
>
> Key: YARN-2498
> URL: https://issues.apache.org/jira/browse/YARN-2498
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2498.patch, YARN-2498.patch, YARN-2498.patch, 
> yarn-2498-implementation-notes.pdf
>
>
> There are 3 stages in ProportionalCapacityPreemptionPolicy:
> # Recursively calculate {{ideal_assigned}} for each queue. This depends on the 
> available resource, the resource used/pending in each queue and the guaranteed 
> capacity of each queue.
> # Mark to-be-preempted containers: for each over-satisfied queue, mark some 
> containers to be preempted.
> # Notify the scheduler about the to-be-preempted containers.
> We need to respect labels in the cluster for both #1 and #2:
> For #1, when there is resource available in the cluster, we shouldn't 
> assign it to a queue (by increasing {{ideal_assigned}}) if the queue cannot 
> access such labels.
> For #2, when we decide whether to preempt a container, we need to make 
> sure the resource this container holds is *possibly* usable by a queue which 
> is under-satisfied and has pending resource.
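As a rough, purely illustrative sketch of the two label-aware checks described above (the QueueSnapshot type and its fields are hypothetical, not the real ProportionalCapacityPreemptionPolicy internals):

{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical per-queue snapshot used only for this sketch.
class QueueSnapshot {
  Set<String> accessibleLabels = new HashSet<>(); // labels the queue may use
  Map<String, Long> idealAssigned = new HashMap<>(); // per-label ideal_assigned (MB)
  Map<String, Long> pending = new HashMap<>();       // per-label pending resource (MB)

  boolean canAccess(String label) {
    return accessibleLabels.contains(label);
  }
}

public class LabelAwarePreemptionSketch {
  /** Stage 1: only grow ideal_assigned for a label the queue can access. */
  static void assignIdeal(QueueSnapshot q, String label, long availableMb) {
    if (!q.canAccess(label)) {
      return; // skip: resource under this label is unusable by the queue
    }
    q.idealAssigned.merge(label, availableMb, Long::sum);
  }

  /** Stage 2: preempt a container only if some under-satisfied queue could
   *  actually use resource carrying this container's node label. */
  static boolean worthPreempting(String containerLabel,
                                 Iterable<QueueSnapshot> underSatisfied) {
    for (QueueSnapshot q : underSatisfied) {
      if (q.canAccess(containerLabel)
          && q.pending.getOrDefault(containerLabel, 0L) > 0) {
        return true;
      }
    }
    return false;
  }
}
{code}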



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-2498) Respect labels in preemption policy of capacity scheduler

2015-04-15 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497261#comment-14497261
 ] 

Wangda Tan commented on YARN-2498:
--

Discussed with [~mayank_bansal]; I am taking this over and will post a patch and 
implementation notes soon.

> Respect labels in preemption policy of capacity scheduler
> -
>
> Key: YARN-2498
> URL: https://issues.apache.org/jira/browse/YARN-2498
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: resourcemanager
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-2498.patch, YARN-2498.patch, YARN-2498.patch, 
> yarn-2498-implementation-notes.pdf
>
>
> There are 3 stages in ProportionalCapacityPreemptionPolicy:
> # Recursively calculate {{ideal_assigned}} for each queue. This depends on the 
> available resource, the resource used/pending in each queue and the guaranteed 
> capacity of each queue.
> # Mark to-be-preempted containers: for each over-satisfied queue, mark some 
> containers to be preempted.
> # Notify the scheduler about the to-be-preempted containers.
> We need to respect labels in the cluster for both #1 and #2:
> For #1, when there is resource available in the cluster, we shouldn't 
> assign it to a queue (by increasing {{ideal_assigned}}) if the queue cannot 
> access such labels.
> For #2, when we decide whether to preempt a container, we need to make 
> sure the resource this container holds is *possibly* usable by a queue which 
> is under-satisfied and has pending resource.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3491) Improve the public resource localization to do both FSDownload submission to the thread pool and completed localization handling in one thread (PublicLocalizer).

2015-04-15 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497304#comment-14497304
 ] 

Sangjin Lee commented on YARN-3491:
---

I have the same question as [~jlowe]. The actual call

{code}
synchronized (pending) {
  pending.put(queue.submit(new FSDownload(lfs, null, conf,
  publicDirDestPath, resource, 
request.getContext().getStatCache())),
  request);
}
{code}
should be completely non-blocking and there is nothing that's expensive about 
it with the possible exception of the synchronization. Could you describe the 
root cause of the slowness you're seeing in some more detail?

> Improve the public resource localization to do both FSDownload submission to 
> the thread pool and completed localization handling in one thread 
> (PublicLocalizer).
> -
>
> Key: YARN-3491
> URL: https://issues.apache.org/jira/browse/YARN-3491
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.0
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
>
> Improve the public resource localization to do both FSDownload submission to 
> the thread pool and completed localization handling in one thread 
> (PublicLocalizer).
> Currently, FSDownload submission to the thread pool is done in 
> PublicLocalizer#addResource, which runs in the Dispatcher thread, while 
> completed localization handling is done in PublicLocalizer#run, which runs in 
> the PublicLocalizer thread.
> Because FSDownload submission to the thread pool in the following code is 
> time consuming, the thread pool can't be fully utilized. Instead of doing 
> public resource localization in parallel (multithreading), public resource 
> localization is serialized most of the time.
> {code}
> synchronized (pending) {
>   pending.put(queue.submit(new FSDownload(lfs, null, conf,
>   publicDirDestPath, resource, 
> request.getContext().getStatCache())),
>   request);
> }
> {code}
> There are also two more benefits to this change:
> 1. The Dispatcher thread won't be blocked by the above FSDownload submission; 
> the Dispatcher thread handles most of the time-critical events at the NodeManager.
> 2. No synchronization is needed on the HashMap (pending), because pending will 
> only be accessed in the PublicLocalizer thread.
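A minimal, self-contained sketch of the proposed single-thread pattern (hypothetical names, not the actual NodeManager code): the localizer thread both submits FSDownload-style tasks and consumes their completions, so the pending map is confined to one thread and needs no lock, and the dispatcher only does a cheap hand-off.

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class SingleThreadLocalizerSketch {
  private final ExecutorService pool = Executors.newFixedThreadPool(4);
  private final ExecutorCompletionService<String> completion =
      new ExecutorCompletionService<>(pool);
  // requests handed over from the dispatcher thread (cheap, non-blocking add)
  private final BlockingQueue<String> incoming = new LinkedBlockingQueue<>();
  // accessed only by the localizer thread in run() -> no synchronization needed
  private final Map<Future<String>, String> pending = new HashMap<>();

  /** Called from the dispatcher thread. */
  public void addResource(String resource) {
    incoming.add(resource);
  }

  /** Body of the single localizer thread. */
  public void run() throws InterruptedException {
    while (!Thread.currentThread().isInterrupted()) {
      // 1. submit any newly queued resources (previously done on the dispatcher)
      String next;
      while ((next = incoming.poll()) != null) {
        final String resource = next;
        pending.put(completion.submit(() -> "localized " + resource), resource);
      }
      // 2. handle one completed download, waiting briefly if none is ready yet
      Future<String> done = completion.poll(100, TimeUnit.MILLISECONDS);
      if (done != null) {
        pending.remove(done);
      }
    }
  }
}
{code}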



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3493) RM fails to come up with error "Failed to load/recover state" when mem settings are changed

2015-04-15 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3493:
--
Attachment: YARN-3493.1.patch

Uploaded a patch to ignore this exception on recovery.
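A minimal sketch of the general idea, with hypothetical names (this is not the actual patch): a validation failure for an already-accepted application is tolerated during recovery instead of aborting RM startup.

{code}
// Sketch only: all types here are stand-ins, not the real YARN classes.
class InvalidResourceRequestException extends Exception {
  InvalidResourceRequestException(String msg) { super(msg); }
}

public class RecoverySketch {
  static void validate(int requestedMb, int maxMb)
      throws InvalidResourceRequestException {
    if (requestedMb > maxMb) {
      throw new InvalidResourceRequestException(
          "requestedMemory=" + requestedMb + ", maxMemory=" + maxMb);
    }
  }

  /** Recover one stored application without letting a shrunken
   *  max-allocation setting take the whole ResourceManager down. */
  static void recoverApplication(String appId, int requestedMb, int maxMb) {
    try {
      validate(requestedMb, maxMb);
    } catch (InvalidResourceRequestException e) {
      // The request was legal when the app was submitted; the limit has
      // since been lowered. Log and continue recovery rather than abort.
      System.err.println("Ignoring invalid resource request for " + appId
          + " during recovery: " + e.getMessage());
    }
  }
}
{code}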

> RM fails to come up with error "Failed to load/recover state" when  mem 
> settings are changed
> 
>
> Key: YARN-3493
> URL: https://issues.apache.org/jira/browse/YARN-3493
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.0
>Reporter: Sumana Sathish
>Assignee: Jian He
>Priority: Critical
> Attachments: YARN-3493.1.patch, yarn-yarn-resourcemanager.log.zip
>
>
> RM fails to come up for the following case:
> 1. Change yarn.nodemanager.resource.memory-mb and 
> yarn.scheduler.maximum-allocation-mb to 4000 in yarn-site.xml
> 2. Start a randomtextwriter job with mapreduce.map.memory.mb=4000 in 
> background and wait for the job to reach running state
> 3. Restore yarn-site.xml to set yarn.scheduler.maximum-allocation-mb back to 2048 
> before the above job completes
> 4. Restart RM
> 5. RM fails to come up with the below error
> {code:title= RM error for Mem settings changed}
>  - RM app submission failed in validating AM resource request for application 
> application_1429094976272_0008
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested memory < 0, or requested memory > max configured, 
> requestedMemory=3072, maxMemory=2048
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208)
> 2015-04-15 13:19:18,623 ERROR resourcemanager.ResourceManager 
> (ResourceManager.java:serviceStart(579)) - Failed to load/recover state
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested memory < 0, or requested memory > max configured, 
> requestedMemory=3072, maxMemory=2048
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yar

[jira] [Commented] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler

2015-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497311#comment-14497311
 ] 

Hadoop QA commented on YARN-3463:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12725702/YARN-3463.64.patch
  against trunk revision 1b89a3e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1149 javac 
compiler warnings (more than the trunk's current 1147 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7350//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/7350//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7350//console

This message is automatically generated.

> Integrate OrderingPolicy Framework with CapacityScheduler
> -
>
> Key: YARN-3463
> URL: https://issues.apache.org/jira/browse/YARN-3463
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Craig Welch
>Assignee: Craig Welch
> Attachments: YARN-3463.50.patch, YARN-3463.61.patch, 
> YARN-3463.64.patch, YARN-3463.65.patch
>
>
> Integrate the OrderingPolicy Framework with the CapacityScheduler



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-04-15 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497315#comment-14497315
 ] 

Vrushali C commented on YARN-3051:
--

Hi [~varun_saxena]
As per the discussion in the call today, here is the query document about flow 
(and user and queue) based queries that I had mentioned (put up on jira 
YARN-3050) 
https://issues.apache.org/jira/secure/attachment/12695071/Flow%20based%20queries.docx

Also, some points that I think may be helpful:
- the reader API is not going to be limited to one or two API calls
- different queries will need different core read APIs. For instance, flow-based 
queries may not need the application id or entity id, but rather the flow id. For 
example: for a given user, return the flows that were run during this time frame. 
This query needs only the cluster and user info; neither an entity id, an 
application id, nor a flow name is required for the reader API to serve it. This 
query cannot be boiled down to an entity-level query.
- So the reader API should allow for entity-level, application-level, flow-level, 
user-level, queue-level and cluster-level queries.
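To make the levels concrete, here is a hedged sketch of what such a multi-level read interface could look like; the method names and the TimelineEntity placeholder are assumptions for illustration, not the proposed YARN API:

{code}
import java.util.Set;

// Placeholder for whatever entity representation the reader returns.
interface TimelineEntity {}

public interface TimelineReaderSketch {
  // entity-level: needs the owning application for uniqueness
  TimelineEntity getEntity(String cluster, String appId,
                           String entityType, String entityId);

  // flow-level: e.g. all flows a user ran in a time window; no app/entity id
  Set<String> getFlows(String cluster, String user,
                       long windowStartMs, long windowEndMs);

  // aggregate queries at user, queue and cluster scope
  Set<TimelineEntity> getUserActivity(String cluster, String user);
  Set<TimelineEntity> getQueueActivity(String cluster, String queue);
  Set<TimelineEntity> getClusterActivity(String cluster);
}
{code}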



> [Storage abstraction] Create backing storage read interface for ATS readers
> ---
>
> Key: YARN-3051
> URL: https://issues.apache.org/jira/browse/YARN-3051
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-3051.wip.patch, YARN-3051_temp.patch
>
>
> Per design in YARN-2928, create backing storage read interface that can be 
> implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3493) RM fails to come up with error "Failed to load/recover state" when mem settings are changed

2015-04-15 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497317#comment-14497317
 ] 

Jian He commented on YARN-3493:
---

Cancelling the patch; uploading a newer version.

> RM fails to come up with error "Failed to load/recover state" when  mem 
> settings are changed
> 
>
> Key: YARN-3493
> URL: https://issues.apache.org/jira/browse/YARN-3493
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.0
>Reporter: Sumana Sathish
>Assignee: Jian He
>Priority: Critical
> Attachments: YARN-3493.1.patch, yarn-yarn-resourcemanager.log.zip
>
>
> RM fails to come up for the following case:
> 1. Change yarn.nodemanager.resource.memory-mb and 
> yarn.scheduler.maximum-allocation-mb to 4000 in yarn-site.xml
> 2. Start a randomtextwriter job with mapreduce.map.memory.mb=4000 in 
> background and wait for the job to reach running state
> 3. Restore yarn-site.xml to set yarn.scheduler.maximum-allocation-mb back to 2048 
> before the above job completes
> 4. Restart RM
> 5. RM fails to come up with the below error
> {code:title= RM error for Mem settings changed}
>  - RM app submission failed in validating AM resource request for application 
> application_1429094976272_0008
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested memory < 0, or requested memory > max configured, 
> requestedMemory=3072, maxMemory=2048
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208)
> 2015-04-15 13:19:18,623 ERROR resourcemanager.ResourceManager 
> (ResourceManager.java:serviceStart(579)) - Failed to load/recover state
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested memory < 0, or requested memory > max configured, 
> requestedMemory=3072, maxMemory=2048
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.a

[jira] [Commented] (YARN-3051) [Storage abstraction] Create backing storage read interface for ATS readers

2015-04-15 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497320#comment-14497320
 ] 

Sangjin Lee commented on YARN-3051:
---

We chatted offline about the issue of what context is required for the reader 
API and the uniqueness requirement. I'm not sure if there is a complete 
agreement on this yet, but at least this is a proposal from us ([~vrushalic], 
[~jrottinghuis], and me).

- for reader calls that ask for sub-application entities, the application id 
must be specified
- uniqueness is similarly defined; (entity type, entity id) uniquely identifies 
an entity within the scope of a YARN application

We feel that this is the most natural way of supporting writes/reads. One 
scenario to consider is reducing the impact on current users of ATS, as v.2 would 
require an app id which v.1 did not. For that, we would need to update the user 
libraries (e.g. Tez) to add a compatibility layer.

Thoughts?
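A small sketch of the uniqueness proposal (a hypothetical class, not the actual ATS v.2 types): since (entity type, entity id) is only unique within one YARN application, the full key for a sub-application entity carries the application id as well.

{code}
import java.util.Objects;

public final class EntityKeySketch {
  private final String clusterId;
  private final String appId;       // required for sub-application entities
  private final String entityType;
  private final String entityId;

  public EntityKeySketch(String clusterId, String appId,
                         String entityType, String entityId) {
    this.clusterId = clusterId;
    this.appId = appId;
    this.entityType = entityType;
    this.entityId = entityId;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof EntityKeySketch)) {
      return false;
    }
    EntityKeySketch k = (EntityKeySketch) o;
    return Objects.equals(clusterId, k.clusterId)
        && Objects.equals(appId, k.appId)
        && Objects.equals(entityType, k.entityType)
        && Objects.equals(entityId, k.entityId);
  }

  @Override
  public int hashCode() {
    return Objects.hash(clusterId, appId, entityType, entityId);
  }
}
{code}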

> [Storage abstraction] Create backing storage read interface for ATS readers
> ---
>
> Key: YARN-3051
> URL: https://issues.apache.org/jira/browse/YARN-3051
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Varun Saxena
> Attachments: YARN-3051.wip.patch, YARN-3051_temp.patch
>
>
> Per design in YARN-2928, create backing storage read interface that can be 
> implemented by multiple backing storage implementations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3390) Reuse TimelineCollectorManager for RM

2015-04-15 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497341#comment-14497341
 ] 

Sangjin Lee commented on YARN-3390:
---

I took a pass at the patch, and it looks good for the most part. I would ask 
you to reconcile the TimelineCollectorManager changes with what I have over on 
YARN-3437. Again, I have a slight preference for the hook/template methods for 
the aforementioned reason, but it's not a strong preference one way or another.

However, I'm not sure why there is a change to RMContainerAllocator.java; it 
doesn't look like an intended change.

> Reuse TimelineCollectorManager for RM
> -
>
> Key: YARN-3390
> URL: https://issues.apache.org/jira/browse/YARN-3390
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
> Attachments: YARN-3390.1.patch
>
>
> RMTimelineCollector should have the context info of each app whose entity  
> has been put



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler

2015-04-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497354#comment-14497354
 ] 

Hadoop QA commented on YARN-3463:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12725714/YARN-3463.65.patch
  against trunk revision 1b89a3e.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1149 javac 
compiler warnings (more than the trunk's current 1147 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart
  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/7351//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/7351//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7351//console

This message is automatically generated.

> Integrate OrderingPolicy Framework with CapacityScheduler
> -
>
> Key: YARN-3463
> URL: https://issues.apache.org/jira/browse/YARN-3463
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Craig Welch
>Assignee: Craig Welch
> Attachments: YARN-3463.50.patch, YARN-3463.61.patch, 
> YARN-3463.64.patch, YARN-3463.65.patch
>
>
> Integrate the OrderingPolicy Framework with the CapacityScheduler



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3493) RM fails to come up with error "Failed to load/recover state" when mem settings are changed

2015-04-15 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-3493:
--
Attachment: YARN-3493.2.patch

uploaded a new patch

> RM fails to come up with error "Failed to load/recover state" when  mem 
> settings are changed
> 
>
> Key: YARN-3493
> URL: https://issues.apache.org/jira/browse/YARN-3493
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.0
>Reporter: Sumana Sathish
>Assignee: Jian He
>Priority: Critical
> Attachments: YARN-3493.1.patch, YARN-3493.2.patch, 
> yarn-yarn-resourcemanager.log.zip
>
>
> RM fails to come up for the following case:
> 1. Change yarn.nodemanager.resource.memory-mb and 
> yarn.scheduler.maximum-allocation-mb to 4000 in yarn-site.xml
> 2. Start a randomtextwriter job with mapreduce.map.memory.mb=4000 in 
> background and wait for the job to reach running state
> 3. Restore yarn-site.xml to set yarn.scheduler.maximum-allocation-mb back to 2048 
> before the above job completes
> 4. Restart RM
> 5. RM fails to come up with the below error
> {code:title= RM error for Mem settings changed}
>  - RM app submission failed in validating AM resource request for application 
> application_1429094976272_0008
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested memory < 0, or requested memory > max configured, 
> requestedMemory=3072, maxMemory=2048
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:994)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1035)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1031)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1031)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1071)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1208)
> 2015-04-15 13:19:18,623 ERROR resourcemanager.ResourceManager 
> (ResourceManager.java:serviceStart(579)) - Failed to load/recover state
> org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
> resource request, requested memory < 0, or requested memory > max configured, 
> requestedMemory=3072, maxMemory=2048
> at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:204)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:385)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:328)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:317)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:422)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1187)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:574)
> at 
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
> at 
> org.apache.hadoop.yarn.server.

[jira] [Created] (YARN-3494) Expose AM resource limit and user limit in QueueMetrics

2015-04-15 Thread Jian He (JIRA)
Jian He created YARN-3494:
-

 Summary: Expose AM resource limit and user limit in QueueMetrics 
 Key: YARN-3494
 URL: https://issues.apache.org/jira/browse/YARN-3494
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Jian He


Now that we have the AM resource limit and user limit shown on the web UI, it would 
be useful to expose them in QueueMetrics as well.
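A hedged sketch of one way this could look, using the Hadoop metrics2 annotations; the class, field and method names are made up, this is not the actual QueueMetrics change, and the source must still be registered with the metrics system for the annotated gauges to be injected:

{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.MutableGaugeLong;

// Illustrative only -- not the actual QueueMetrics change.
@Metrics(context = "yarn")
public class QueueLimitMetricsSketch {
  @Metric("AM resource limit for the queue, in MB")
  MutableGaugeLong amResourceLimitMB;

  @Metric("User limit resource for the queue, in MB")
  MutableGaugeLong userLimitMB;

  /** Invoked by the scheduler whenever it recomputes the limits. */
  public void setLimits(long amLimitMb, long userLimitMb) {
    amResourceLimitMB.set(amLimitMb);
    userLimitMB.set(userLimitMb);
  }
}
{code}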



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3463) Integrate OrderingPolicy Framework with CapacityScheduler

2015-04-15 Thread Craig Welch (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Craig Welch updated YARN-3463:
--
Attachment: YARN-3463.66.patch

Fix build warnings; the tests all pass on my box.

> Integrate OrderingPolicy Framework with CapacityScheduler
> -
>
> Key: YARN-3463
> URL: https://issues.apache.org/jira/browse/YARN-3463
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: capacityscheduler
>Reporter: Craig Welch
>Assignee: Craig Welch
> Attachments: YARN-3463.50.patch, YARN-3463.61.patch, 
> YARN-3463.64.patch, YARN-3463.65.patch, YARN-3463.66.patch
>
>
> Integrate the OrderingPolicy Framework with the CapacityScheduler



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (YARN-3494) Expose AM resource limit and user limit in QueueMetrics

2015-04-15 Thread Rohith (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohith reassigned YARN-3494:


Assignee: Rohith

> Expose AM resource limit and user limit in QueueMetrics 
> 
>
> Key: YARN-3494
> URL: https://issues.apache.org/jira/browse/YARN-3494
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Jian He
>Assignee: Rohith
>
> Now that we have the AM resource limit and user limit shown on the web UI, it 
> would be useful to expose them in QueueMetrics as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

