[jira] [Commented] (YARN-3948) Display Application Priority in RM Web UI

2015-07-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641459#comment-14641459
 ] 

Hadoop QA commented on YARN-3948:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  20m 19s | Pre-patch trunk has 6 extant 
Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 2 new or modified test files. |
| {color:green}+1{color} | javac |   7m 37s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 36s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   2m 21s | The applied patch generated  1 
new checkstyle issues (total was 15, now 16). |
| {color:red}-1{color} | whitespace |   0m  2s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 22s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 31s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   6m 30s | The patch appears to introduce 1 
new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   0m 21s | Tests passed in 
hadoop-yarn-api. |
| {color:green}+1{color} | yarn tests |   6m 56s | Tests passed in 
hadoop-yarn-client. |
| {color:green}+1{color} | yarn tests |   1m 57s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |   0m 25s | Tests passed in 
hadoop-yarn-server-common. |
| {color:red}-1{color} | yarn tests |  52m 20s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | | 111m 30s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-common |
| Failed unit tests | hadoop.yarn.server.resourcemanager.TestClientRMService |
|   | hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12747165/0002-YARN-3948.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / adcf5dd |
| Pre-patch Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8668/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-common.html
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8668/artifact/patchprocess/diffcheckstylehadoop-yarn-api.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-YARN-Build/8668/artifact/patchprocess/whitespace.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/8668/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-common.html
 |
| hadoop-yarn-api test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8668/artifact/patchprocess/testrun_hadoop-yarn-api.txt
 |
| hadoop-yarn-client test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8668/artifact/patchprocess/testrun_hadoop-yarn-client.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8668/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8668/artifact/patchprocess/testrun_hadoop-yarn-server-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8668/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8668/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8668/console |


This message was automatically generated.

 Display Application Priority in RM Web UI
 -

 Key: YARN-3948
 URL: https://issues.apache.org/jira/browse/YARN-3948
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: webapp
Affects Versions: 2.7.1
Reporter: Sunil G
Assignee: Sunil G
 Attachments: 0001-YARN-3948.patch, 0002-YARN-3948.patch, 
 ApplicationPage.png, ClusterPage.png


 Application Priority can be displayed in RM Web UI Application page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3656) LowCost: A Cost-Based Placement Agent for YARN Reservations

2015-07-25 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-3656:
---
Attachment: YARN-3656-v1.5.patch

Fixing javadoc and one more checkstyle issue.

 LowCost: A Cost-Based Placement Agent for YARN Reservations
 ---

 Key: YARN-3656
 URL: https://issues.apache.org/jira/browse/YARN-3656
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager
Affects Versions: 2.6.0
Reporter: Ishai Menache
Assignee: Jonathan Yaniv
  Labels: capacity-scheduler, resourcemanager
 Attachments: LowCostRayonExternal.pdf, YARN-3656-v1.1.patch, 
 YARN-3656-v1.2.patch, YARN-3656-v1.3.patch, YARN-3656-v1.4.patch, 
 YARN-3656-v1.5.patch, YARN-3656-v1.patch, lowcostrayonexternal_v2.pdf


 YARN-1051 enables SLA support by allowing users to reserve cluster capacity 
 ahead of time. YARN-1710 introduced a greedy agent for placing user 
 reservations. The greedy agent makes fast placement decisions but at the cost 
 of ignoring the cluster committed resources, which might result in blocking 
 the cluster resources for certain periods of time, and in turn rejecting some 
 arriving jobs.
 We propose LowCost – a new cost-based planning algorithm. LowCost “spreads” 
 the demand of the job throughout the allowed time-window according to a 
 global, load-based cost function. 
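
 To make the spreading idea concrete, here is a toy sketch (not the LowCost 
 algorithm itself; the names and the simple headroom-proportional rule are 
 illustrative assumptions) that places more of a job's demand in time steps 
 that currently carry less load:
{code}
// Toy illustration only: spread a job's total demand across a time window,
// giving each step a share proportional to its remaining headroom.
public class SpreadDemandToy {
  public static double[] spread(double totalDemand, double[] existingLoad,
      double capacityPerStep) {
    double[] placed = new double[existingLoad.length];
    double totalHeadroom = 0;
    for (double load : existingLoad) {
      totalHeadroom += Math.max(0, capacityPerStep - load);
    }
    if (totalHeadroom <= 0) {
      return placed; // no room anywhere in the window
    }
    for (int t = 0; t < existingLoad.length; t++) {
      double headroom = Math.max(0, capacityPerStep - existingLoad[t]);
      placed[t] = totalDemand * headroom / totalHeadroom;
    }
    return placed;
  }
}
{code}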



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3656) LowCost: A Cost-Based Placement Agent for YARN Reservations

2015-07-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641469#comment-14641469
 ] 

Hadoop QA commented on YARN-3656:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  16m 20s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 14 new or modified test files. |
| {color:green}+1{color} | javac |   7m 47s | There were no new javac warning 
messages. |
| {color:red}-1{color} | javadoc |   9m 41s | The applied patch generated  1  
additional warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   0m 47s | The applied patch generated  
11 new checkstyle issues (total was 115, now 114). |
| {color:green}+1{color} | whitespace |   0m 12s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 21s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 36s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 27s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |  52m 25s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  91m  3s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12747170/YARN-3656-v1.5.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / adcf5dd |
| javadoc | 
https://builds.apache.org/job/PreCommit-YARN-Build/8669/artifact/patchprocess/diffJavadocWarnings.txt
 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8669/artifact/patchprocess/diffcheckstylehadoop-yarn-server-resourcemanager.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8669/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8669/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8669/console |


This message was automatically generated.

 LowCost: A Cost-Based Placement Agent for YARN Reservations
 ---

 Key: YARN-3656
 URL: https://issues.apache.org/jira/browse/YARN-3656
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager
Affects Versions: 2.6.0
Reporter: Ishai Menache
Assignee: Jonathan Yaniv
  Labels: capacity-scheduler, resourcemanager
 Attachments: LowCostRayonExternal.pdf, YARN-3656-v1.1.patch, 
 YARN-3656-v1.2.patch, YARN-3656-v1.3.patch, YARN-3656-v1.4.patch, 
 YARN-3656-v1.5.patch, YARN-3656-v1.patch, lowcostrayonexternal_v2.pdf


 YARN-1051 enables SLA support by allowing users to reserve cluster capacity 
 ahead of time. YARN-1710 introduced a greedy agent for placing user 
 reservations. The greedy agent makes fast placement decisions but at the cost 
 of ignoring the cluster committed resources, which might result in blocking 
 the cluster resources for certain periods of time, and in turn rejecting some 
 arriving jobs.
 We propose LowCost – a new cost-based planning algorithm. LowCost “spreads” 
 the demand of the job throughout the allowed time-window according to a 
 global, load-based cost function. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3965) Add startup timestamp for nodemanager

2015-07-25 Thread Hong Zhiguo (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641473#comment-14641473
 ] 

Hong Zhiguo commented on YARN-3965:
---

Hi, [~zxu], thanks for your comments. Here is my reconsideration.

1. The nmStartupTime could be a non-static field of NodeManager, but that makes 
it harder to access, since the accessor must hold a reference to the NodeManager 
instance. For example, there is no such reference in the current implementation 
of the NodeInfo constructor. One option is to make nmStartupTime a non-static 
field of NMContext, but I doubt it is worth making a simple thing complicated. 
BTW, the startup timestamp of ResourceManager is also static.

2. It's final, so we don't need to worry about that. A private field with a 
getter is also OK if you think it's better.
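
For illustration, a minimal sketch of the static-field-plus-getter variant 
discussed above; the class skeleton and field name are assumptions, not the 
actual patch:
{code}
// Minimal sketch (not the committed change): keep the NM startup time in a
// static field so callers without a NodeManager reference (e.g. a web DAO
// such as NodeInfo) can still read it through a getter.
public class NodeManager {
  private static final long nmStartupTime = System.currentTimeMillis();

  public static long getNMStartupTime() {
    return nmStartupTime;
  }
}
{code}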

 Add startup timestamp for nodemanager
 

 Key: YARN-3965
 URL: https://issues.apache.org/jira/browse/YARN-3965
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: nodemanager
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
 Attachments: YARN-3965-2.patch, YARN-3965.patch


 We have a startup timestamp for the RM already, but not for the NM.
 Sometimes a cluster operator modifies the configuration of all nodes and kicks 
 off a command to restart all NMs, and then finds it hard to check whether all 
 NMs actually restarted. In practice there are always some NMs that did not 
 restart as expected, which leads to errors later due to inconsistent 
 configuration.
 If we have a startup timestamp for the NM, the operator could easily fetch it 
 via the NM web service, find out which NMs did not restart, and take manual 
 action on them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins

2015-07-25 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641474#comment-14641474
 ] 

Brahma Reddy Battula commented on YARN-3528:


Will look into the test case failures, but they all passed locally.

 Tests with 12345 as hard-coded port break jenkins
 -

 Key: YARN-3528
 URL: https://issues.apache.org/jira/browse/YARN-3528
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0
 Environment: ASF Jenkins
Reporter: Steve Loughran
Assignee: Brahma Reddy Battula
Priority: Blocker
  Labels: test
 Attachments: YARN-3528-002.patch, YARN-3528-003.patch, 
 YARN-3528-004.patch, YARN-3528.patch


 A lot of the YARN tests have hard-coded the port 12345 for their services to 
 come up on.
 This makes it impossible to have scheduled or precommit tests to run 
 consistently on the ASF jenkins hosts. Instead the tests fail regularly and 
 appear to get ignored completely.
 A quick grep of 12345 shows up many places in the test suite where this 
 practice has developed.
 * All {{BaseContainerManagerTest}} subclasses
 * {{TestNodeManagerShutdown}}
 * {{TestContainerManager}}
 + others
 This needs to be addressed through portscanning and dynamic port allocation. 
 Please can someone do this.
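
 As a rough illustration of the dynamic-port-allocation approach mentioned 
 above (a sketch, not a prescription for the patch), binding to port 0 lets 
 the OS pick a currently free port instead of hard-coding 12345:
{code}
// Illustrative only: ask the OS for a free ephemeral port.
import java.io.IOException;
import java.net.ServerSocket;

public class FreePortExample {
  public static int findFreePort() throws IOException {
    try (ServerSocket socket = new ServerSocket(0)) {
      return socket.getLocalPort();
    }
  }
}
{code}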



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3958) TestYarnConfigurationFields should be moved to hadoop-yarn-api

2015-07-25 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641478#comment-14641478
 ] 

Akira AJISAKA commented on YARN-3958:
-

Rethinking this issue, can we move YarnConfiguration.java to hadoop-yarn-common 
to fix the problem? If the current patch is committed, Jenkins will not run the 
test when yarn-default.xml is changed.

 TestYarnConfigurationFields should be moved to hadoop-yarn-api
 --

 Key: YARN-3958
 URL: https://issues.apache.org/jira/browse/YARN-3958
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
 Attachments: YARN-3958.01.patch, YARN-3958.02.patch, 
 YARN-3958.03.patch


 Currently TestYarnConfigurationFields is present in hadoop-yarn-common. The 
 test is for checking whether all the configurations declared in 
 YarnConfiguration exist in yarn-default.xml or not.
 But as YarnConfiguration is in hadoop-yarn-api, if somebody changes this 
 file, it is not necessary that this test will be run. So if the developer 
 misses to update yarn-default.xml and patch is committed, it will lead to 
 unnecessary test failures after commit.
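
 For context, a rough sketch of the kind of check the test performs (this is 
 illustrative, not the real TestYarnConfigurationFields): reflect over the 
 String constants in YarnConfiguration and verify each one has an entry in 
 yarn-default.xml.
{code}
// Illustrative only: print YarnConfiguration keys that yarn-default.xml
// does not define.
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class YarnDefaultXmlCheck {
  public static void main(String[] args) throws Exception {
    Configuration defaults = new Configuration(false);
    defaults.addResource("yarn-default.xml");
    for (Field f : YarnConfiguration.class.getFields()) {
      if (Modifier.isStatic(f.getModifiers()) && f.getType() == String.class) {
        String key = (String) f.get(null);
        if (key != null && key.startsWith("yarn.") && defaults.get(key) == null) {
          System.out.println("Missing from yarn-default.xml: " + key);
        }
      }
    }
  }
}
{code}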



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3967) Fetch the application report from the AHS if the RM does not know about it

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641606#comment-14641606
 ] 

Hudson commented on YARN-3967:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2194 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2194/])
YARN-3967. Fetch the application report from the AHS if the RM does not (xgong: 
rev fbd6063269221ec25834684477f434e19f0b66af)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/AppReportFetcher.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/TestAppReportFetcher.java


 Fetch the application report from the AHS if the RM does not know about it
 --

 Key: YARN-3967
 URL: https://issues.apache.org/jira/browse/YARN-3967
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.1
Reporter: Mit Desai
Assignee: Mit Desai
 Fix For: 2.7.2

 Attachments: YARN-3967.1.patch, YARN-3967.2.patch, YARN-3967.3.patch


 If the application history service has been enabled and the RM has forgotten 
 about an application, try to fetch the app report from the AHS.
 On larger clusters, the RM can forget about the applications in about 30 
 minutes. The proxy url generated during the job submission will try to fetch 
 the app report from the RM and will fail to get anything from there. If the 
 app is not found in the RM, we will need to get the application report from 
 the Application History Server  (if it is enabled) to see if we can get any 
 information on that application before throwing an exception.
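
 A minimal sketch of the fallback behavior described above, assuming both an RM 
 client and an AHS client are available; the class and its wiring are 
 illustrative rather than the actual AppReportFetcher change:
{code}
// Illustrative only: ask the RM first and fall back to the history service
// when the RM no longer knows the application.
import java.io.IOException;
import org.apache.hadoop.yarn.api.ApplicationClientProtocol;
import org.apache.hadoop.yarn.api.ApplicationHistoryProtocol;
import org.apache.hadoop.yarn.api.protocolrecords.GetApplicationReportRequest;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;
import org.apache.hadoop.yarn.exceptions.YarnException;

public class FallbackReportFetcher {
  private final ApplicationClientProtocol rmClient;
  private final ApplicationHistoryProtocol historyClient;

  public FallbackReportFetcher(ApplicationClientProtocol rmClient,
      ApplicationHistoryProtocol historyClient) {
    this.rmClient = rmClient;
    this.historyClient = historyClient;
  }

  public ApplicationReport getReport(ApplicationId appId)
      throws YarnException, IOException {
    GetApplicationReportRequest request =
        GetApplicationReportRequest.newInstance(appId);
    try {
      return rmClient.getApplicationReport(request).getApplicationReport();
    } catch (ApplicationNotFoundException e) {
      // RM has forgotten the app; ask the Application History Server instead.
      return historyClient.getApplicationReport(request).getApplicationReport();
    }
  }
}
{code}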



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3026) Move application-specific container allocation logic from LeafQueue to FiCaSchedulerApp

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641614#comment-14641614
 ] 

Hudson commented on YARN-3026:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2194 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2194/])
YARN-3026. Move application-specific container allocation logic from LeafQueue 
to FiCaSchedulerApp. Contributed by Wangda Tan (jianhe: rev 
83fe34ac0896cee0918bbfad7bd51231e4aec39b)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSAssignment.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceLimits.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityHeadroomProvider.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java


 Move application-specific container allocation logic from LeafQueue to 
 FiCaSchedulerApp
 ---

 Key: YARN-3026
 URL: https://issues.apache.org/jira/browse/YARN-3026
 Project: Hadoop YARN
  Issue Type: Task
  Components: capacityscheduler
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 2.8.0

 Attachments: YARN-3026.1.patch, YARN-3026.2.patch, YARN-3026.3.patch, 
 YARN-3026.4.patch, YARN-3026.5.patch, YARN-3026.6.patch


 Had a discussion with [~vinodkv] and [~jianhe]: 
 In the existing Capacity Scheduler, all allocation logic of and under LeafQueue 
 is implemented in LeafQueue.java. To give LeafQueue a cleaner scope, we'd 
 better move some of it to FiCaSchedulerApp.
 The ideal scope of LeafQueue should be: a LeafQueue receives some resources 
 from its ParentQueue (like 15% of cluster resource) and distributes those 
 resources to its child apps, while staying agnostic to the internal logic of 
 the apps (like delayed scheduling, etc.). IAW, LeafQueue shouldn't decide how 
 an application allocates containers from the given resources.
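
 As a toy illustration of that separation of concerns (not the actual patch; 
 all names below are made up), the queue only picks an app and offers it 
 resources, and the app decides how to turn the offer into container 
 assignments:
{code}
// Toy sketch: the queue distributes its share without knowing app internals.
interface AppAllocator {
  /** Returns how much of the offered resource the app actually used. */
  int assignContainers(int offeredResource);
}

class ToyLeafQueue {
  private final java.util.List<AppAllocator> apps = new java.util.ArrayList<>();

  void addApp(AppAllocator app) {
    apps.add(app);
  }

  /** Offer the queue's share to each child app in turn. */
  int assign(int queueShare) {
    int used = 0;
    for (AppAllocator app : apps) {
      if (used >= queueShare) {
        break;
      }
      used += app.assignContainers(queueShare - used);
    }
    return used;
  }
}
{code}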



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3973) Recent changes to application priority management break reservation system from YARN-1051

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641615#comment-14641615
 ] 

Hudson commented on YARN-3973:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2194 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2194/])
YARN-3973. Recent changes to application priority management break reservation 
system from YARN-1051 (Carlo Curino via wangda) (wangda: rev 
a3bd7b4a59b3664273dc424f240356838213d4e7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* hadoop-yarn-project/CHANGES.txt


 Recent changes to application priority management break reservation system 
 from YARN-1051
 -

 Key: YARN-3973
 URL: https://issues.apache.org/jira/browse/YARN-3973
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Carlo Curino
Assignee: Carlo Curino
 Fix For: 2.8.0

 Attachments: YARN-3973.1.patch, YARN-3973.patch


 Recent changes in trunk (I think YARN-2003) produce an NPE in the reservation 
 system when an application is submitted to a ReservationQueue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641609#comment-14641609
 ] 

Hudson commented on YARN-1051:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2194 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2194/])
YARN-3973. Recent changes to application priority management break reservation 
system from YARN-1051 (Carlo Curino via wangda) (wangda: rev 
a3bd7b4a59b3664273dc424f240356838213d4e7)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* hadoop-yarn-project/CHANGES.txt


 YARN Admission Control/Planner: enhancing the resource allocation model with 
 time.
 --

 Key: YARN-1051
 URL: https://issues.apache.org/jira/browse/YARN-1051
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager, scheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
 Fix For: 2.6.0

 Attachments: YARN-1051-design.pdf, YARN-1051.1.patch, 
 YARN-1051.patch, curino_MSR-TR-2013-108.pdf, socc14-paper15.pdf, 
 techreport.pdf


 In this umbrella JIRA we propose to extend the YARN RM to handle time 
 explicitly, allowing users to reserve capacity over time. This is an 
 important step towards SLAs, long-running services, workflows, and helps for 
 gang scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3969) Allow jobs to be submitted to reservation that is active but does not have any allocations

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641603#comment-14641603
 ] 

Hudson commented on YARN-3969:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2194 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2194/])
YARN-3969. Allow jobs to be submitted to reservation that is active but does 
not have any allocations. (subru via curino) (Carlo Curino: rev 
0fcb4a8cf2add3f112907ff4e833e2f04947b53e)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservationQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ReservationQueue.java
YARN-3969. Updating CHANGES.txt to reflect the correct set of branches where 
this is committed (Carlo Curino: rev fc42fa8ae3bc9d6d055090a7bb5e6f0c5972fcff)
* hadoop-yarn-project/CHANGES.txt


 Allow jobs to be submitted to reservation that is active but does not have 
 any allocations
 --

 Key: YARN-3969
 URL: https://issues.apache.org/jira/browse/YARN-3969
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, fairscheduler, resourcemanager
Reporter: Subru Krishnan
Assignee: Subru Krishnan
 Fix For: 2.8.0, 2.7.2

 Attachments: YARN-3969-v1.patch, YARN-3969-v2.patch


 YARN-1051 introduces the notion of reserving resources prior to job 
 submission. A reservation is active from its arrival time to deadline but in 
 the interim there can be instances of time when it does not have any 
 resources allocated. We reject jobs that are submitted when the reservation 
 allocation is zero. Instead we should accept and queue the jobs till the 
 resources become available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3656) LowCost: A Cost-Based Placement Agent for YARN Reservations

2015-07-25 Thread Carlo Curino (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carlo Curino updated YARN-3656:
---
Issue Type: Sub-task  (was: Improvement)
Parent: YARN-2572

 LowCost: A Cost-Based Placement Agent for YARN Reservations
 ---

 Key: YARN-3656
 URL: https://issues.apache.org/jira/browse/YARN-3656
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, resourcemanager
Affects Versions: 2.6.0
Reporter: Ishai Menache
Assignee: Jonathan Yaniv
  Labels: capacity-scheduler, resourcemanager
 Fix For: 2.8.0

 Attachments: LowCostRayonExternal.pdf, YARN-3656-v1.1.patch, 
 YARN-3656-v1.2.patch, YARN-3656-v1.3.patch, YARN-3656-v1.4.patch, 
 YARN-3656-v1.5.patch, YARN-3656-v1.patch, lowcostrayonexternal_v2.pdf


 YARN-1051 enables SLA support by allowing users to reserve cluster capacity 
 ahead of time. YARN-1710 introduced a greedy agent for placing user 
 reservations. The greedy agent makes fast placement decisions but at the cost 
 of ignoring the cluster committed resources, which might result in blocking 
 the cluster resources for certain periods of time, and in turn rejecting some 
 arriving jobs.
 We propose LowCost – a new cost-based planning algorithm. LowCost “spreads” 
 the demand of the job throughout the allowed time-window according to a 
 global, load-based cost function. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3957) FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page to return 500

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641628#comment-14641628
 ] 

Hudson commented on YARN-3957:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #256 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/256/])
YARN-3957. FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page 
to return 500. (Anubhav Dhoot via kasha) (kasha: rev 
d19d18775368f5aaa254881165acc1299837072b)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/TestFairSchedulerQueueInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerQueueInfo.java
* hadoop-yarn-project/CHANGES.txt


 FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page to return 
 500
 

 Key: YARN-3957
 URL: https://issues.apache.org/jira/browse/YARN-3957
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Fix For: 2.8.0

 Attachments: YARN-3957.001.patch, YARN-3957.002.patch


 There is an NPE causing the web page at 
 http://localhost:23188/cluster/scheduler to return a 500. This seems to be 
 because YARN-2336 sets childQueues to null, and getChildQueues then 
 hits the NPE.
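
 One way to avoid that class of failure, shown as a minimal sketch (not the 
 committed fix; the class below is a stand-in for FairSchedulerQueueInfo): 
 never return a null child-queue list to the renderer.
{code}
// Illustrative null-safe getter: callers always get a (possibly empty) list.
import java.util.Collection;
import java.util.Collections;

public class QueueInfoExample {
  private Collection<QueueInfoExample> childQueues;

  public Collection<QueueInfoExample> getChildQueues() {
    return childQueues == null
        ? Collections.<QueueInfoExample>emptyList()
        : childQueues;
  }
}
{code}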



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3967) Fetch the application report from the AHS if the RM does not know about it

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641624#comment-14641624
 ] 

Hudson commented on YARN-3967:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #256 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/256/])
YARN-3967. Fetch the application report from the AHS if the RM does not (xgong: 
rev fbd6063269221ec25834684477f434e19f0b66af)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/AppReportFetcher.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/TestAppReportFetcher.java


 Fetch the application report from the AHS if the RM does not know about it
 --

 Key: YARN-3967
 URL: https://issues.apache.org/jira/browse/YARN-3967
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.1
Reporter: Mit Desai
Assignee: Mit Desai
 Fix For: 2.7.2

 Attachments: YARN-3967.1.patch, YARN-3967.2.patch, YARN-3967.3.patch


 If the application history service has been enabled and the RM has forgotten 
 about an application, try to fetch the app report from the AHS.
 On larger clusters, the RM can forget about the applications in about 30 
 minutes. The proxy url generated during the job submission will try to fetch 
 the app report from the RM and will fail to get anything from there. If the 
 app is not found in the RM, we will need to get the application report from 
 the Application History Server  (if it is enabled) to see if we can get any 
 information on that application before throwing an exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3925) ContainerLogsUtils#getContainerLogFile fails to read container log files from full disks.

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641626#comment-14641626
 ] 

Hudson commented on YARN-3925:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #256 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/256/])
YARN-3925. ContainerLogsUtils#getContainerLogFile fails to read container log 
files from full disks. Contributed by zhihai xu (jlowe: rev 
ff9c13e0a739bb13115167dc661b6a16b2ed2c04)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java


 ContainerLogsUtils#getContainerLogFile fails to read container log files from 
 full disks.
 -

 Key: YARN-3925
 URL: https://issues.apache.org/jira/browse/YARN-3925
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.7.1
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical
 Fix For: 2.7.2

 Attachments: YARN-3925.000.patch, YARN-3925.001.patch


 ContainerLogsUtils#getContainerLogFile fails to read files from full disks.
 {{getContainerLogFile}} depends on 
 {{LocalDirsHandlerService#getLogPathToRead}} to get the log file, but 
 {{LocalDirsHandlerService#getLogPathToRead}} calls 
 {{logDirsAllocator.getLocalPathToRead}} and {{logDirsAllocator}} uses 
 configuration {{YarnConfiguration.NM_LOG_DIRS}}, which will be updated to not 
 include full disks in {{LocalDirsHandlerService#checkDirs}}:
 {code}
 Configuration conf = getConfig();
 List<String> localDirs = getLocalDirs();
 conf.setStrings(YarnConfiguration.NM_LOCAL_DIRS,
     localDirs.toArray(new String[localDirs.size()]));
 List<String> logDirs = getLogDirs();
 conf.setStrings(YarnConfiguration.NM_LOG_DIRS,
     logDirs.toArray(new String[logDirs.size()]));
 {code}
 ContainerLogsUtils#getContainerLogFile is used by NMWebServices#getLogs and 
 ContainerLogsPage.ContainersLogsBlock#render to read the log.
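
 One way to sidestep the problem on the read path, sketched here with made-up 
 names (this is not the committed change): search every configured log 
 directory, including directories that were marked full, since a full disk can 
 still be read from.
{code}
// Illustrative only: look up a log file across all configured log dirs,
// including dirs excluded from writing because they are full.
import java.io.File;
import java.util.List;

public class LogPathLookup {
  /** logDirs should include full (but still readable) directories. */
  public static File findLogFile(List<String> logDirs, String relativePath) {
    for (String dir : logDirs) {
      File candidate = new File(dir, relativePath);
      if (candidate.exists()) {
        return candidate;
      }
    }
    return null;
  }
}
{code}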



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3969) Allow jobs to be submitted to reservation that is active but does not have any allocations

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641665#comment-14641665
 ] 

Hudson commented on YARN-3969:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #264 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/264/])
YARN-3969. Allow jobs to be submitted to reservation that is active but does 
not have any allocations. (subru via curino) (Carlo Curino: rev 
0fcb4a8cf2add3f112907ff4e833e2f04947b53e)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ReservationQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservationQueue.java
YARN-3969. Updating CHANGES.txt to reflect the correct set of branches where 
this is committed (Carlo Curino: rev fc42fa8ae3bc9d6d055090a7bb5e6f0c5972fcff)
* hadoop-yarn-project/CHANGES.txt


 Allow jobs to be submitted to reservation that is active but does not have 
 any allocations
 --

 Key: YARN-3969
 URL: https://issues.apache.org/jira/browse/YARN-3969
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, fairscheduler, resourcemanager
Reporter: Subru Krishnan
Assignee: Subru Krishnan
 Fix For: 2.8.0, 2.7.2

 Attachments: YARN-3969-v1.patch, YARN-3969-v2.patch


 YARN-1051 introduces the notion of reserving resources prior to job 
 submission. A reservation is active from its arrival time to deadline but in 
 the interim there can be instances of time when it does not have any 
 resources allocated. We reject jobs that are submitted when the reservation 
 allocation is zero. Instead we should accept and queue the jobs till the 
 resources become available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3026) Move application-specific container allocation logic from LeafQueue to FiCaSchedulerApp

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641676#comment-14641676
 ] 

Hudson commented on YARN-3026:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #264 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/264/])
YARN-3026. Move application-specific container allocation logic from LeafQueue 
to FiCaSchedulerApp. Contributed by Wangda Tan (jianhe: rev 
83fe34ac0896cee0918bbfad7bd51231e4aec39b)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceLimits.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityHeadroomProvider.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSAssignment.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java


 Move application-specific container allocation logic from LeafQueue to 
 FiCaSchedulerApp
 ---

 Key: YARN-3026
 URL: https://issues.apache.org/jira/browse/YARN-3026
 Project: Hadoop YARN
  Issue Type: Task
  Components: capacityscheduler
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 2.8.0

 Attachments: YARN-3026.1.patch, YARN-3026.2.patch, YARN-3026.3.patch, 
 YARN-3026.4.patch, YARN-3026.5.patch, YARN-3026.6.patch


 Had a discussion with [~vinodkv] and [~jianhe]: 
 In the existing Capacity Scheduler, all allocation logic of and under LeafQueue 
 is implemented in LeafQueue.java. To give LeafQueue a cleaner scope, we'd 
 better move some of it to FiCaSchedulerApp.
 The ideal scope of LeafQueue should be: a LeafQueue receives some resources 
 from its ParentQueue (like 15% of cluster resource) and distributes those 
 resources to its child apps, while staying agnostic to the internal logic of 
 the apps (like delayed scheduling, etc.). IAW, LeafQueue shouldn't decide how 
 an application allocates containers from the given resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641671#comment-14641671
 ] 

Hudson commented on YARN-1051:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #264 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/264/])
YARN-3973. Recent changes to application priority management break reservation 
system from YARN-1051 (Carlo Curino via wangda) (wangda: rev 
a3bd7b4a59b3664273dc424f240356838213d4e7)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java


 YARN Admission Control/Planner: enhancing the resource allocation model with 
 time.
 --

 Key: YARN-1051
 URL: https://issues.apache.org/jira/browse/YARN-1051
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager, scheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
 Fix For: 2.6.0

 Attachments: YARN-1051-design.pdf, YARN-1051.1.patch, 
 YARN-1051.patch, curino_MSR-TR-2013-108.pdf, socc14-paper15.pdf, 
 techreport.pdf


 In this umbrella JIRA we propose to extend the YARN RM to handle time 
 explicitly, allowing users to reserve capacity over time. This is an 
 important step towards SLAs, long-running services, workflows, and helps for 
 gang scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3905) Application History Server UI NPEs when accessing apps run after RM restart

2015-07-25 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-3905:
---
Release Note:   (was: Resubmitting patch after fixing checkstyle warnings.)

 Application History Server UI NPEs when accessing apps run after RM restart
 ---

 Key: YARN-3905
 URL: https://issues.apache.org/jira/browse/YARN-3905
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.7.0, 2.8.0, 2.7.1
Reporter: Eric Payne
Assignee: Eric Payne
 Fix For: 3.0.0, 2.8.0, 2.7.2

 Attachments: YARN-3905.001.patch, YARN-3905.002.patch


 From the Application History URL (http://RmHostName:8188/applicationhistory), 
 clicking on the application ID of an app that was run after the RM daemon has 
 been restarted results in a 500 error:
 {noformat}
 Sorry, got error 500
 Please consult RFC 2616 for meanings of the error code.
 {noformat}
 The stack trace is as follows:
 {code}
 2015-07-09 20:13:15,584 [2068024519@qtp-769046918-3] INFO 
 applicationhistoryservice.FileSystemApplicationHistoryStore: Completed 
 reading history information of all application attempts of application 
 application_1436472584878_0001
 2015-07-09 20:13:15,591 [2068024519@qtp-769046918-3] ERROR webapp.AppBlock: 
 Failed to read the AM container of the application attempt 
 appattempt_1436472584878_0001_01.
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.convertToContainerReport(ApplicationHistoryManagerImpl.java:206)
 at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getContainer(ApplicationHistoryManagerImpl.java:199)
 at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryClientService.getContainerReport(ApplicationHistoryClientService.java:205)
 at 
 org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:272)
 at 
 org.apache.hadoop.yarn.server.webapp.AppBlock$3.run(AppBlock.java:267)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666)
 at 
 org.apache.hadoop.yarn.server.webapp.AppBlock.generateApplicationTable(AppBlock.java:266)
 ...
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3973) Recent changes to application priority management break reservation system from YARN-1051

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641633#comment-14641633
 ] 

Hudson commented on YARN-3973:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #256 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/256/])
YARN-3973. Recent changes to application priority management break reservation 
system from YARN-1051 (Carlo Curino via wangda) (wangda: rev 
a3bd7b4a59b3664273dc424f240356838213d4e7)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java


 Recent changes to application priority management break reservation system 
 from YARN-1051
 -

 Key: YARN-3973
 URL: https://issues.apache.org/jira/browse/YARN-3973
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Carlo Curino
Assignee: Carlo Curino
 Fix For: 2.8.0

 Attachments: YARN-3973.1.patch, YARN-3973.patch


 Recent changes in trunk (I think YARN-2003) produce an NPE in the reservation 
 system when an application is submitted to a ReservationQueue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641627#comment-14641627
 ] 

Hudson commented on YARN-1051:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #256 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/256/])
YARN-3973. Recent changes to application priority management break reservation 
system from YARN-1051 (Carlo Curino via wangda) (wangda: rev 
a3bd7b4a59b3664273dc424f240356838213d4e7)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java


 YARN Admission Control/Planner: enhancing the resource allocation model with 
 time.
 --

 Key: YARN-1051
 URL: https://issues.apache.org/jira/browse/YARN-1051
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager, scheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
 Fix For: 2.6.0

 Attachments: YARN-1051-design.pdf, YARN-1051.1.patch, 
 YARN-1051.patch, curino_MSR-TR-2013-108.pdf, socc14-paper15.pdf, 
 techreport.pdf


 In this umbrella JIRA we propose to extend the YARN RM to handle time 
 explicitly, allowing users to reserve capacity over time. This is an 
 important step towards SLAs, long-running services, workflows, and helps for 
 gang scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3969) Allow jobs to be submitted to reservation that is active but does not have any allocations

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641621#comment-14641621
 ] 

Hudson commented on YARN-3969:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #256 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/256/])
YARN-3969. Allow jobs to be submitted to reservation that is active but does 
not have any allocations. (subru via curino) (Carlo Curino: rev 
0fcb4a8cf2add3f112907ff4e833e2f04947b53e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ReservationQueue.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservationQueue.java
YARN-3969. Updating CHANGES.txt to reflect the correct set of branches where 
this is committed (Carlo Curino: rev fc42fa8ae3bc9d6d055090a7bb5e6f0c5972fcff)
* hadoop-yarn-project/CHANGES.txt


 Allow jobs to be submitted to reservation that is active but does not have 
 any allocations
 --

 Key: YARN-3969
 URL: https://issues.apache.org/jira/browse/YARN-3969
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, fairscheduler, resourcemanager
Reporter: Subru Krishnan
Assignee: Subru Krishnan
 Fix For: 2.8.0, 2.7.2

 Attachments: YARN-3969-v1.patch, YARN-3969-v2.patch


 YARN-1051 introduces the notion of reserving resources prior to job 
 submission. A reservation is active from its arrival time to deadline but in 
 the interim there can be instances of time when it does not have any 
 resources allocated. We reject jobs that are submitted when the reservation 
 allocation is zero. Instead we should accept and queue the jobs till the 
 resources become available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3971) Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery

2015-07-25 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-3971:
---
Attachment: 0002-YARN-3971.patch

Attaching a patch with the update and a test case.
[~leftnoteasy], please review the attached patch.

 Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel 
 recovery
 --

 Key: YARN-3971
 URL: https://issues.apache.org/jira/browse/YARN-3971
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3971.patch, 0002-YARN-3971.patch


 Steps to reproduce 
 # Create labels x,y
 # Delete labels x,y
 # Create labels x,y again, and also add labels x and y to the capacity scheduler xml
 # Restart RM 
  
 Both RMs will become standby,
 since the below exception is thrown in {{FileSystemNodeLabelsStore#recover}}:
 {code}
 2015-07-23 14:03:33,627 INFO org.apache.hadoop.service.AbstractService: 
 Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in 
 state STARTED; cause: java.io.IOException: Cannot remove label=x, because 
 queue=a1 is using this label. Please remove label on queue before remove the 
 label
 java.io.IOException: Cannot remove label=x, because queue=a1 is using this 
 label. Please remove label on queue before remove the label
 at 
 org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.checkRemoveFromClusterNodeLabelsOfQueue(RMNodeLabelsManager.java:104)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.removeFromClusterNodeLabels(RMNodeLabelsManager.java:118)
 at 
 org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:221)
 at 
 org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:232)
 at 
 org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:245)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:587)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:312)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
 at 
 org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:832)
 at 
 org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:422)
 at 
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
 at 
 org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
 {code}
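
 A toy sketch of the "skip the queue check while recovering" idea named in the 
 title (illustrative only; the actual RMNodeLabelsManager change may differ): 
 the manager tracks whether the store is still replaying its state and only 
 enforces the queue check afterwards.
{code}
// Toy illustration: bypass the "label in use by a queue" check during recovery.
import java.io.IOException;
import java.util.Set;

public class LabelRemovalExample {
  private volatile boolean recovering = true;

  public void recoveryCompleted() {
    recovering = false;
  }

  public void removeFromClusterNodeLabels(Set<String> labels,
      Set<String> labelsUsedByQueues) throws IOException {
    if (!recovering) {
      for (String label : labels) {
        if (labelsUsedByQueues.contains(label)) {
          throw new IOException("Cannot remove label=" + label
              + ", because a queue is using this label.");
        }
      }
    }
    // ... actually remove the labels from the cluster here ...
  }
}
{code}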



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins

2015-07-25 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641736#comment-14641736
 ] 

Varun Saxena commented on YARN-3528:


[~brahmareddy],

A few comments:

# Among the test failures reported in the QA report above, all except one test 
each in TestDeletionService and TestResourceLocalizationService are 
related to your code change. 
# You need to call {{ServerSocketUtil#getPort}} only in places where 
the port is used for binding to a socket. You have used it elsewhere as well.
# In {{TestNodeManagerReboot#createNMConfig}}, the changes below would lead to a 
bind exception: at the time the config is set, the call to 
ServerSocketUtil#getPort will return 49152 (if free), so both NM_ADDRESS and 
NM_LOCALIZER_ADDRESS will be set to the same port. The start port should be 
different for the second call to getPort (see the sketch after this list).
{code}
-  conf.set(YarnConfiguration.NM_ADDRESS, "127.0.0.1:12345");
-  conf.set(YarnConfiguration.NM_LOCALIZER_ADDRESS, "127.0.0.1:12346");
+  conf.set(YarnConfiguration.NM_ADDRESS,
+  "127.0.0.1:" + ServerSocketUtil.getPort(49152, 10));
+  conf.set(YarnConfiguration.NM_LOCALIZER_ADDRESS, "127.0.0.1:"
+  + ServerSocketUtil.getPort(49152, 10));
{code}
# Same problem as above exists in {{TestNodeManagerResync#createNMConfig}} 
which will lead to test failures due to BindException.
# In {{TestNMContainerTokenSecretManager}}, {{TestRMAppLogAggregationStatus}} 
and {{TestNMTokenSecretManagerInNM}}, you do not need to get port from 
ServerSocketUtil to set NodeID
# In {{TestNMWebServer#testNMWebApp}}, you do not need to call 
ServerSocketUtil#getPort to create the token. Token is not used for socket 
binding.
# In {{TestRMApplicationHistoryWriter}} and {{TestAMRMRPCResponseId}}, call to 
MockRM#registerNode does not need a unique port to bind to. So again we do not 
need to get port from ServerSocketUtil
# Same applies for {{TestAMRestart}} wherever MockNM constructor is invoked.
# In {{TestAMRestart}}, you need to use different ports wherever more than one 
MockNM object is created. This is leading to a test failure in the QA report above.
# Nit: the formatting of the piece of code below in {{TestNMWebServer}} is not correct.
{code}
   Token containerToken =
-  BuilderUtils.newContainerToken(containerId, "127.0.0.1", 1234, user,
+ BuilderUtils.newContainerToken(containerId,
+  "127.0.0.1", ServerSocketUtil.getPort(49152, 10), user,
{code}
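
To make comments 3 and 4 concrete, here is a minimal sketch of a 
createNMConfig that cannot pick the same port twice. This is illustrative 
only; 49200 is an arbitrary second start port, not a value mandated anywhere.
{code}
// Sketch only: give each address that is actually bound its own start port,
// so the second getPort() call cannot return the port already chosen above.
private YarnConfiguration createNMConfig() throws IOException {
  YarnConfiguration conf = new YarnConfiguration();
  conf.set(YarnConfiguration.NM_ADDRESS,
      "127.0.0.1:" + ServerSocketUtil.getPort(49152, 10));
  conf.set(YarnConfiguration.NM_LOCALIZER_ADDRESS,
      "127.0.0.1:" + ServerSocketUtil.getPort(49200, 10));  // different range
  return conf;
}
{code}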


 Tests with 12345 as hard-coded port break jenkins
 -

 Key: YARN-3528
 URL: https://issues.apache.org/jira/browse/YARN-3528
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0
 Environment: ASF Jenkins
Reporter: Steve Loughran
Assignee: Brahma Reddy Battula
Priority: Blocker
  Labels: test
 Attachments: YARN-3528-002.patch, YARN-3528-003.patch, 
 YARN-3528-004.patch, YARN-3528.patch


 A lot of the YARN tests have hard-coded the port 12345 for their services to 
 come up on.
 This makes it impossible for scheduled or precommit tests to run 
 consistently on the ASF jenkins hosts. Instead the tests fail regularly and 
 appear to get ignored completely.
 A quick grep for 12345 shows up many places in the test suite where this 
 practice has developed.
 * All {{BaseContainerManagerTest}} subclasses
 * {{TestNodeManagerShutdown}}
 * {{TestContainerManager}}
 + others
 This needs to be addressed through port scanning and dynamic port allocation. 
 Please can someone do this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3925) ContainerLogsUtils#getContainerLogFile fails to read container log files from full disks.

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641608#comment-14641608
 ] 

Hudson commented on YARN-3925:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2194 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2194/])
YARN-3925. ContainerLogsUtils#getContainerLogFile fails to read container log 
files from full disks. Contributed by zhihai xu (jlowe: rev 
ff9c13e0a739bb13115167dc661b6a16b2ed2c04)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java
* hadoop-yarn-project/CHANGES.txt


 ContainerLogsUtils#getContainerLogFile fails to read container log files from 
 full disks.
 -

 Key: YARN-3925
 URL: https://issues.apache.org/jira/browse/YARN-3925
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.7.1
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical
 Fix For: 2.7.2

 Attachments: YARN-3925.000.patch, YARN-3925.001.patch


 ContainerLogsUtils#getContainerLogFile fails to read files from full disks.
 {{getContainerLogFile}} depends on 
 {{LocalDirsHandlerService#getLogPathToRead}} to get the log file, but 
 {{LocalDirsHandlerService#getLogPathToRead}} calls 
 {{logDirsAllocator.getLocalPathToRead}} and {{logDirsAllocator}} uses 
 configuration {{YarnConfiguration.NM_LOG_DIRS}}, which will be updated to not 
 include full disks in {{LocalDirsHandlerService#checkDirs}}:
 {code}
 Configuration conf = getConfig();
 List<String> localDirs = getLocalDirs();
 conf.setStrings(YarnConfiguration.NM_LOCAL_DIRS,
 localDirs.toArray(new String[localDirs.size()]));
 List<String> logDirs = getLogDirs();
 conf.setStrings(YarnConfiguration.NM_LOG_DIRS,
   logDirs.toArray(new String[logDirs.size()]));
 {code}
 ContainerLogsUtils#getContainerLogFile is used by NMWebServices#getLogs and 
 ContainerLogsPage.ContainersLogsBlock#render to read the log.
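 A rough sketch of one way to address this (illustrative only, with a 
 hypothetical {{getLogDirsForRead}} helper; the committed patch may differ): 
 when resolving a log path for reading, consult all configured log dirs, 
 including the full ones, instead of only the dirs the allocator sees.
 {code}
 // Sketch only: for *reads*, do not rely on the allocator, which only sees
 // the non-full dirs left in NM_LOG_DIRS after checkDirs() rewrote the conf.
 public Path getLogPathToRead(String pathStr) throws IOException {
   for (String dir : getLogDirsForRead()) {   // hypothetical: good + full dirs
     File candidate = new File(dir, pathStr);
     if (candidate.exists()) {
       return new Path(candidate.getAbsolutePath());
     }
   }
   throw new IOException("Log file " + pathStr + " not found in any log dir");
 }
 {code}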



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3957) FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page to return 500

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641610#comment-14641610
 ] 

Hudson commented on YARN-3957:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #2194 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/2194/])
YARN-3957. FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page 
to return 500. (Anubhav Dhoot via kasha) (kasha: rev 
d19d18775368f5aaa254881165acc1299837072b)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerQueueInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/TestFairSchedulerQueueInfo.java


 FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page to return 
 500
 

 Key: YARN-3957
 URL: https://issues.apache.org/jira/browse/YARN-3957
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Fix For: 2.8.0

 Attachments: YARN-3957.001.patch, YARN-3957.002.patch


 There is an NPE causing the web page at 
 http://localhost:23188/cluster/scheduler to return a 500. This seems to be 
 because YARN-2336 sets childQueues to null, and getChildQueues then hits the 
 NPE.
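 One defensive shape the fix could take (illustrative only; the committed 
 patch may differ) is to make the child-queue accessor null-safe:
 {code}
 // Sketch only: never hand a null collection to the web layer.
 public Collection<FairSchedulerQueueInfo> getChildQueues() {
   return childQueues == null
       ? Collections.<FairSchedulerQueueInfo>emptyList()
       : childQueues;
 }
 {code}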



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3967) Fetch the application report from the AHS if the RM does not know about it

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641668#comment-14641668
 ] 

Hudson commented on YARN-3967:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #264 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/264/])
YARN-3967. Fetch the application report from the AHS if the RM does not (xgong: 
rev fbd6063269221ec25834684477f434e19f0b66af)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/AppReportFetcher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/TestAppReportFetcher.java
* hadoop-yarn-project/CHANGES.txt


 Fetch the application report from the AHS if the RM does not know about it
 --

 Key: YARN-3967
 URL: https://issues.apache.org/jira/browse/YARN-3967
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.1
Reporter: Mit Desai
Assignee: Mit Desai
 Fix For: 2.7.2

 Attachments: YARN-3967.1.patch, YARN-3967.2.patch, YARN-3967.3.patch


 If the application history service has been enabled and the RM has forgotten 
 about an application, try to fetch the app report from the AHS.
 On larger clusters, the RM can forget about applications in about 30 
 minutes. The proxy URL generated during job submission will try to fetch 
 the app report from the RM and will fail to get anything from there. If the 
 app is not found in the RM, we will need to get the application report from 
 the Application History Server (if it is enabled) to see if we can get any 
 information on that application before throwing an exception.
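 A minimal sketch of that fallback, assuming {{rmClient}} and 
 {{historyClient}} protocol proxies are already set up (signatures 
 simplified; the actual AppReportFetcher code may differ):
 {code}
 // Sketch only: ask the RM first, and fall back to the AHS only when the RM
 // no longer knows the application.
 private ApplicationReport getReport(ApplicationId appId)
     throws YarnException, IOException {
   GetApplicationReportRequest request =
       GetApplicationReportRequest.newInstance(appId);
   try {
     return rmClient.getApplicationReport(request).getApplicationReport();
   } catch (ApplicationNotFoundException e) {
     if (historyClient == null) {
       throw e;   // AHS not enabled, surface the original error
     }
     return historyClient.getApplicationReport(request).getApplicationReport();
   }
 }
 {code}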



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3957) FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page to return 500

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641672#comment-14641672
 ] 

Hudson commented on YARN-3957:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #264 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/264/])
YARN-3957. FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page 
to return 500. (Anubhav Dhoot via kasha) (kasha: rev 
d19d18775368f5aaa254881165acc1299837072b)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/TestFairSchedulerQueueInfo.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerQueueInfo.java


 FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page to return 
 500
 

 Key: YARN-3957
 URL: https://issues.apache.org/jira/browse/YARN-3957
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Fix For: 2.8.0

 Attachments: YARN-3957.001.patch, YARN-3957.002.patch


 There is an NPE causing the web page at 
 http://localhost:23188/cluster/scheduler to return a 500. This seems to be 
 because YARN-2336 sets childQueues to null, and getChildQueues then hits the 
 NPE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3925) ContainerLogsUtils#getContainerLogFile fails to read container log files from full disks.

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641670#comment-14641670
 ] 

Hudson commented on YARN-3925:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #264 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/264/])
YARN-3925. ContainerLogsUtils#getContainerLogFile fails to read container log 
files from full disks. Contributed by zhihai xu (jlowe: rev 
ff9c13e0a739bb13115167dc661b6a16b2ed2c04)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java
* hadoop-yarn-project/CHANGES.txt


 ContainerLogsUtils#getContainerLogFile fails to read container log files from 
 full disks.
 -

 Key: YARN-3925
 URL: https://issues.apache.org/jira/browse/YARN-3925
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.7.1
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical
 Fix For: 2.7.2

 Attachments: YARN-3925.000.patch, YARN-3925.001.patch


 ContainerLogsUtils#getContainerLogFile fails to read files from full disks.
 {{getContainerLogFile}} depends on 
 {{LocalDirsHandlerService#getLogPathToRead}} to get the log file, but 
 {{LocalDirsHandlerService#getLogPathToRead}} calls 
 {{logDirsAllocator.getLocalPathToRead}} and {{logDirsAllocator}} uses 
 configuration {{YarnConfiguration.NM_LOG_DIRS}}, which will be updated to not 
 include full disks in {{LocalDirsHandlerService#checkDirs}}:
 {code}
 Configuration conf = getConfig();
 List<String> localDirs = getLocalDirs();
 conf.setStrings(YarnConfiguration.NM_LOCAL_DIRS,
 localDirs.toArray(new String[localDirs.size()]));
 List<String> logDirs = getLogDirs();
 conf.setStrings(YarnConfiguration.NM_LOG_DIRS,
   logDirs.toArray(new String[logDirs.size()]));
 {code}
 ContainerLogsUtils#getContainerLogFile is used by NMWebServices#getLogs and 
 ContainerLogsPage.ContainersLogsBlock#render to read the log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3973) Recent changes to application priority management break reservation system from YARN-1051

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641677#comment-14641677
 ] 

Hudson commented on YARN-3973:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #264 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/264/])
YARN-3973. Recent changes to application priority management break reservation 
system from YARN-1051 (Carlo Curino via wangda) (wangda: rev 
a3bd7b4a59b3664273dc424f240356838213d4e7)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java


 Recent changes to application priority management break reservation system 
 from YARN-1051
 -

 Key: YARN-3973
 URL: https://issues.apache.org/jira/browse/YARN-3973
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Carlo Curino
Assignee: Carlo Curino
 Fix For: 2.8.0

 Attachments: YARN-3973.1.patch, YARN-3973.patch


 Recent changes in trunk (I think YARN-2003) produce NPE for reservation 
 system when application is submitted to a ReservationQueue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3971) Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery

2015-07-25 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-3971:
---
Attachment: 0003-YARN-3971.patch

The test case failure is unrelated; I verified locally that the test case passes.
Fixed the checkstyle issue.


 Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel 
 recovery
 --

 Key: YARN-3971
 URL: https://issues.apache.org/jira/browse/YARN-3971
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3971.patch, 0002-YARN-3971.patch, 
 0003-YARN-3971.patch


 Steps to reproduce 
 # Create labels x,y
 # Delete labels x,y
 # Create labels x,y and add them to the capacity scheduler xml as well
 # Restart RM 
  
 Both RMs will become Standby, since the exception below is thrown on 
 {{FileSystemNodeLabelsStore#recover}}:
 {code}
 2015-07-23 14:03:33,627 INFO org.apache.hadoop.service.AbstractService: 
 Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in 
 state STARTED; cause: java.io.IOException: Cannot remove label=x, because 
 queue=a1 is using this label. Please remove label on queue before remove the 
 label
 java.io.IOException: Cannot remove label=x, because queue=a1 is using this 
 label. Please remove label on queue before remove the label
 at 
 org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.checkRemoveFromClusterNodeLabelsOfQueue(RMNodeLabelsManager.java:104)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.removeFromClusterNodeLabels(RMNodeLabelsManager.java:118)
 at 
 org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:221)
 at 
 org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:232)
 at 
 org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:245)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:587)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:964)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1005)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:422)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1666)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:312)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
 at 
 org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:832)
 at 
 org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:422)
 at 
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
 at 
 org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
 {code}
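 As the summary suggests, a rough sketch of the proposed change (not the 
 actual patch; the recovery flag below is hypothetical) is to bypass the 
 queue check while the label store is being recovered:
 {code}
 // Sketch only: skip the queue check during node-label store recovery.
 @Override
 public void removeFromClusterNodeLabels(Collection<String> labelsToRemove)
     throws IOException {
   if (!duringRecovery) {   // hypothetical flag set while recover() runs
     checkRemoveFromClusterNodeLabelsOfQueue(labelsToRemove);
   }
   super.removeFromClusterNodeLabels(labelsToRemove);
 }
 {code}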



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3948) Display Application Priority in RM Web UI

2015-07-25 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-3948:
--
Attachment: (was: ApplicationPage.png)

 Display Application Priority in RM Web UI
 -

 Key: YARN-3948
 URL: https://issues.apache.org/jira/browse/YARN-3948
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: webapp
Affects Versions: 2.7.1
Reporter: Sunil G
Assignee: Sunil G
 Attachments: 0001-YARN-3948.patch, ApplicationPage.png, 
 ClusterPage.png


 Application Priority can be displayed in RM Web UI Application page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3948) Display Application Priority in RM Web UI

2015-07-25 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-3948:
--
Attachment: ClusterPage.png
ApplicationPage.png

 Display Application Priority in RM Web UI
 -

 Key: YARN-3948
 URL: https://issues.apache.org/jira/browse/YARN-3948
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: webapp
Affects Versions: 2.7.1
Reporter: Sunil G
Assignee: Sunil G
 Attachments: 0001-YARN-3948.patch, ApplicationPage.png, 
 ClusterPage.png


 Application Priority can be displayed in RM Web UI Application page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3948) Display Application Priority in RM Web UI

2015-07-25 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-3948:
--
Attachment: (was: ClusterPage.png)

 Display Application Priority in RM Web UI
 -

 Key: YARN-3948
 URL: https://issues.apache.org/jira/browse/YARN-3948
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: webapp
Affects Versions: 2.7.1
Reporter: Sunil G
Assignee: Sunil G
 Attachments: 0001-YARN-3948.patch, ApplicationPage.png, 
 ClusterPage.png


 Application Priority can be displayed in RM Web UI Application page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-3948) Display Application Priority in RM Web UI

2015-07-25 Thread Sunil G (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sunil G updated YARN-3948:
--
Attachment: 0002-YARN-3948.patch

Thank you [~jianhe] and [~rohithsharma]
Uploading a new patch after addressing the review comments. Also attached new 
screen shots.

 Display Application Priority in RM Web UI
 -

 Key: YARN-3948
 URL: https://issues.apache.org/jira/browse/YARN-3948
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: webapp
Affects Versions: 2.7.1
Reporter: Sunil G
Assignee: Sunil G
 Attachments: 0001-YARN-3948.patch, 0002-YARN-3948.patch, 
 ApplicationPage.png, ClusterPage.png


 Application Priority can be displayed in RM Web UI Application page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3656) LowCost: A Cost-Based Placement Agent for YARN Reservations

2015-07-25 Thread Carlo Curino (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641645#comment-14641645
 ] 

Carlo Curino commented on YARN-3656:


+1 on this patch. I committed this to trunk/branch-2 after manually inspecting 
the checkstyle and javadoc reports (I fixed a couple more issues, but the rest 
will not go away, as checkstyle misses a few uses of the link tag in javadoc). 

Thanks [~jyaniv] and [~imenache] for this important contribution (and for 
having tested it and polished those algorithms endlessly). 
Thanks to [~subru] for helping shepherd this, and to [~asuresh] for reviewing. 

 LowCost: A Cost-Based Placement Agent for YARN Reservations
 ---

 Key: YARN-3656
 URL: https://issues.apache.org/jira/browse/YARN-3656
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager
Affects Versions: 2.6.0
Reporter: Ishai Menache
Assignee: Jonathan Yaniv
  Labels: capacity-scheduler, resourcemanager
 Fix For: 2.8.0

 Attachments: LowCostRayonExternal.pdf, YARN-3656-v1.1.patch, 
 YARN-3656-v1.2.patch, YARN-3656-v1.3.patch, YARN-3656-v1.4.patch, 
 YARN-3656-v1.5.patch, YARN-3656-v1.patch, lowcostrayonexternal_v2.pdf


 YARN-1051 enables SLA support by allowing users to reserve cluster capacity 
 ahead of time. YARN-1710 introduced a greedy agent for placing user 
 reservations. The greedy agent makes fast placement decisions but at the cost 
 of ignoring the cluster committed resources, which might result in blocking 
 the cluster resources for certain periods of time, and in turn rejecting some 
 arriving jobs.
 We propose LowCost – a new cost-based planning algorithm. LowCost “spreads” 
 the demand of the job throughout the allowed time-window according to a 
 global, load-based cost function. 
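 As a toy illustration of the idea (not the LowCost implementation itself): 
 if the cost of a time step grows with the load already committed there, then 
 greedily sending each unit of demand to the cheapest step of the allowed 
 window naturally spreads the job across the window.
 {code}
 // Toy sketch, assuming load[] holds the committed load per time step of the
 // plan; the real agent works on Plan/RLESparseResourceAllocation structures.
 static double placeSpread(double[] load, int demand, int arrival, int deadline) {
   double totalCost = 0;
   for (int d = 0; d < demand; d++) {
     int best = arrival;
     for (int t = arrival; t < deadline; t++) {
       if (load[t] < load[best]) {
         best = t;               // least-loaded (cheapest) step so far
       }
     }
     load[best] += 1;            // commit one unit of demand there
     totalCost += load[best];    // cost proportional to the resulting load
   }
   return totalCost;
 }
 {code}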



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3656) LowCost: A Cost-Based Placement Agent for YARN Reservations

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641644#comment-14641644
 ] 

Hudson commented on YARN-3656:
--

FAILURE: Integrated in Hadoop-trunk-Commit #8222 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/8222/])
YARN-3656. LowCost: A Cost-Based Placement Agent for YARN Reservations. 
(Jonathan Yaniv and Ishai Menache via curino) (ccurino: rev 
156f24ead00436faad5d4aeef327a546392cd265)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/StageEarliestStart.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestCapacityOverTimePolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystem.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/StageEarliestStartByDemand.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/SimpleCapacityReplanner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TestAlignedPlanner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/StageEarliestStartByJobArrival.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationSystemUtil.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/StageAllocator.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestGreedyReservationAgent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TestSimpleCapacityReplanner.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/AbstractReservationSystem.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/ReservationAgent.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/PlanContext.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/StageAllocatorGreedy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestNoOverCommitPolicy.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/TryManyReservationAgents.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/ReservationAgent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/GreedyReservationAgent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/planning/GreedyReservationAgent.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/TestFairReservationSystem.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/reservation/RLESparseResourceAllocation.java
* 

[jira] [Commented] (YARN-3967) Fetch the application report from the AHS if the RM does not know about it

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641690#comment-14641690
 ] 

Hudson commented on YARN-3967:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2213 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2213/])
YARN-3967. Fetch the application report from the AHS if the RM does not (xgong: 
rev fbd6063269221ec25834684477f434e19f0b66af)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/TestAppReportFetcher.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/AppReportFetcher.java


 Fetch the application report from the AHS if the RM does not know about it
 --

 Key: YARN-3967
 URL: https://issues.apache.org/jira/browse/YARN-3967
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.1
Reporter: Mit Desai
Assignee: Mit Desai
 Fix For: 2.7.2

 Attachments: YARN-3967.1.patch, YARN-3967.2.patch, YARN-3967.3.patch


 If the application history service has been enabled and the RM has forgotten 
 about an application, try to fetch the app report from the AHS.
 On larger clusters, the RM can forget about applications in about 30 
 minutes. The proxy URL generated during job submission will try to fetch 
 the app report from the RM and will fail to get anything from there. If the 
 app is not found in the RM, we will need to get the application report from 
 the Application History Server (if it is enabled) to see if we can get any 
 information on that application before throwing an exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3925) ContainerLogsUtils#getContainerLogFile fails to read container log files from full disks.

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641692#comment-14641692
 ] 

Hudson commented on YARN-3925:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2213 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2213/])
YARN-3925. ContainerLogsUtils#getContainerLogFile fails to read container log 
files from full disks. Contributed by zhihai xu (jlowe: rev 
ff9c13e0a739bb13115167dc661b6a16b2ed2c04)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java
* hadoop-yarn-project/CHANGES.txt


 ContainerLogsUtils#getContainerLogFile fails to read container log files from 
 full disks.
 -

 Key: YARN-3925
 URL: https://issues.apache.org/jira/browse/YARN-3925
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.7.1
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical
 Fix For: 2.7.2

 Attachments: YARN-3925.000.patch, YARN-3925.001.patch


 ContainerLogsUtils#getContainerLogFile fails to read files from full disks.
 {{getContainerLogFile}} depends on 
 {{LocalDirsHandlerService#getLogPathToRead}} to get the log file, but 
 {{LocalDirsHandlerService#getLogPathToRead}} calls 
 {{logDirsAllocator.getLocalPathToRead}} and {{logDirsAllocator}} uses 
 configuration {{YarnConfiguration.NM_LOG_DIRS}}, which will be updated to not 
 include full disks in {{LocalDirsHandlerService#checkDirs}}:
 {code}
 Configuration conf = getConfig();
 List<String> localDirs = getLocalDirs();
 conf.setStrings(YarnConfiguration.NM_LOCAL_DIRS,
 localDirs.toArray(new String[localDirs.size()]));
 List<String> logDirs = getLogDirs();
 conf.setStrings(YarnConfiguration.NM_LOG_DIRS,
   logDirs.toArray(new String[logDirs.size()]));
 {code}
 ContainerLogsUtils#getContainerLogFile is used by NMWebServices#getLogs and 
 ContainerLogsPage.ContainersLogsBlock#render to read the log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3026) Move application-specific container allocation logic from LeafQueue to FiCaSchedulerApp

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641698#comment-14641698
 ] 

Hudson commented on YARN-3026:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2213 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2213/])
YARN-3026. Move application-specific container allocation logic from LeafQueue 
to FiCaSchedulerApp. Contributed by Wangda Tan (jianhe: rev 
83fe34ac0896cee0918bbfad7bd51231e4aec39b)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSAssignment.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceLimits.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityHeadroomProvider.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java


 Move application-specific container allocation logic from LeafQueue to 
 FiCaSchedulerApp
 ---

 Key: YARN-3026
 URL: https://issues.apache.org/jira/browse/YARN-3026
 Project: Hadoop YARN
  Issue Type: Task
  Components: capacityscheduler
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 2.8.0

 Attachments: YARN-3026.1.patch, YARN-3026.2.patch, YARN-3026.3.patch, 
 YARN-3026.4.patch, YARN-3026.5.patch, YARN-3026.6.patch


 After a discussion with [~vinodkv] and [~jianhe]: 
 In the existing Capacity Scheduler, all allocation logic of and under LeafQueue 
 is implemented in LeafQueue.java. To give LeafQueue a cleaner scope, we'd 
 better move some of it to FiCaSchedulerApp.
 The ideal scope of LeafQueue should be: when a LeafQueue receives some resources 
 from its ParentQueue (like 15% of cluster resource), it distributes those 
 resources to its child apps, and it should be agnostic to the internal logic of 
 the child apps (like delayed scheduling, etc.). In other words, LeafQueue 
 shouldn't decide how an application allocates containers from the given 
 resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3969) Allow jobs to be submitted to reservation that is active but does not have any allocations

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641687#comment-14641687
 ] 

Hudson commented on YARN-3969:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2213 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2213/])
YARN-3969. Allow jobs to be submitted to reservation that is active but does 
not have any allocations. (subru via curino) (Carlo Curino: rev 
0fcb4a8cf2add3f112907ff4e833e2f04947b53e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservationQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ReservationQueue.java
* hadoop-yarn-project/CHANGES.txt
YARN-3969. Updating CHANGES.txt to reflect the correct set of branches where 
this is committed (Carlo Curino: rev fc42fa8ae3bc9d6d055090a7bb5e6f0c5972fcff)
* hadoop-yarn-project/CHANGES.txt


 Allow jobs to be submitted to reservation that is active but does not have 
 any allocations
 --

 Key: YARN-3969
 URL: https://issues.apache.org/jira/browse/YARN-3969
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, fairscheduler, resourcemanager
Reporter: Subru Krishnan
Assignee: Subru Krishnan
 Fix For: 2.8.0, 2.7.2

 Attachments: YARN-3969-v1.patch, YARN-3969-v2.patch


 YARN-1051 introduces the notion of reserving resources prior to job 
 submission. A reservation is active from its arrival time to its deadline, 
 but in the interim there can be periods when it does not have any resources 
 allocated. We currently reject jobs that are submitted when the reservation 
 allocation is zero. Instead we should accept and queue the jobs until the 
 resources become available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641693#comment-14641693
 ] 

Hudson commented on YARN-1051:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2213 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2213/])
YARN-3973. Recent changes to application priority management break reservation 
system from YARN-1051 (Carlo Curino via wangda) (wangda: rev 
a3bd7b4a59b3664273dc424f240356838213d4e7)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java


 YARN Admission Control/Planner: enhancing the resource allocation model with 
 time.
 --

 Key: YARN-1051
 URL: https://issues.apache.org/jira/browse/YARN-1051
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager, scheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
 Fix For: 2.6.0

 Attachments: YARN-1051-design.pdf, YARN-1051.1.patch, 
 YARN-1051.patch, curino_MSR-TR-2013-108.pdf, socc14-paper15.pdf, 
 techreport.pdf


 In this umbrella JIRA we propose to extend the YARN RM to handle time 
 explicitly, allowing users to reserve capacity over time. This is an 
 important step towards SLAs, long-running services, workflows, and helps for 
 gang scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3957) FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page to return 500

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641694#comment-14641694
 ] 

Hudson commented on YARN-3957:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2213 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2213/])
YARN-3957. FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page 
to return 500. (Anubhav Dhoot via kasha) (kasha: rev 
d19d18775368f5aaa254881165acc1299837072b)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerQueueInfo.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/TestFairSchedulerQueueInfo.java


 FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page to return 
 500
 

 Key: YARN-3957
 URL: https://issues.apache.org/jira/browse/YARN-3957
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Fix For: 2.8.0

 Attachments: YARN-3957.001.patch, YARN-3957.002.patch


 There is an NPE causing the web page at 
 http://localhost:23188/cluster/scheduler to return a 500. This seems to be 
 because YARN-2336 sets childQueues to null, and getChildQueues then hits the 
 NPE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3973) Recent changes to application priority management break reservation system from YARN-1051

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641699#comment-14641699
 ] 

Hudson commented on YARN-3973:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2213 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2213/])
YARN-3973. Recent changes to application priority management break reservation 
system from YARN-1051 (Carlo Curino via wangda) (wangda: rev 
a3bd7b4a59b3664273dc424f240356838213d4e7)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java


 Recent changes to application priority management break reservation system 
 from YARN-1051
 -

 Key: YARN-3973
 URL: https://issues.apache.org/jira/browse/YARN-3973
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Carlo Curino
Assignee: Carlo Curino
 Fix For: 2.8.0

 Attachments: YARN-3973.1.patch, YARN-3973.patch


 Recent changes in trunk (I think YARN-2003) produce NPE for reservation 
 system when application is submitted to a ReservationQueue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3971) Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery

2015-07-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641732#comment-14641732
 ] 

Hadoop QA commented on YARN-3971:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 35s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 46s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 44s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m 19s | The applied patch generated  1 
new checkstyle issues (total was 31, now 32). |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 22s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 34s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m  0s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   1m 58s | Tests passed in 
hadoop-yarn-common. |
| {color:red}-1{color} | yarn tests |  52m 58s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  96m 55s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12747193/0002-YARN-3971.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 156f24e |
| checkstyle |  
https://builds.apache.org/job/PreCommit-YARN-Build/8670/artifact/patchprocess/diffcheckstylehadoop-yarn-common.txt
 |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8670/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8670/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8670/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8670/console |


This message was automatically generated.

 Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel 
 recovery
 --

 Key: YARN-3971
 URL: https://issues.apache.org/jira/browse/YARN-3971
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3971.patch, 0002-YARN-3971.patch


 Steps to reproduce 
 # Create labels x,y
 # Delete labels x,y
 # Create labels x,y and add them to the capacity scheduler xml as well
 # Restart RM 
  
 Both RMs will become Standby, since the exception below is thrown on 
 {{FileSystemNodeLabelsStore#recover}}:
 {code}
 2015-07-23 14:03:33,627 INFO org.apache.hadoop.service.AbstractService: 
 Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in 
 state STARTED; cause: java.io.IOException: Cannot remove label=x, because 
 queue=a1 is using this label. Please remove label on queue before remove the 
 label
 java.io.IOException: Cannot remove label=x, because queue=a1 is using this 
 label. Please remove label on queue before remove the label
 at 
 org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.checkRemoveFromClusterNodeLabelsOfQueue(RMNodeLabelsManager.java:104)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.removeFromClusterNodeLabels(RMNodeLabelsManager.java:118)
 at 
 org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:221)
 at 
 org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:232)
 at 
 org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:245)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 

[jira] [Updated] (YARN-3884) RMContainerImpl transition from RESERVED to KILL apphistory status not updated

2015-07-25 Thread Bibin A Chundatt (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bibin A Chundatt updated YARN-3884:
---
Component/s: timelineserver

 RMContainerImpl transition from RESERVED to KILL apphistory status not updated
 --

 Key: YARN-3884
 URL: https://issues.apache.org/jira/browse/YARN-3884
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
 Environment: Suse11 Sp3
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
 Attachments: 0001-YARN-3884.patch, Apphistory Container Status.jpg, 
 Elapsed Time.jpg, Test Result-Container status.jpg


 Setup
 ===
 1 NM, 3072 MB and 16 cores each
 Steps to reproduce
 ===
 1. Submit apps to Queue 1 with 512 MB and 1 core
 2. Submit apps to Queue 2 with 512 MB and 5 cores
 Lots of containers get reserved and unreserved in this case:
 {code}
 2015-07-02 20:45:31,169 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
 container_e24_1435849994778_0002_01_13 Container Transitioned from NEW to 
 RESERVED
 2015-07-02 20:45:31,170 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
 Reserved container  application=application_1435849994778_0002 
 resource=memory:512, vCores:5 queue=QueueA: capacity=0.4, 
 absoluteCapacity=0.4, usedResources=memory:2560, vCores:21, 
 usedCapacity=1.6410257, absoluteUsedCapacity=0.65625, numApps=1, 
 numContainers=5 usedCapacity=1.6410257 absoluteUsedCapacity=0.65625 
 used=memory:2560, vCores:21 cluster=memory:6144, vCores:32
 2015-07-02 20:45:31,170 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
 Re-sorting assigned queue: root.QueueA stats: QueueA: capacity=0.4, 
 absoluteCapacity=0.4, usedResources=memory:3072, vCores:26, 
 usedCapacity=2.0317461, absoluteUsedCapacity=0.8125, numApps=1, 
 numContainers=6
 2015-07-02 20:45:31,170 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
 assignedContainer queue=root usedCapacity=0.96875 
 absoluteUsedCapacity=0.96875 used=memory:5632, vCores:31 
 cluster=memory:6144, vCores:32
 2015-07-02 20:45:31,191 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
 container_e24_1435849994778_0001_01_14 Container Transitioned from NEW to 
 ALLOCATED
 2015-07-02 20:45:31,191 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=dsperf   
 OPERATION=AM Allocated ContainerTARGET=SchedulerApp 
 RESULT=SUCCESS  APPID=application_1435849994778_0001
 CONTAINERID=container_e24_1435849994778_0001_01_14
 2015-07-02 20:45:31,191 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: 
 Assigned container container_e24_1435849994778_0001_01_14 of capacity 
 memory:512, vCores:1 on host host-10-19-92-117:64318, which has 6 
 containers, memory:3072, vCores:14 used and memory:0, vCores:2 available 
 after allocation
 2015-07-02 20:45:31,191 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
 assignedContainer application attempt=appattempt_1435849994778_0001_01 
 container=Container: [ContainerId: 
 container_e24_1435849994778_0001_01_14, NodeId: host-10-19-92-117:64318, 
 NodeHttpAddress: host-10-19-92-117:65321, Resource: memory:512, vCores:1, 
 Priority: 20, Token: null, ] queue=default: capacity=0.2, 
 absoluteCapacity=0.2, usedResources=memory:2560, vCores:5, 
 usedCapacity=2.0846906, absoluteUsedCapacity=0.4166, numApps=1, 
 numContainers=5 clusterResource=memory:6144, vCores:32
 2015-07-02 20:45:31,191 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
 Re-sorting assigned queue: root.default stats: default: capacity=0.2, 
 absoluteCapacity=0.2, usedResources=memory:3072, vCores:6, 
 usedCapacity=2.5016286, absoluteUsedCapacity=0.5, numApps=1, numContainers=6
 2015-07-02 20:45:31,191 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: 
 assignedContainer queue=root usedCapacity=1.0 absoluteUsedCapacity=1.0 
 used=memory:6144, vCores:32 cluster=memory:6144, vCores:32
 2015-07-02 20:45:32,143 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: 
 container_e24_1435849994778_0001_01_14 Container Transitioned from 
 ALLOCATED to ACQUIRED
 2015-07-02 20:45:32,174 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
  Trying to fulfill reservation for application application_1435849994778_0002 
 on node: host-10-19-92-143:64318
 2015-07-02 20:45:32,174 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: 
 Reserved container  application=application_1435849994778_0002 
 

[jira] [Created] (YARN-3978) Configurably turn off the saving of container info in Generic AHS

2015-07-25 Thread Eric Payne (JIRA)
Eric Payne created YARN-3978:


 Summary: Configurably turn off the saving of container info in 
Generic AHS
 Key: YARN-3978
 URL: https://issues.apache.org/jira/browse/YARN-3978
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: timelineserver, yarn
Reporter: Eric Payne
Assignee: Eric Payne


Depending on how each application's metadata is stored, one week's worth of 
data stored in the Generic Application History Server's database can grow to 
almost a terabyte of local disk space. To alleviate this, I suggest adding a 
configuration option to turn off saving non-AM container metadata in the GAHS 
data store.
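
A sketch of what such a switch could look like (both the property key and the 
helper name below are hypothetical, used purely for illustration):
{code}
// Sketch only: one boolean flag consulted before persisting non-AM container
// records, so AM containers are always kept but the bulk of the data is not.
boolean saveNonAMContainers = conf.getBoolean(
    "yarn.timeline-service.save-non-am-container-metadata", true);  // hypothetical key

void maybeWriteContainerRecord(ContainerId containerId, boolean isAMContainer) {
  if (isAMContainer || saveNonAMContainers) {
    writeContainerRecord(containerId);   // hypothetical existing write path
  }
  // otherwise skip the record to keep the GAHS store small
}
{code}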




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3971) Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel recovery

2015-07-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641766#comment-14641766
 ] 

Hadoop QA commented on YARN-3971:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  17m 46s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 48s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 42s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m 44s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 23s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 32s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   3m  1s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   1m 57s | Tests passed in 
hadoop-yarn-common. |
| {color:green}+1{color} | yarn tests |  52m 20s | Tests passed in 
hadoop-yarn-server-resourcemanager. |
| | |  96m 38s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12747203/0003-YARN-3971.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 156f24e |
| hadoop-yarn-common test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8671/artifact/patchprocess/testrun_hadoop-yarn-common.txt
 |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8671/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8671/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8671/console |


This message was automatically generated.

 Skip RMNodeLabelsManager#checkRemoveFromClusterNodeLabelsOfQueue on nodelabel 
 recovery
 --

 Key: YARN-3971
 URL: https://issues.apache.org/jira/browse/YARN-3971
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3971.patch, 0002-YARN-3971.patch, 
 0003-YARN-3971.patch


 Steps to reproduce 
 # Create labels x,y
 # Delete labels x,y
 # Create labels x,y and add them to the capacity scheduler xml as well
 # Restart RM 
  
 Both RMs will become Standby, since the exception below is thrown on 
 {{FileSystemNodeLabelsStore#recover}}:
 {code}
 2015-07-23 14:03:33,627 INFO org.apache.hadoop.service.AbstractService: 
 Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in 
 state STARTED; cause: java.io.IOException: Cannot remove label=x, because 
 queue=a1 is using this label. Please remove label on queue before remove the 
 label
 java.io.IOException: Cannot remove label=x, because queue=a1 is using this 
 label. Please remove label on queue before remove the label
 at 
 org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.checkRemoveFromClusterNodeLabelsOfQueue(RMNodeLabelsManager.java:104)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsManager.removeFromClusterNodeLabels(RMNodeLabelsManager.java:118)
 at 
 org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore.recover(FileSystemNodeLabelsStore.java:221)
 at 
 org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.initNodeLabelStore(CommonNodeLabelsManager.java:232)
 at 
 org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStart(CommonNodeLabelsManager.java:245)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:587)
 at 
 org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
 at 
 

[jira] [Commented] (YARN-3893) Both RM in active state when Admin#transitionToActive failure from refeshAll()

2015-07-25 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641743#comment-14641743
 ] 

Bibin A Chundatt commented on YARN-3893:


{quote}
Instead of checking for exception message in test, can you check for 
ServiceFailedException
{quote}
Already the same is verified in many testcases using messages.

{quote}
Can you add a verification in the test to check whether active services were 
stopped ?
{quote}

IMO it's not required.
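
For reference, a minimal, self-contained JUnit sketch of the exception-type check being suggested; the classes below are toy stand-ins for illustration only, not the actual YARN AdminService or test code.

{code}
import static org.junit.Assert.fail;

import org.junit.Test;

public class TransitionToActiveFailureSketch {

  /** Toy stand-in for org.apache.hadoop.ha.ServiceFailedException. */
  static class ServiceFailedException extends Exception {
    ServiceFailedException(String msg, Throwable cause) {
      super(msg, cause);
    }
  }

  /** Toy admin service whose refreshAll() always fails, mimicking a bad capacity-scheduler.xml. */
  static class FakeAdminService {
    void transitionToActive() throws ServiceFailedException {
      try {
        refreshAll();
      } catch (Exception e) {
        throw new ServiceFailedException("refreshAll() failed during transitionToActive", e);
      }
    }

    private void refreshAll() throws Exception {
      throw new Exception("invalid capacity-scheduler.xml");
    }
  }

  @Test
  public void failedTransitionSurfacesAsServiceFailedException() {
    try {
      new FakeAdminService().transitionToActive();
      fail("transitionToActive should have failed when refreshAll() throws");
    } catch (ServiceFailedException expected) {
      // Asserting on the exception type is less brittle than matching message text.
    }
  }
}
{code}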

 Both RM in active state when Admin#transitionToActive failure from refeshAll()
 --

 Key: YARN-3893
 URL: https://issues.apache.org/jira/browse/YARN-3893
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
Priority: Critical
 Attachments: 0001-YARN-3893.patch, 0002-YARN-3893.patch, 
 0003-YARN-3893.patch, 0004-YARN-3893.patch, yarn-site.xml


 Cases that can cause this:
 # Capacity scheduler xml is wrongly configured during the switch
 # Refresh ACL failure due to configuration
 # Refresh user group failure due to configuration
 Both RMs will then continuously try to become active:
 {code}
 dsperf@host-10-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm1
 15/07/07 19:08:10 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 dsperf@host-128:/opt/bibin/dsperf/OPENSOURCE_3_0/install/hadoop/resourcemanager/bin
  ./yarn rmadmin  -getServiceState rm2
 15/07/07 19:08:12 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 active
 {code}
 # Both web UIs show active
 # Status is shown as active for both RMs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3940) Application moveToQueue should check NodeLabel permission

2015-07-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641769#comment-14641769
 ] 

Hadoop QA commented on YARN-3940:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch |  15m  7s | Findbugs (version ) appears to 
be broken on trunk. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 38s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 40s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 23s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  1s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 22s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 26s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |  52m 16s | Tests failed in 
hadoop-yarn-server-resourcemanager. |
| | |  88m 51s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.resourcemanager.reservation.TestCapacitySchedulerPlanFollower
 |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12746872/0001-YARN-3940.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 156f24e |
| hadoop-yarn-server-resourcemanager test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8672/artifact/patchprocess/testrun_hadoop-yarn-server-resourcemanager.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8672/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8672/console |


This message was automatically generated.

 Application moveToQueue should check NodeLabel permission 
 --

 Key: YARN-3940
 URL: https://issues.apache.org/jira/browse/YARN-3940
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bibin A Chundatt
Assignee: Bibin A Chundatt
 Attachments: 0001-YARN-3940.patch


 Configure the capacity scheduler.
 Configure node labels and submit an application with {{queue=A Label=X}}.
 Move the application to queue {{B}}, which does not have access to label x.
 {code}
 2015-07-20 19:46:19,626 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
  Application attempt appattempt_1437385548409_0005_01 released container 
 container_e08_1437385548409_0005_01_02 on node: host: 
 host-10-19-92-117:64318 #containers=1 available=memory:2560, vCores:15 
 used=memory:512, vCores:1 with event: KILL
 2015-07-20 19:46:20,970 WARN 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: 
 Invalid resource ask by application appattempt_1437385548409_0005_01
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid 
 resource request, queue=b1 doesn't have permission to access all labels in 
 resource request. labelExpression of resource request=x. Queue labels=y
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:304)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndValidateRequest(SchedulerUtils.java:234)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.normalizeAndvalidateRequest(SchedulerUtils.java:250)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:106)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:515)
 at 
 org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
 at 
 org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:636)
 

[jira] [Commented] (YARN-3978) Configurably turn off the saving of container info in Generic AHS

2015-07-25 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641768#comment-14641768
 ] 

Eric Payne commented on YARN-3978:
--

Use Case: A user launches an application on a secured cluster that runs for 
some time and then fails within the AM (perhaps due to OOM in the AM), leaving 
no history in the job history server. The user doesn't notice that the job has 
failed until after the application has dropped off the RM's application 
store. At this point, if no information was stored in the Generic Application 
History Service, a user must rely on a privileged system administrator to 
access the AM logs for them.

It is desirable to activate the Generic Application History service within the 
timeline server so that users can access their application's information even 
after the RM has forgotten about their application. This app information should 
be kept in the GAHS for 1 week, as is done, for example, for logs in the job 
history server.

One way that the Generic AHS stores metadata about an application is in an 
Entity levelDB. This includes information about each container for each 
application. Based on my analysis, the levelDB size grows by at least 2500 
bytes per container (uncompressed). This is a conservative estimate as the size 
could be much bigger based on the amount of diagnostic information associated 
with failed containers.

On very large and busy clusters, the amount needed on the timeline server's 
local disk would be between 0.6 TB and 1.0 TB (uncompressed). Even if we assume 
90% compression, that's still between 60 GB and 100 GB that will be needed on 
the local disk. In addition to this, between 80 GB and 143 GB of metadata 
(uncompressed) will need to be cleaned up every day from the levelDB, which will 
delay other processing in the timeline server.

The proposal of this JIRA is to add a configuration property that 
enables/disables whether or not the GAHS stores container information in the 
levelDB. With this change, I estimate that the local disk usage would be about 
5700 bytes per job, or about 10 GB (uncompressed) per week. Additionally, the 
daily cleanup load would only be about 1.5 GB.
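
To make the arithmetic above easy to reproduce, here is a small illustrative snippet; the container and job counts are assumptions back-derived from the figures quoted in this comment, not measured values.

{code}
/** Back-of-the-envelope arithmetic behind the estimates above (illustrative only). */
public class GahsSizingSketch {
  public static void main(String[] args) {
    long bytesPerContainer = 2_500L;            // conservative per-container metadata size
    long containersPerWeekLow = 240_000_000L;   // assumed: roughly 0.6 TB / 2500 bytes
    long containersPerWeekHigh = 400_000_000L;  // assumed: roughly 1.0 TB / 2500 bytes

    double weeklyLowTB = bytesPerContainer * containersPerWeekLow / 1e12;    // ~0.6 TB/week
    double weeklyHighTB = bytesPerContainer * containersPerWeekHigh / 1e12;  // ~1.0 TB/week
    double dailyCleanupLowGB = weeklyLowTB * 1000 / 7;                       // ~86 GB/day
    double dailyCleanupHighGB = weeklyHighTB * 1000 / 7;                     // ~143 GB/day

    long bytesPerJob = 5_700L;                  // estimated size with container info turned off
    long jobsPerWeek = 1_750_000L;              // assumed: roughly 10 GB / 5700 bytes
    double weeklyJobOnlyGB = bytesPerJob * jobsPerWeek / 1e9;                // ~10 GB/week

    System.out.printf("weekly: %.1f-%.1f TB, daily cleanup: %.0f-%.0f GB, "
        + "with container info off: ~%.0f GB/week%n",
        weeklyLowTB, weeklyHighTB, dailyCleanupLowGB, dailyCleanupHighGB, weeklyJobOnlyGB);
  }
}
{code}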


 Configurably turn off the saving of container info in Generic AHS
 -

 Key: YARN-3978
 URL: https://issues.apache.org/jira/browse/YARN-3978
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: timelineserver, yarn
Reporter: Eric Payne
Assignee: Eric Payne

 Depending on how each application's metadata is stored, one week's worth of 
 data stored in the Generic Application History Server's database can grow to 
 be almost a terabyte of local disk space. In order to alleviate this, I 
 suggest that there is a need for a configuration option to turn off saving of 
 non-AM container metadata in the GAHS data store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3925) ContainerLogsUtils#getContainerLogFile fails to read container log files from full disks.

2015-07-25 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641770#comment-14641770
 ] 

zhihai xu commented on YARN-3925:
-

thanks [~jlowe] for reviewing and committing the patch! 

 ContainerLogsUtils#getContainerLogFile fails to read container log files from 
 full disks.
 -

 Key: YARN-3925
 URL: https://issues.apache.org/jira/browse/YARN-3925
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.7.1
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical
 Fix For: 2.7.2

 Attachments: YARN-3925.000.patch, YARN-3925.001.patch


 ContainerLogsUtils#getContainerLogFile fails to read files from full disks.
 {{getContainerLogFile}} depends on 
 {{LocalDirsHandlerService#getLogPathToRead}} to get the log file, but 
 {{LocalDirsHandlerService#getLogPathToRead}} calls 
 {{logDirsAllocator.getLocalPathToRead}} and {{logDirsAllocator}} uses 
 configuration {{YarnConfiguration.NM_LOG_DIRS}}, which will be updated to not 
 include full disks in {{LocalDirsHandlerService#checkDirs}}:
 {code}
 Configuration conf = getConfig();
 List<String> localDirs = getLocalDirs();
 conf.setStrings(YarnConfiguration.NM_LOCAL_DIRS,
     localDirs.toArray(new String[localDirs.size()]));
 List<String> logDirs = getLogDirs();
 conf.setStrings(YarnConfiguration.NM_LOG_DIRS,
     logDirs.toArray(new String[logDirs.size()]));
 {code}
 ContainerLogsUtils#getContainerLogFile is used by NMWebServices#getLogs and 
 ContainerLogsPage.ContainersLogsBlock#render to read the log.
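
 One way to address this, as a toy sketch (names are illustrative, not the actual NodeManager code): the read path should search every configured log dir, including directories that were excluded from the write path because they are full.

 {code}
 import java.io.File;
 import java.util.Arrays;
 import java.util.List;

 /** Toy sketch: a full disk can still be read from, so search all configured log dirs. */
 public class LogPathToReadSketch {

   /** allLogDirs is the full configured list, including dirs marked as full. */
   static File findLogFileForRead(List<File> allLogDirs, String relativePath) {
     for (File dir : allLogDirs) {
       File candidate = new File(dir, relativePath);
       if (candidate.exists()) {
         return candidate;
       }
     }
     return null;
   }

   public static void main(String[] args) {
     List<File> logDirs = Arrays.asList(new File("/data1/nm-logs"), new File("/data2/nm-logs"));
     System.out.println(findLogFileForRead(logDirs, "application_1/container_1/stderr"));
   }
 }
 {code}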



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3026) Move application-specific container allocation logic from LeafQueue to FiCaSchedulerApp

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641516#comment-14641516
 ] 

Hudson commented on YARN-3026:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #267 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/267/])
YARN-3026. Move application-specific container allocation logic from LeafQueue 
to FiCaSchedulerApp. Contributed by Wangda Tan (jianhe: rev 
83fe34ac0896cee0918bbfad7bd51231e4aec39b)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityHeadroomProvider.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceLimits.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSAssignment.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java


 Move application-specific container allocation logic from LeafQueue to 
 FiCaSchedulerApp
 ---

 Key: YARN-3026
 URL: https://issues.apache.org/jira/browse/YARN-3026
 Project: Hadoop YARN
  Issue Type: Task
  Components: capacityscheduler
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 2.8.0

 Attachments: YARN-3026.1.patch, YARN-3026.2.patch, YARN-3026.3.patch, 
 YARN-3026.4.patch, YARN-3026.5.patch, YARN-3026.6.patch


 Had a discussion with [~vinodkv] and [~jianhe]: 
 In the existing Capacity Scheduler, all allocation logic of and under LeafQueue 
 is implemented in LeafQueue.java. To give LeafQueue a cleaner scope, we'd 
 better move some of it to FiCaSchedulerApp.
 The ideal scope of LeafQueue should be: a LeafQueue receives some resources 
 from its ParentQueue (say 15% of the cluster resource) and distributes them 
 to its child apps, while staying agnostic to the internal logic of those 
 apps (delayed scheduling, etc.). In other words, LeafQueue shouldn't decide how an 
 application allocates containers from the given resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3967) Fetch the application report from the AHS if the RM does not know about it

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641508#comment-14641508
 ] 

Hudson commented on YARN-3967:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #267 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/267/])
YARN-3967. Fetch the application report from the AHS if the RM does not (xgong: 
rev fbd6063269221ec25834684477f434e19f0b66af)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/AppReportFetcher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/TestAppReportFetcher.java


 Fetch the application report from the AHS if the RM does not know about it
 --

 Key: YARN-3967
 URL: https://issues.apache.org/jira/browse/YARN-3967
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.1
Reporter: Mit Desai
Assignee: Mit Desai
 Fix For: 2.7.2

 Attachments: YARN-3967.1.patch, YARN-3967.2.patch, YARN-3967.3.patch


 If the application history service has been enabled and the RM has forgotten 
 about an application, try to fetch the app report from the AHS.
 On larger clusters, the RM can forget about applications in about 30 
 minutes. The proxy URL generated during job submission will try to fetch 
 the app report from the RM and will fail to get anything from there. If the 
 app is not found in the RM, we need to get the application report from 
 the Application History Server (if it is enabled) to see if we can get any 
 information on that application before throwing an exception.
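
 As a toy illustration of this fallback pattern (the types below are generic stand-ins, not the YARN client APIs; requires Java 9+ for Optional.or):

 {code}
 import java.util.Optional;
 import java.util.function.Function;

 /** Toy illustration of the RM-then-AHS lookup fallback. */
 public class AppReportFallbackSketch {

   static <K, V> V fetchWithFallback(K appId,
                                     Function<K, Optional<V>> rmLookup,
                                     Function<K, Optional<V>> ahsLookup) {
     return rmLookup.apply(appId)
         .or(() -> ahsLookup.apply(appId))   // only consulted when the RM has forgotten the app
         .orElseThrow(() -> new IllegalStateException("application not found: " + appId));
   }

   public static void main(String[] args) {
     String report = fetchWithFallback("application_0001",
         id -> Optional.empty(),                  // the RM no longer knows the app
         id -> Optional.of("report-from-AHS"));   // the AHS still has it
     System.out.println(report);
   }
 }
 {code}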



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3957) FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page to return 500

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641512#comment-14641512
 ] 

Hudson commented on YARN-3957:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #267 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/267/])
YARN-3957. FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page 
to return 500. (Anubhav Dhoot via kasha) (kasha: rev 
d19d18775368f5aaa254881165acc1299837072b)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/TestFairSchedulerQueueInfo.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerQueueInfo.java


 FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page to return 
 500
 

 Key: YARN-3957
 URL: https://issues.apache.org/jira/browse/YARN-3957
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Fix For: 2.8.0

 Attachments: YARN-3957.001.patch, YARN-3957.002.patch


 There is an NPE causing the web page at 
 http://localhost:23188/cluster/scheduler to return a 500. This seems to be 
 because YARN-2336 sets childQueues to null and getChildQueues then 
 hits the NPE.
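
 A toy sketch of the kind of null-safe guard that avoids such an NPE (illustrative only, not the actual FairSchedulerQueueInfo code):

 {code}
 import java.util.Collection;
 import java.util.Collections;

 /** Toy sketch: return an empty collection instead of null for leaf queues. */
 public class QueueInfoSketch {

   private Collection<QueueInfoSketch> childQueues;   // may legitimately be null for a leaf queue

   public Collection<QueueInfoSketch> getChildQueues() {
     return childQueues == null ? Collections.<QueueInfoSketch>emptyList() : childQueues;
   }

   public static void main(String[] args) {
     // Callers can iterate the result unconditionally; no NPE.
     System.out.println(new QueueInfoSketch().getChildQueues().size());
   }
 }
 {code}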



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3925) ContainerLogsUtils#getContainerLogFile fails to read container log files from full disks.

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641510#comment-14641510
 ] 

Hudson commented on YARN-3925:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #267 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/267/])
YARN-3925. ContainerLogsUtils#getContainerLogFile fails to read container log 
files from full disks. Contributed by zhihai xu (jlowe: rev 
ff9c13e0a739bb13115167dc661b6a16b2ed2c04)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java
* hadoop-yarn-project/CHANGES.txt


 ContainerLogsUtils#getContainerLogFile fails to read container log files from 
 full disks.
 -

 Key: YARN-3925
 URL: https://issues.apache.org/jira/browse/YARN-3925
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.7.1
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical
 Fix For: 2.7.2

 Attachments: YARN-3925.000.patch, YARN-3925.001.patch


 ContainerLogsUtils#getContainerLogFile fails to read files from full disks.
 {{getContainerLogFile}} depends on 
 {{LocalDirsHandlerService#getLogPathToRead}} to get the log file, but 
 {{LocalDirsHandlerService#getLogPathToRead}} calls 
 {{logDirsAllocator.getLocalPathToRead}} and {{logDirsAllocator}} uses 
 configuration {{YarnConfiguration.NM_LOG_DIRS}}, which will be updated to not 
 include full disks in {{LocalDirsHandlerService#checkDirs}}:
 {code}
 Configuration conf = getConfig();
 List<String> localDirs = getLocalDirs();
 conf.setStrings(YarnConfiguration.NM_LOCAL_DIRS,
     localDirs.toArray(new String[localDirs.size()]));
 List<String> logDirs = getLogDirs();
 conf.setStrings(YarnConfiguration.NM_LOG_DIRS,
     logDirs.toArray(new String[logDirs.size()]));
 {code}
 ContainerLogsUtils#getContainerLogFile is used by NMWebServices#getLogs and 
 ContainerLogsPage.ContainersLogsBlock#render to read the log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3925) ContainerLogsUtils#getContainerLogFile fails to read container log files from full disks.

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641524#comment-14641524
 ] 

Hudson commented on YARN-3925:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #997 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/997/])
YARN-3925. ContainerLogsUtils#getContainerLogFile fails to read container log 
files from full disks. Contributed by zhihai xu (jlowe: rev 
ff9c13e0a739bb13115167dc661b6a16b2ed2c04)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LocalDirsHandlerService.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/webapp/TestContainerLogsPage.java
* hadoop-yarn-project/CHANGES.txt


 ContainerLogsUtils#getContainerLogFile fails to read container log files from 
 full disks.
 -

 Key: YARN-3925
 URL: https://issues.apache.org/jira/browse/YARN-3925
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.7.1
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical
 Fix For: 2.7.2

 Attachments: YARN-3925.000.patch, YARN-3925.001.patch


 ContainerLogsUtils#getContainerLogFile fails to read files from full disks.
 {{getContainerLogFile}} depends on 
 {{LocalDirsHandlerService#getLogPathToRead}} to get the log file, but 
 {{LocalDirsHandlerService#getLogPathToRead}} calls 
 {{logDirsAllocator.getLocalPathToRead}} and {{logDirsAllocator}} uses 
 configuration {{YarnConfiguration.NM_LOG_DIRS}}, which will be updated to not 
 include full disks in {{LocalDirsHandlerService#checkDirs}}:
 {code}
 Configuration conf = getConfig();
 List<String> localDirs = getLocalDirs();
 conf.setStrings(YarnConfiguration.NM_LOCAL_DIRS,
     localDirs.toArray(new String[localDirs.size()]));
 List<String> logDirs = getLogDirs();
 conf.setStrings(YarnConfiguration.NM_LOG_DIRS,
     logDirs.toArray(new String[logDirs.size()]));
 {code}
 ContainerLogsUtils#getContainerLogFile is used by NMWebServices#getLogs and 
 ContainerLogsPage.ContainersLogsBlock#render to read the log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3973) Recent changes to application priority management break reservation system from YARN-1051

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641531#comment-14641531
 ] 

Hudson commented on YARN-3973:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #997 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/997/])
YARN-3973. Recent changes to application priority management break reservation 
system from YARN-1051 (Carlo Curino via wangda) (wangda: rev 
a3bd7b4a59b3664273dc424f240356838213d4e7)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java


 Recent changes to application priority management break reservation system 
 from YARN-1051
 -

 Key: YARN-3973
 URL: https://issues.apache.org/jira/browse/YARN-3973
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Carlo Curino
Assignee: Carlo Curino
 Fix For: 2.8.0

 Attachments: YARN-3973.1.patch, YARN-3973.patch


 Recent changes in trunk (I think YARN-2003) produce an NPE in the reservation 
 system when an application is submitted to a ReservationQueue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641525#comment-14641525
 ] 

Hudson commented on YARN-1051:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #997 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/997/])
YARN-3973. Recent changes to application priority management break reservation 
system from YARN-1051 (Carlo Curino via wangda) (wangda: rev 
a3bd7b4a59b3664273dc424f240356838213d4e7)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java


 YARN Admission Control/Planner: enhancing the resource allocation model with 
 time.
 --

 Key: YARN-1051
 URL: https://issues.apache.org/jira/browse/YARN-1051
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager, scheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
 Fix For: 2.6.0

 Attachments: YARN-1051-design.pdf, YARN-1051.1.patch, 
 YARN-1051.patch, curino_MSR-TR-2013-108.pdf, socc14-paper15.pdf, 
 techreport.pdf


 In this umbrella JIRA we propose to extend the YARN RM to handle time 
 explicitly, allowing users to reserve capacity over time. This is an 
 important step towards SLAs, long-running services, and workflows, and helps 
 with gang scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3967) Fetch the application report from the AHS if the RM does not know about it

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641522#comment-14641522
 ] 

Hudson commented on YARN-3967:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #997 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/997/])
YARN-3967. Fetch the application report from the AHS if the RM does not (xgong: 
rev fbd6063269221ec25834684477f434e19f0b66af)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/AppReportFetcher.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/test/java/org/apache/hadoop/yarn/server/webproxy/TestAppReportFetcher.java
* hadoop-yarn-project/CHANGES.txt


 Fetch the application report from the AHS if the RM does not know about it
 --

 Key: YARN-3967
 URL: https://issues.apache.org/jira/browse/YARN-3967
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.1
Reporter: Mit Desai
Assignee: Mit Desai
 Fix For: 2.7.2

 Attachments: YARN-3967.1.patch, YARN-3967.2.patch, YARN-3967.3.patch


 If the application history service has been enabled and the RM has forgotten 
 about an application, try to fetch the app report from the AHS.
 On larger clusters, the RM can forget about applications in about 30 
 minutes. The proxy URL generated during job submission will try to fetch 
 the app report from the RM and will fail to get anything from there. If the 
 app is not found in the RM, we need to get the application report from 
 the Application History Server (if it is enabled) to see if we can get any 
 information on that application before throwing an exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3957) FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page to return 500

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641526#comment-14641526
 ] 

Hudson commented on YARN-3957:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #997 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/997/])
YARN-3957. FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page 
to return 500. (Anubhav Dhoot via kasha) (kasha: rev 
d19d18775368f5aaa254881165acc1299837072b)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/TestFairSchedulerQueueInfo.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/FairSchedulerQueueInfo.java
* hadoop-yarn-project/CHANGES.txt


 FairScheduler NPE In FairSchedulerQueueInfo causing scheduler page to return 
 500
 

 Key: YARN-3957
 URL: https://issues.apache.org/jira/browse/YARN-3957
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Fix For: 2.8.0

 Attachments: YARN-3957.001.patch, YARN-3957.002.patch


 There is an NPE causing the web page at 
 http://localhost:23188/cluster/scheduler to return a 500. This seems to be 
 because YARN-2336 sets childQueues to null and getChildQueues then 
 hits the NPE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3026) Move application-specific container allocation logic from LeafQueue to FiCaSchedulerApp

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641530#comment-14641530
 ] 

Hudson commented on YARN-3026:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #997 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/997/])
YARN-3026. Move application-specific container allocation logic from LeafQueue 
to FiCaSchedulerApp. Contributed by Wangda Tan (jianhe: rev 
83fe34ac0896cee0918bbfad7bd51231e4aec39b)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSAssignment.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/common/fica/FiCaSchedulerApp.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservations.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityHeadroomProvider.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/ResourceLimits.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestApplicationLimits.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/AbstractCSQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestUtils.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestContainerAllocation.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestCapacityScheduler.java


 Move application-specific container allocation logic from LeafQueue to 
 FiCaSchedulerApp
 ---

 Key: YARN-3026
 URL: https://issues.apache.org/jira/browse/YARN-3026
 Project: Hadoop YARN
  Issue Type: Task
  Components: capacityscheduler
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 2.8.0

 Attachments: YARN-3026.1.patch, YARN-3026.2.patch, YARN-3026.3.patch, 
 YARN-3026.4.patch, YARN-3026.5.patch, YARN-3026.6.patch


 Had a discussion with [~vinodkv] and [~jianhe]: 
 In the existing Capacity Scheduler, all allocation logic of and under LeafQueue 
 is implemented in LeafQueue.java. To give LeafQueue a cleaner scope, we'd 
 better move some of it to FiCaSchedulerApp.
 The ideal scope of LeafQueue should be: a LeafQueue receives some resources 
 from its ParentQueue (say 15% of the cluster resource) and distributes them 
 to its child apps, while staying agnostic to the internal logic of those 
 apps (delayed scheduling, etc.). In other words, LeafQueue shouldn't decide how an 
 application allocates containers from the given resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3969) Allow jobs to be submitted to reservation that is active but does not have any allocations

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641519#comment-14641519
 ] 

Hudson commented on YARN-3969:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #997 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/997/])
YARN-3969. Allow jobs to be submitted to reservation that is active but does 
not have any allocations. (subru via curino) (Carlo Curino: rev 
0fcb4a8cf2add3f112907ff4e833e2f04947b53e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservationQueue.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ReservationQueue.java
YARN-3969. Updating CHANGES.txt to reflect the correct set of branches where 
this is committed (Carlo Curino: rev fc42fa8ae3bc9d6d055090a7bb5e6f0c5972fcff)
* hadoop-yarn-project/CHANGES.txt


 Allow jobs to be submitted to reservation that is active but does not have 
 any allocations
 --

 Key: YARN-3969
 URL: https://issues.apache.org/jira/browse/YARN-3969
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, fairscheduler, resourcemanager
Reporter: Subru Krishnan
Assignee: Subru Krishnan
 Fix For: 2.8.0, 2.7.2

 Attachments: YARN-3969-v1.patch, YARN-3969-v2.patch


 YARN-1051 introduces the notion of reserving resources prior to job 
 submission. A reservation is active from its arrival time to its deadline, but in 
 the interim there can be periods when it does not have any 
 resources allocated. We currently reject jobs that are submitted while the 
 reservation allocation is zero. Instead we should accept and queue the jobs 
 until the resources become available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1051) YARN Admission Control/Planner: enhancing the resource allocation model with time.

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641511#comment-14641511
 ] 

Hudson commented on YARN-1051:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #267 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/267/])
YARN-3973. Recent changes to application priority management break reservation 
system from YARN-1051 (Carlo Curino via wangda) (wangda: rev 
a3bd7b4a59b3664273dc424f240356838213d4e7)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java


 YARN Admission Control/Planner: enhancing the resource allocation model with 
 time.
 --

 Key: YARN-1051
 URL: https://issues.apache.org/jira/browse/YARN-1051
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacityscheduler, resourcemanager, scheduler
Reporter: Carlo Curino
Assignee: Carlo Curino
 Fix For: 2.6.0

 Attachments: YARN-1051-design.pdf, YARN-1051.1.patch, 
 YARN-1051.patch, curino_MSR-TR-2013-108.pdf, socc14-paper15.pdf, 
 techreport.pdf


 In this umbrella JIRA we propose to extend the YARN RM to handle time 
 explicitly, allowing users to reserve capacity over time. This is an 
 important step towards SLAs, long-running services, and workflows, and helps 
 with gang scheduling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3969) Allow jobs to be submitted to reservation that is active but does not have any allocations

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641505#comment-14641505
 ] 

Hudson commented on YARN-3969:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #267 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/267/])
YARN-3969. Allow jobs to be submitted to reservation that is active but does 
not have any allocations. (subru via curino) (Carlo Curino: rev 
0fcb4a8cf2add3f112907ff4e833e2f04947b53e)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestReservationQueue.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ReservationQueue.java
YARN-3969. Updating CHANGES.txt to reflect the correct set of branches where 
this is committed (Carlo Curino: rev fc42fa8ae3bc9d6d055090a7bb5e6f0c5972fcff)
* hadoop-yarn-project/CHANGES.txt


 Allow jobs to be submitted to reservation that is active but does not have 
 any allocations
 --

 Key: YARN-3969
 URL: https://issues.apache.org/jira/browse/YARN-3969
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler, fairscheduler, resourcemanager
Reporter: Subru Krishnan
Assignee: Subru Krishnan
 Fix For: 2.8.0, 2.7.2

 Attachments: YARN-3969-v1.patch, YARN-3969-v2.patch


 YARN-1051 introduces the notion of reserving resources prior to job 
 submission. A reservation is active from its arrival time to its deadline, but in 
 the interim there can be periods when it does not have any 
 resources allocated. We currently reject jobs that are submitted while the 
 reservation allocation is zero. Instead we should accept and queue the jobs 
 until the resources become available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3973) Recent changes to application priority management break reservation system from YARN-1051

2015-07-25 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641517#comment-14641517
 ] 

Hudson commented on YARN-3973:
--

FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #267 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/267/])
YARN-3973. Recent changes to application priority management break reservation 
system from YARN-1051 (Carlo Curino via wangda) (wangda: rev 
a3bd7b4a59b3664273dc424f240356838213d4e7)
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java


 Recent changes to application priority management break reservation system 
 from YARN-1051
 -

 Key: YARN-3973
 URL: https://issues.apache.org/jira/browse/YARN-3973
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Carlo Curino
Assignee: Carlo Curino
 Fix For: 2.8.0

 Attachments: YARN-3973.1.patch, YARN-3973.patch


 Recent changes in trunk (I think YARN-2003) produce an NPE in the reservation 
 system when an application is submitted to a ReservationQueue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3528) Tests with 12345 as hard-coded port break jenkins

2015-07-25 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641541#comment-14641541
 ] 

Varun Saxena commented on YARN-3528:


[~brahmareddy], you have missed some test classes in your latest patch. For 
instance, TestNodeManagerShutdown.

 Tests with 12345 as hard-coded port break jenkins
 -

 Key: YARN-3528
 URL: https://issues.apache.org/jira/browse/YARN-3528
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0
 Environment: ASF Jenkins
Reporter: Steve Loughran
Assignee: Brahma Reddy Battula
Priority: Blocker
  Labels: test
 Attachments: YARN-3528-002.patch, YARN-3528-003.patch, 
 YARN-3528-004.patch, YARN-3528.patch


 A lot of the YARN tests have hard-coded port 12345 for their services to 
 come up on.
 This makes it impossible for scheduled or precommit tests to run 
 consistently on the ASF Jenkins hosts. Instead the tests fail regularly and 
 appear to get ignored completely.
 A quick grep for 12345 shows many places in the test suite where this 
 practice has developed:
 * All {{BaseContainerManagerTest}} subclasses
 * {{TestNodeManagerShutdown}}
 * {{TestContainerManager}}
 + others
 This needs to be addressed through port scanning and dynamic port allocation 
 (see the sketch below). Please can someone do this?
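
 A minimal sketch of the dynamic allocation idea: binding to port 0 asks the OS for a free ephemeral port (there is still a small race window between closing the probe socket and the test service binding the port, but it avoids hard-coded collisions).

 {code}
 import java.io.IOException;
 import java.net.ServerSocket;

 /** Minimal sketch: ask the OS for a free ephemeral port instead of hard-coding 12345. */
 public class FreePortSketch {

   static int findFreePort() throws IOException {
     try (ServerSocket socket = new ServerSocket(0)) {
       socket.setReuseAddress(true);
       return socket.getLocalPort();
     }
   }

   public static void main(String[] args) throws IOException {
     System.out.println("test service port: " + findFreePort());
   }
 }
 {code}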



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-3958) TestYarnConfigurationFields should be moved to hadoop-yarn-api

2015-07-25 Thread Varun Saxena (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641558#comment-14641558
 ] 

Varun Saxena commented on YARN-3958:


[~ajisakaa],
I checked the test. You are correct.
But I think this would be a major change.
My concern is that there might be some projects that have taken a 
dependency in their pom on hadoop-yarn-api because they want to use the 
YarnConfiguration class. Please note that hadoop-yarn-api does not depend on 
hadoop-yarn-common in its pom.xml.
The change can be made, but should it go into branch-2 then?

Moreover, realistically, will somebody add a YARN-related config to 
yarn-default.xml but not add it to the YarnConfiguration class?
I think that is unlikely. The reverse happens far more frequently.

So in branch-2 we can just move this test to hadoop-yarn-api, and in trunk move 
YarnConfiguration to hadoop-yarn-common. Thoughts?

 TestYarnConfigurationFields should be moved to hadoop-yarn-api
 --

 Key: YARN-3958
 URL: https://issues.apache.org/jira/browse/YARN-3958
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Varun Saxena
Assignee: Varun Saxena
 Attachments: YARN-3958.01.patch, YARN-3958.02.patch, 
 YARN-3958.03.patch


 Currently TestYarnConfigurationFields is present in hadoop-yarn-common. The 
 test checks whether all the configurations declared in 
 YarnConfiguration exist in yarn-default.xml.
 But as YarnConfiguration is in hadoop-yarn-api, if somebody changes that 
 file, this test will not necessarily be run. So if the developer 
 forgets to update yarn-default.xml and the patch is committed, it will lead to 
 unnecessary test failures after the commit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)