[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14163360#comment-14163360
 ] 

Hudson commented on YARN-1857:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #705 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/705/])
YARN-1857. CapacityScheduler headroom doesn't account for other AM's running. 
Contributed by Chen He and Craig Welch (jianhe: rev 
30d56fdbb40d06c4e267d6c314c8c767a7adc6a3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java


 CapacityScheduler headroom doesn't account for other AM's running
 -

 Key: YARN-1857
 URL: https://issues.apache.org/jira/browse/YARN-1857
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Chen He
Priority: Critical
 Fix For: 2.6.0

 Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.3.patch, 
 YARN-1857.4.patch, YARN-1857.5.patch, YARN-1857.6.patch, YARN-1857.7.patch, 
 YARN-1857.patch, YARN-1857.patch, YARN-1857.patch


 It's possible for an application to hang forever (or for a long time) in a 
 cluster with multiple users.  The reason is that the headroom sent to the 
 application is based on the user limit, but it doesn't account for other 
 application masters using space in that queue.  So the headroom (user limit - 
 user consumed) can be > 0 even though the cluster is 100% full, because the 
 remaining space is being used by application masters from other users.  
 For instance, consider a cluster with one queue, a user limit of 100%, and 
 multiple users submitting applications.  One very large application by user 1 
 starts up, runs most of its maps, and starts running reducers.  Other users 
 try to start applications and get their application masters started, but no 
 tasks.  The very large application then reaches the point where it has 
 consumed the rest of the cluster resources with reducers, but it still needs 
 to finish a few maps.  The headroom sent to this application is based only on 
 the user limit (which is 100% of the cluster capacity): it is using, say, 95% 
 of the cluster for reducers, and the other 5% is being used by other users' 
 application masters.  The MRAppMaster thinks it still has 5% headroom, so it 
 doesn't know that it should kill a reducer in order to run a map.  
 This can happen in other scenarios as well.  Generally, in a large cluster 
 with multiple queues this shouldn't cause a hang forever, but it can cause 
 the application to take much longer.
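
 Illustrative sketch only, with hypothetical numbers (a 100 GB cluster, user 1 
 consuming 95 GB, 5 GB held by other users' application masters); plain ints are 
 used in place of the actual YARN Resource/Resources API. The pre-fix headroom 
 formula reports 5 GB even though nothing is actually free:
{code}
// Illustrative sketch only, with hypothetical numbers; plain ints are used in
// place of the actual org.apache.hadoop.yarn Resource/Resources API.
public class HeadroomSketch {
  public static void main(String[] args) {
    int clusterCapacity = 100; // GB; a single queue spans the whole cluster
    int userLimit       = 100; // user limit is 100% of the queue
    int userConsumed    = 95;  // user 1's reducers
    int otherAmUsage    = 5;   // application masters of other users
    int queueUsed       = userConsumed + otherAmUsage; // 100 GB: the queue is full

    // Headroom reported before the fix: ignores what other users' AMs occupy.
    int oldHeadroom = userLimit - userConsumed;                       // 5 GB
    // Roughly the effect of the fix: also capped by what is really left.
    int newHeadroom = Math.min(userLimit - userConsumed,
                               clusterCapacity - queueUsed);          // 0 GB

    System.out.println("headroom before fix = " + oldHeadroom + " GB");
    System.out.println("headroom after fix  = " + newHeadroom + " GB");
  }
}
{code}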



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14163513#comment-14163513
 ] 

Hudson commented on YARN-1857:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1895 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1895/])
YARN-1857. CapacityScheduler headroom doesn't account for other AM's running. 
Contributed by Chen He and Craig Welch (jianhe: rev 
30d56fdbb40d06c4e267d6c314c8c767a7adc6a3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* hadoop-yarn-project/CHANGES.txt




[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14163591#comment-14163591
 ] 

Hudson commented on YARN-1857:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1920 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1920/])
YARN-1857. CapacityScheduler headroom doesn't account for other AM's running. 
Contributed by Chen He and Craig Welch (jianhe: rev 
30d56fdbb40d06c4e267d6c314c8c767a7adc6a3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java




[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-07 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162170#comment-14162170
 ] 

Craig Welch commented on YARN-1857:
---

This is an interesting question - that logic predates this change, and I 
wondered whether there were cases where userLimit could somehow be > queueMaxCap, 
and as I look at the code, surprisingly, I believe so.  UserLimit is calculated 
based on absolute queue values, whereas, at least since [YARN-2008], queueMaxCap 
takes into account actual usage in other queues.  So it is entirely possible 
for userLimit to be > queueMaxCap due to how they are calculated, at least post 
[YARN-2008].  I'm not sure whether that was also possible before YARN-2008 - it 
may have been, given how queueMaxCap was calculated even before that change - 
but in any event it is the case now.  So, as it happens, I don't believe we can 
do the simplification.
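
As a loose, hypothetical illustration of why that matters (this is not the 
actual CapacityScheduler calculation), a userLimit derived from configured 
capacity can end up above a queueMaxCap that has been shrunk by live usage 
elsewhere, so the min(userLimit, queueMaxCap) term is what actually bounds the 
headroom:
{code}
// Hypothetical values only; the real userLimit/queueMaxCap computations in
// LeafQueue are considerably more involved.
public class UserLimitVsQueueMaxCap {
  public static void main(String[] args) {
    int clusterResource   = 100; // GB
    int userLimit         = 80;  // derived from the queue's configured (absolute) capacity
    int usedByOtherQueues = 40;  // live usage elsewhere in the cluster
    int queueMaxCap       = clusterResource - usedByOtherQueues; // 60 GB (post-YARN-2008 style)

    // userLimit (80) > queueMaxCap (60), so the headroom must be bounded by
    // min(userLimit, queueMaxCap) rather than by userLimit alone.
    System.out.println("bound = " + Math.min(userLimit, queueMaxCap) + " GB");
  }
}
{code}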



[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-07 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162357#comment-14162357
 ] 

Chen He commented on YARN-1857:
---

Hi [~cwelch], 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/CapacityScheduler.apt.vm
 has detailed information about these parameters. 



[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162461#comment-14162461
 ] 

Hadoop QA commented on YARN-1857:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673410/YARN-1857.7.patch
  against trunk revision 9196db9.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5311//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5311//console

This message is automatically generated.



[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-07 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162487#comment-14162487
 ] 

Jian He commented on YARN-1857:
---

Craig, thanks for updating. 
looks good, +1



[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-07 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162496#comment-14162496
 ] 

Chen He commented on YARN-1857:
---

The unit test failure is caused by YARN-2400.



[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-07 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162497#comment-14162497
 ] 

Chen He commented on YARN-1857:
---

Sorry, my bad - it looks like YARN-2400 has already been checked in. In any 
case, the failure is not related to this patch.



[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162510#comment-14162510
 ] 

Hudson commented on YARN-1857:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6206 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6206/])
YARN-1857. CapacityScheduler headroom doesn't account for other AM's running. 
Contributed by Chen He and Craig Welch (jianhe: rev 
30d56fdbb40d06c4e267d6c314c8c767a7adc6a3)
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/TestLeafQueue.java
* hadoop-yarn-project/CHANGES.txt
* 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java




[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160819#comment-14160819
 ] 

Hadoop QA commented on YARN-1857:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673170/YARN-1857.5.patch
  against trunk revision ea26cc0.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:red}-1 javac{color:red}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5279//console

This message is automatically generated.



[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-06 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161164#comment-14161164
 ] 

Jian He commented on YARN-1857:
---

Could you please update the patch on top of YARN-2644? Comments in the 
meantime: 
- Update the code comment describing the new calculation of headroom:
{code}
/** 
 * Headroom is min((userLimit, queue-max-cap) - consumed)
 */
{code}
- Fix the indentation of this line: {{Resources.subtract(queueMaxCap, usedResources));}}



[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161300#comment-14161300
 ] 

Hadoop QA commented on YARN-1857:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673238/YARN-1857.6.patch
  against trunk revision 519e5a7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5292//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5292//console

This message is automatically generated.



[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161373#comment-14161373
 ] 

Hadoop QA commented on YARN-1857:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12673238/YARN-1857.6.patch
  against trunk revision 519e5a7.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5296//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5296//console

This message is automatically generated.



[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-06 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161496#comment-14161496
 ] 

Jian He commented on YARN-1857:
---

I found that, given {{queueUsedResources >= userConsumed}}, we can simplify the 
formula to {code}min(userLimit - userConsumed, queueMaxCap - 
queueUsedResources){code} - does this make sense?
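
For readers following along, here is a minimal numeric sketch of the proposed 
simplified expression, using hypothetical values (it takes no position on 
whether the simplification is safe):
{code}
// Hypothetical values; simply evaluates the proposed simplified expression.
public class SimplifiedHeadroomSketch {
  public static void main(String[] args) {
    int userLimit          = 80; // GB
    int userConsumed       = 30;
    int queueMaxCap        = 70;
    int queueUsedResources = 50; // includes this user's 30 GB

    int headroom = Math.min(userLimit - userConsumed,           // 50
                            queueMaxCap - queueUsedResources);  // 20
    System.out.println("headroom = " + headroom + " GB");       // prints 20
  }
}
{code}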



[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-10-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14158644#comment-14158644
 ] 

Hadoop QA commented on YARN-1857:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12672863/YARN-1857.4.patch
  against trunk revision 7f6ed7f.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5256//console

This message is automatically generated.



[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-09-18 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14139605#comment-14139605
 ] 

Jian He commented on YARN-1857:
---

Thanks [~airbots] and [~cwelch], the patch looks good overall. A few comments 
and questions:
- Indentation of the last line seems incorrect.
{code}
Resource headroom =
  Resources.min(resourceCalculator, clusterResource,
Resources.subtract(
Resources.min(resourceCalculator, clusterResource, 
userLimit, queueMaxCap), 
userConsumed),
Resources.subtract(queueMaxCap, usedResources));
{code}
- Test case 2: could you check app2's headRoom as well?
- Test case 3: could you check app_1's headRoom as well?
- Could you explain why, in test case 4 ({{assertEquals(5*GB, 
app_4.getHeadroom().getMemory());}}), app_4 still has 5 GB of headRoom?
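
For context, the expression quoted above (shown there only for the indentation 
nit) amounts to the following; this is a plain-integer sketch with hypothetical 
values, whereas the real code works on Resource objects through the Resources 
calculator:
{code}
// Plain-integer restatement of the headroom expression quoted above; values are hypothetical.
public class HeadroomFormulaSketch {
  public static void main(String[] args) {
    int userLimit     = 80; // GB
    int queueMaxCap   = 70;
    int userConsumed  = 30;
    int usedResources = 50; // total used in the queue

    int headroom = Math.min(
        Math.min(userLimit, queueMaxCap) - userConsumed, // 40: user's remaining share, capped at the queue max
        queueMaxCap - usedResources);                    // 20: what is actually still free in the queue
    System.out.println("headroom = " + headroom + " GB"); // prints 20
  }
}
{code}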



[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-09-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136807#comment-14136807
 ] 

Hadoop QA commented on YARN-1857:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12669331/YARN-1857.3.patch
  against trunk revision 0e7d1db.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4984//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4984//console

This message is automatically generated.



[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-09-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14132401#comment-14132401
 ] 

Hadoop QA commented on YARN-1857:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12668487/YARN-1857.2.patch
  against trunk revision a0ad975.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1265 javac 
compiler warnings (more than the trunk's current 1264 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.TestDecommission

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4946//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/4946//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4946//console

This message is automatically generated.

 CapacityScheduler headroom doesn't account for other AM's running
 -

 Key: YARN-1857
 URL: https://issues.apache.org/jira/browse/YARN-1857
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Chen He
Priority: Critical
 Attachments: YARN-1857.1.patch, YARN-1857.2.patch, YARN-1857.patch, 
 YARN-1857.patch, YARN-1857.patch




[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-08-28 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14114418#comment-14114418
 ] 

Jian He commented on YARN-1857:
---

Hi [~airbots], thanks for working on this. Can you add more comments in the 
test about how the numbers are calculated? It's not easy to follow. 
Also, maybe rename LeafQueue a to b, as it is getting queueB. 

 CapacityScheduler headroom doesn't account for other AM's running
 -

 Key: YARN-1857
 URL: https://issues.apache.org/jira/browse/YARN-1857
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Chen He
Priority: Critical
 Attachments: YARN-1857.1.patch, YARN-1857.patch, YARN-1857.patch, 
 YARN-1857.patch




[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-08-28 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14114431#comment-14114431
 ] 

Chen He commented on YARN-1857:
---

Sure, it has been a while since I first created this patch. Let me make the 
updates. 

 CapacityScheduler headroom doesn't account for other AM's running
 -

 Key: YARN-1857
 URL: https://issues.apache.org/jira/browse/YARN-1857
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Chen He
Priority: Critical
 Attachments: YARN-1857.1.patch, YARN-1857.patch, YARN-1857.patch, 
 YARN-1857.patch




[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-08-25 Thread Craig Welch (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109798#comment-14109798
 ] 

Craig Welch commented on YARN-1857:
---

[~jianhe] [~wangda] could you have a look at this patch?

 CapacityScheduler headroom doesn't account for other AM's running
 -

 Key: YARN-1857
 URL: https://issues.apache.org/jira/browse/YARN-1857
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Chen He
Priority: Critical
 Attachments: YARN-1857.1.patch, YARN-1857.patch, YARN-1857.patch, 
 YARN-1857.patch




[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-06-11 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027943#comment-14027943
 ] 

Jonathan Eagles commented on YARN-1857:
---

Bumping the priority since reducer preemption is broken in many cases without 
this fix.
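
For context on why the headroom matters here: the MR AM only preempts reducers when the headroom it was given says no pending map can fit. The following is a simplified sketch of that decision; it is not the actual RMContainerAllocator code, and the names and flat megabyte units are illustrative assumptions. It shows how an over-reported headroom suppresses preemption entirely:

{code:java}
// Simplified sketch of the per-heartbeat decision in the MR AM.
// Not the real RMContainerAllocator logic; names are illustrative.
public class PreemptionSketch {

  // A reducer should be preempted only when maps are pending and the
  // reported headroom cannot fit a single map request. If the scheduler
  // over-reports headroom (other users' AMs not subtracted), this stays
  // false and the job can hang waiting for maps that never start.
  static boolean shouldPreemptReducer(long headroomMb,
                                      long mapRequestMb,
                                      int pendingMaps,
                                      int runningReducers) {
    boolean mapsStarved = pendingMaps > 0 && headroomMb < mapRequestMb;
    return mapsStarved && runningReducers > 0;
  }

  public static void main(String[] args) {
    // Stale headroom of 5 GB (ignores other AMs) vs. the true 0.
    System.out.println(shouldPreemptReducer(5 * 1024L, 2048L, 3, 40)); // false -> hang
    System.out.println(shouldPreemptReducer(0L,        2048L, 3, 40)); // true  -> preempt
  }
}
{code}

With a stale 5 GB headroom the check never fires, which is the hang described in this issue; once the headroom reflects the space taken by other AMs, the AM preempts a reduce and the remaining maps can run.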

 CapacityScheduler headroom doesn't account for other AM's running
 -

 Key: YARN-1857
 URL: https://issues.apache.org/jira/browse/YARN-1857
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Chen He
Priority: Critical
 Attachments: YARN-1857.patch, YARN-1857.patch




[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-05-05 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989865#comment-13989865
 ] 

Chen He commented on YARN-1857:
---

This failure is related to YARN-1906.

 CapacityScheduler headroom doesn't account for other AM's running
 -

 Key: YARN-1857
 URL: https://issues.apache.org/jira/browse/YARN-1857
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Chen He
 Attachments: YARN-1857.patch




[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-05-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990304#comment-13990304
 ] 

Hadoop QA commented on YARN-1857:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12643410/YARN-1857.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test file.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestClientRMService
  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3697//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3697//console

This message is automatically generated.

 CapacityScheduler headroom doesn't account for other AM's running
 -

 Key: YARN-1857
 URL: https://issues.apache.org/jira/browse/YARN-1857
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Chen He
 Attachments: YARN-1857.patch, YARN-1857.patch




[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-05-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13988096#comment-13988096
 ] 

Hadoop QA commented on YARN-1857:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12643084/YARN-1857.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test file.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3682//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3682//console

This message is automatically generated.

 CapacityScheduler headroom doesn't account for other AM's running
 -

 Key: YARN-1857
 URL: https://issues.apache.org/jira/browse/YARN-1857
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Chen He
 Attachments: YARN-1857.patch




[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-05-02 Thread Chen He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13988124#comment-13988124
 ] 

Chen He commented on YARN-1857:
---

TestRMRestart passed on my laptop, so I think this failure may not be related 
to my patch.

 CapacityScheduler headroom doesn't account for other AM's running
 -

 Key: YARN-1857
 URL: https://issues.apache.org/jira/browse/YARN-1857
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves
Assignee: Chen He
 Attachments: YARN-1857.patch




[jira] [Commented] (YARN-1857) CapacityScheduler headroom doesn't account for other AM's running

2014-03-19 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13940984#comment-13940984
 ] 

Vinod Kumar Vavilapalli commented on YARN-1857:
---

This is just one of the items tracked at YARN-1198. I will convert it to a 
sub-task.

 CapacityScheduler headroom doesn't account for other AM's running
 -

 Key: YARN-1857
 URL: https://issues.apache.org/jira/browse/YARN-1857
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.3.0
Reporter: Thomas Graves




--
This message was sent by Atlassian JIRA
(v6.2#6252)