[jira] [Created] (YARN-7619) Max AM Resource value in CS UI is different for every user

2017-12-06 Thread Eric Payne (JIRA)
Eric Payne created YARN-7619:


 Summary: Max AM Resource value in CS UI is different for every user
 Key: YARN-7619
 URL: https://issues.apache.org/jira/browse/YARN-7619
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacity scheduler, yarn
Affects Versions: 3.0.0-beta1, 2.9.0, 2.8.2, 3.1.0
Reporter: Eric Payne
Assignee: Eric Payne


YARN-7245 addressed the problem that the {{Max AM Resource}} in the capacity 
scheduler UI used to contain the queue-level AM limit instead of the user-level 
AM limit. It fixed this by using the user-specific AM limit that is calculated 
in {{LeafQueue#activateApplications}}, stored in each user's {{LeafQueue#User}} 
object, and retrieved via {{UserInfo#getResourceUsageInfo}}.

The problem is that this user-specific AM limit depends on the activity of 
other users and other applications in a queue, and it is only calculated and 
updated when a user's application is activated. So, when 
{{CapacitySchedulerPage}} retrieves the user-specific AM limit, it is a stale 
value unless an application was recently activated for a particular user.
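
In other words, the stored limit behaves like a cache that is refreshed only on
the activation path. A minimal sketch of that pattern (class and member names
here are illustrative, not the actual YARN code):

{code:java|title=Illustrative staleness pattern}
public class CachedUserAmLimit {
  // Refreshed only on the activation path, analogous to the limit stored in
  // LeafQueue#User by LeafQueue#activateApplications.
  private volatile long cachedLimit;

  // Activation path: the only place the cache is recomputed.
  void onApplicationActivated(long queueAmLimit, int activeUsers) {
    cachedLimit = queueAmLimit / Math.max(activeUsers, 1);
  }

  // UI read path, analogous to CapacitySchedulerPage reading via
  // UserInfo#getResourceUsageInfo: if the set of active users changed since
  // the last activation, this value is stale.
  long readForUi() {
    return cachedLimit;
  }
}
{code}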






[jira] [Updated] (YARN-7619) Max AM Resource value in CS UI is different for every user

2017-12-06 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-7619:
-
Attachment: Max AM Resources is Different for Each User.png

> Max AM Resource value in CS UI is different for every user
> --
>
> Key: YARN-7619
> URL: https://issues.apache.org/jira/browse/YARN-7619
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, yarn
>Affects Versions: 2.9.0, 3.0.0-beta1, 2.8.2, 3.1.0
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: Max AM Resources is Different for Each User.png
>
>
> YARN-7245 addressed the problem that the {{Max AM Resource}} in the capacity 
> scheduler UI used to contain the queue-level AM limit instead of the 
> user-level AM limit. It fixed this by using the user-specific AM limit that 
> is calculated in {{LeafQueue#activateApplications}}, stored in each user's 
> {{LeafQueue#User}} object, and retrieved via 
> {{UserInfo#getResourceUsageInfo}}.
> The problem is that this user-specific AM limit depends on the activity of 
> other users and other applications in a queue, and it is only calculated and 
> updated when a user's application is activated. So, when 
> {{CapacitySchedulerPage}} retrieves the user-specific AM limit, it is a stale 
> value unless an application was recently activated for a particular user.






[jira] [Commented] (YARN-7370) Preemption properties should be refreshable

2017-10-25 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16218617#comment-16218617
 ] 

Eric Payne commented on YARN-7370:
--

Thanks [~leftnoteasy] for the further design specifications.

bq. YARN-6142, we will take care of all scheduling edit policy refresh.
YARN-6142 is closed, so I'm not sure where the actual work will take place.

As for the rest, it sounds like a good plan.

> Preemption properties should be refreshable
> ---
>
> Key: YARN-7370
> URL: https://issues.apache.org/jira/browse/YARN-7370
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, scheduler preemption
>Affects Versions: 2.8.0, 3.0.0-alpha3
>Reporter: Eric Payne
>Assignee: Gergely Novák
>
> At least the properties for {{max-allowable-limit}} and {{minimum-threshold}} 
> should be refreshable. It would also be nice to make 
> {{intra-queue-preemption.enabled}} and {{preemption-order-policy}} 
> refreshable.






[jira] [Commented] (YARN-6124) Make SchedulingEditPolicy can be enabled / disabled / updated with RMAdmin -refreshQueues

2017-10-24 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16216867#comment-16216867
 ] 

Eric Payne commented on YARN-6124:
--

Yes, I agree that these should be part of the scheduler. That makes a lot of 
sense.

> Make SchedulingEditPolicy can be enabled / disabled / updated with RMAdmin 
> -refreshQueues
> -
>
> Key: YARN-6124
> URL: https://issues.apache.org/jira/browse/YARN-6124
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6124.wip.1.patch
>
>
> Currently, enabling / disabling / updating the SchedulingEditPolicy config 
> requires an RM restart. This is inconvenient when an admin wants to make 
> changes to SchedulingEditPolicies.






[jira] [Commented] (YARN-7370) Preemption properties should be refreshable

2017-10-31 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227276#comment-16227276
 ] 

Eric Payne commented on YARN-7370:
--

[~GergelyNovak], Thanks for the updated patch. Just a couple of things:
- Why were {{DEFAULT_PREEMPTION_MAX_IGNORED_OVER_CAPACITY}} and 
{{DEFAULT_PREEMPTION_NATURAL_TERMINATION_FACTOR}} changed from float to double? 
The capacity scheduler configuration properties are not consistent about the 
usage of float and double, but it looks like the preemption properties are 
using float. If we want to make it consistent or change these to double, I 
would prefer to do it as a separate JIRA.
- Thanks for adding the log documenting the updated properties. Can you please 
add the following properties to the log statement?
-- isIntraQueuePreemptionEnabled
-- selectCandidatesForResevedContainers
-- isQueuePriorityPreemptionEnabled
-- additionalPreemptionBasedOnReservedResource
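
For illustration, the requested log line might end up looking roughly like this
(the variable names and the surrounding statement are assumptions, not the
actual patch):

{code:java|title=Assumed shape of the updated log statement}
LOG.info("Capacity Scheduler preemption properties refreshed:"
    + " isIntraQueuePreemptionEnabled = " + isIntraQueuePreemptionEnabled
    + ", selectCandidatesForResevedContainers = "
    + selectCandidatesForResevedContainers
    + ", isQueuePriorityPreemptionEnabled = " + isQueuePriorityPreemptionEnabled
    + ", additionalPreemptionBasedOnReservedResource = "
    + additionalPreemptionBasedOnReservedResource);
{code}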



> Preemption properties should be refreshable
> ---
>
> Key: YARN-7370
> URL: https://issues.apache.org/jira/browse/YARN-7370
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, scheduler preemption
>Affects Versions: 2.8.0, 3.0.0-alpha3
>Reporter: Eric Payne
>Assignee: Gergely Novák
> Attachments: YARN-7370.001.patch, YARN-7370.002.patch
>
>
> At least the properties for {{max-allowable-limit}} and {{minimum-threshold}} 
> should be refreshable. It would also be nice to make 
> {{intra-queue-preemption.enabled}} and {{preemption-order-policy}} 
> refreshable.






[jira] [Commented] (YARN-7424) Capacity Scheduler Intra-queue preemption: add property to only preempt up to configured MULP

2017-10-31 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227528#comment-16227528
 ] 

Eric Payne commented on YARN-7424:
--

In a large, multi-tenant queue with a MULP of 1%, after instrumenting intra-queue 
preemption, we have discovered that enabling both inter-queue and intra-queue 
preemption causes an order of magnitude more lost work than enabling 
inter-queue preemption alone. Even after reducing 
{{intra-queue-preemption.max-allowable-limit}} from 20% (default) to 3%, the 
lost work is still several times more than with inter-queue alone.

| | *MemSeconds Lost* |
| *Only inter-queue preemption enabled* | {{LostCrossQueueMemSec}} |
| *Both inter- and intra-queue preemption enabled with 20% max-allowable-limit* | {{12.7824 * LostCrossQueueMemSec}} |
| *Both inter- and intra-queue preemption enabled with 3% max-allowable-limit* | {{7.9893 * LostCrossQueueMemSec}} |

| | *Vcoreseconds Lost* |
| *Only inter-queue preemption enabled* | {{LostCrossQueueVSec}} |
| *Both inter- and intra-queue preemption enabled with 20% max-allowable-limit* | {{26.1885 * LostCrossQueueVSec}} |
| *Both inter- and intra-queue preemption enabled with 3% max-allowable-limit* | {{19.2676 * LostCrossQueueVSec}} |

It is expected that turning on intra-queue preemption would increase the number 
of preemptions. However, an order of magnitude more seems excessive. Also, 
reducing {{intra-queue-preemption.max-allowable-limit}} didn't have nearly the 
effect I thought it should.

I think there is an underlying design philosophy that should be addressed.

The current intra-queue preemption design balances the user limit among all of 
the users. This calculation is based on the total queue capacity and the number 
of users in the queue. In a very large queue with a large number of active 
users, the number of users in the queue is constantly changing. Also, if the 
node overcommit feature is enabled, the total size of the queue will change as 
well when the cluster becomes very busy. The result is that preemption must 
constantly happen in order to balance all of the users.

For this reason, we need a configuration property that stops preempting on 
behalf of a user once the user is above the MULP, which is a stable value. As a 
variation, we may want to have a "live zone" of MULP plus some configurable 
value.
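
A minimal sketch of the proposed guard, assuming the monitor would check it per
user before selecting preemption candidates (all names are hypothetical):

{code:java|title=Hypothetical MULP-based stopping condition}
/**
 * Keep preempting on behalf of a user only while that user is below its
 * MULP-based guarantee plus a configurable "live zone".
 */
static boolean shouldKeepPreemptingFor(long userUsed, long queueCapacity,
    float mulpPercent, float liveZonePercent) {
  long mulpGuarantee = (long) (queueCapacity * mulpPercent / 100f);
  long liveZone = (long) (queueCapacity * liveZonePercent / 100f);
  // MULP is a stable threshold, unlike the per-user fair share that moves
  // with the number of active users and the queue size.
  return userUsed < mulpGuarantee + liveZone;
}
{code}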


> Capacity Scheduler Intra-queue preemption: add property to only preempt up to 
> configured MULP
> -
>
> Key: YARN-7424
> URL: https://issues.apache.org/jira/browse/YARN-7424
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, scheduler preemption
>Affects Versions: 3.0.0-beta1, 2.8.2
>Reporter: Eric Payne
>
> If the queue's configured minimum user limit percent (MULP) is something 
> small like 1%, all users will max out well over their MULP until 100 users 
> have apps in the queue. Since the intra-queue preemption monitor tries to 
> balance the resource among the users, most of the time in this use case it 
> will be preempting containers on behalf of users that are already over their 
> MULP guarantee.
> This JIRA proposes that a property should be provided so that a queue can be 
> configured to only preempt on behalf of a user until that user has reached 
> its MULP.






[jira] [Assigned] (YARN-7424) Capacity Scheduler Intra-queue preemption: add property to only preempt up to configured MULP

2017-10-31 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne reassigned YARN-7424:


Assignee: Eric Payne

> Capacity Scheduler Intra-queue preemption: add property to only preempt up to 
> configured MULP
> -
>
> Key: YARN-7424
> URL: https://issues.apache.org/jira/browse/YARN-7424
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, scheduler preemption
>Affects Versions: 3.0.0-beta1, 2.8.2
>Reporter: Eric Payne
>Assignee: Eric Payne
>
> If the queue's configured minimum user limit percent (MULP) is something 
> small like 1%, all users will max out well over their MULP until 100 users 
> have apps in the queue. Since the intra-queue preemption monitor tries to 
> balance the resource among the users, most of the time in this use case it 
> will be preempting containers on behalf of users that are already over their 
> MULP guarantee.
> This JIRA proposes that a property should be provided so that a queue can be 
> configured to only preempt on behalf of a user until that user has reached 
> its MULP.






[jira] [Updated] (YARN-7424) Capacity Scheduler Intra-queue preemption: add property to only preempt up to configured MULP

2017-10-31 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-7424:
-
Description: 
If the queue's configured minimum user limit percent (MULP) is something small 
like 1%, all users will max out well over their MULP until 100 users have apps 
in the queue. Since the intra-queue preemption monitor tries to balance the 
resource among the users, most of the time in this use case it will be 
preempting containers on behalf of users that are already over their MULP 
guarantee.

This JIRA proposes that a property should be provided so that a queue can be 
configured to only preempt on behalf of a user until that user has reached its 
MULP.


  was:




> Capacity Scheduler Intra-queue preemption: add property to only preempt up to 
> configured MULP
> -
>
> Key: YARN-7424
> URL: https://issues.apache.org/jira/browse/YARN-7424
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, scheduler preemption
>Affects Versions: 3.0.0-beta1, 2.8.2
>Reporter: Eric Payne
>
> If the queue's configured minimum user limit percent (MULP) is something 
> small like 1%, all users will max out well over their MULP until 100 users 
> have apps in the queue. Since the intra-queue preemption monitor tries to 
> balance the resource among the users, most of the time in this use case it 
> will be preempting containers on behalf of users that are already over their 
> MULP guarantee.
> This JIRA proposes that a property should be provided so that a queue can be 
> configured to only preempt on behalf of a user until that user has reached 
> its MULP.






[jira] [Created] (YARN-7424) Capacity Scheduler Intra-queue preemption: add property to only preempt up to configured MULP

2017-10-31 Thread Eric Payne (JIRA)
Eric Payne created YARN-7424:


 Summary: Capacity Scheduler Intra-queue preemption: add property 
to only preempt up to configured MULP
 Key: YARN-7424
 URL: https://issues.apache.org/jira/browse/YARN-7424
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacity scheduler, scheduler preemption
Affects Versions: 3.0.0-beta1, 2.8.2
Reporter: Eric Payne










[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-05-07 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466541#comment-16466541
 ] 

Eric Payne commented on YARN-4606:
--

{code:title=AppSchedulingInfo#updatePendingResources}
if(! hasActiveUsersOfPendingAppsDecremented.get()) {
abstractUsersManager.decrNumActiveUsersOfPendingApps();
hasActiveUsersOfPendingAppsDecremented.set(true);
}
{code}

Does {{hasActiveUsersOfPendingAppsDecremented}} need to be atomic? What is the 
benefit?
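
If the goal is just a decrement-exactly-once guard, note that the check-then-act
above is not atomic as written; the usual {{AtomicBoolean}} idiom is
{{compareAndSet}}. A sketch of that alternative (not the actual patch):

{code:java|title=Decrement-once with compareAndSet (illustrative)}
// compareAndSet(false, true) flips the flag atomically, so even if two
// threads race here, the decrement runs exactly once.
if (hasActiveUsersOfPendingAppsDecremented.compareAndSet(false, true)) {
  abstractUsersManager.decrNumActiveUsersOfPendingApps();
}
{code}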

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Manikandan R
>Priority: Critical
> Attachments: YARN-4606.001.patch, YARN-4606.1.poc.patch, 
> YARN-4606.POC.2.patch, YARN-4606.POC.patch
>
>
> Currently, if all applications belonging to the same user in a LeafQueue are 
> pending (caused by max-am-percent, etc.), ActiveUsersManager still considers 
> the user an active user. This could lead to starvation of active 
> applications, for example:
> - App1 (belongs to user1) / app2 (belongs to user2) are active; app3 (belongs 
> to user3) / app4 (belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, only two users (user1/user2) are able to allocate new resources, 
> so the computed user-limit-resource could be lower than expected.
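
As a rough illustration of the starvation (using a deliberately simplified
user-limit formula; the real computation has more inputs):

{code:java|title=Simplified user-limit arithmetic}
long queueResource = 100;   // say, 100 GB of queue capacity
int countedActiveUsers = 4; // includes the 2 users with only pending apps
int trulyActiveUsers = 2;

long computedLimit = queueResource / countedActiveUsers; // 25 GB per user
long expectedLimit = queueResource / trulyActiveUsers;   // 50 GB per user
// user1 and user2 are each capped at 25 GB, leaving half the queue unused.
{code}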






[jira] [Commented] (YARN-8004) Add unit tests for inter queue preemption for dominant resource calculator

2018-04-27 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456369#comment-16456369
 ] 

Eric Payne commented on YARN-8004:
--

bq. could i backport this to branch-2.9/2.8 as well?
[~sunilg], sure that would be fine. Thanks for the reviews and commits.

> Add unit tests for inter queue preemption for dominant resource calculator
> --
>
> Key: YARN-8004
> URL: https://issues.apache.org/jira/browse/YARN-8004
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Zian Chen
>Priority: Critical
> Fix For: 3.2.0, 3.1.1, 3.0.3
>
> Attachments: YARN-8004.001.patch, YARN-8004.002.patch, 
> YARN-8004.003.patch
>
>







[jira] [Commented] (YARN-4781) Support intra-queue preemption for fairness ordering policy.

2018-04-27 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456511#comment-16456511
 ] 

Eric Payne commented on YARN-4781:
--

bq. Sorry for the delay here Eric Payne. I will check and share my comments 
today.
Hi [~sunilg]. Thanks for your reviews and comments. Have you had a chance to 
review the latest patch?

> Support intra-queue preemption for fairness ordering policy.
> 
>
> Key: YARN-4781
> URL: https://issues.apache.org/jira/browse/YARN-4781
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
> Attachments: YARN-4781.001.patch, YARN-4781.002.patch, 
> YARN-4781.003.patch, YARN-4781.004.patch
>
>
> We introduced the fairness queue ordering policy in YARN-3319, which lets 
> large applications make progress without starving small applications. 
> However, if a large application takes the queue’s resources and the large 
> app's containers have long lifespans, small applications could still wait a 
> long time for resources, and SLAs cannot be guaranteed.
> Instead of waiting for applications to release resources on their own, we 
> need to preempt resources in queues with the fairness policy enabled.






[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-05-07 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466136#comment-16466136
 ] 

Eric Payne commented on YARN-4606:
--

Thanks [~maniraj...@gmail.com] for your consistent and continuing efforts to 
fix this problem.

I am doing an in-depth review, but I would like to address a few things first 
regarding method names and comments. I feel that it is important to be accurate 
in these areas in order to eliminate confusion for those maintaining this code.

- All occurrences of "atleast" should be "at least"
- Comment for {{AbstractUsersManager#getNumActiveUsers}}:
{code:title=AbstractUsersManager#getNumActiveUsers}
-   * Get number of active users i.e. users with applications which have pending
-   * resource requests.
+   * Get number of active users i.e. users with atleast 1 active applications
{code}
For this comment, I would say "Get number of active users i.e. users with at 
least 1 running application and applications requesting resources"
- I would prefer it if the name of {{ActiveUsersOfPendingApps}} was changed 
everywhere to {{ActiveUsersWithOnlyPendingApps}}. This is kind of a nit, but I 
do feel that the rename would be more descriptive.
- {{AbstractUsersManager#incrNumActiveUsersOfPendingApps}}, 
{{decrNumActiveUsersOfPendingApps}}, and {{getNumActiveUsersOfPendingApps}}
Change description to "number of users with only pending apps"
- {{UsersManager#activateApplication}} and {{deactivateApplication}}
Change "Active users which has atleast 1 pending apps:" to "Active users which 
have at least 1 pending app:"


> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Manikandan R
>Priority: Critical
> Attachments: YARN-4606.001.patch, YARN-4606.1.poc.patch, 
> YARN-4606.POC.2.patch, YARN-4606.POC.patch
>
>
> Currently, if all applications belonging to the same user in a LeafQueue are 
> pending (caused by max-am-percent, etc.), ActiveUsersManager still considers 
> the user an active user. This could lead to starvation of active 
> applications, for example:
> - App1 (belongs to user1) / app2 (belongs to user2) are active; app3 (belongs 
> to user3) / app4 (belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, only two users (user1/user2) are able to allocate new resources, 
> so the computed user-limit-resource could be lower than expected.






[jira] [Updated] (YARN-8248) Job hangs when queue is specified and that queue has 0 capability of a resource

2018-05-07 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-8248:
-
Component/s: yarn
 fairscheduler

> Job hangs when queue is specified and that queue has 0 capability of a 
> resource
> ---
>
> Key: YARN-8248
> URL: https://issues.apache.org/jira/browse/YARN-8248
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler, yarn
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8248-001.patch, YARN-8248-002.patch
>
>
> The job hangs when mapreduce.job.queuename is specified and the queue has 0 
> of any resource (vcores / memory / other).
> In this scenario, the job should be immediately rejected upon submission, 
> since the specified queue cannot serve the resource needs of the submitted 
> job.
>  
> Command to run:
> {code:java}
> bin/yarn jar 
> "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" 
> pi -Dmapreduce.job.queuename=sample_queue 1 1000;{code}
> fair-scheduler.xml queue config (excerpt):
>  
> {code:xml}
> <queue name="sample_queue">
>   <minResources>1 mb,0vcores</minResources>
>   <maxResources>9 mb,0vcores</maxResources>
>   <maxRunningApps>50</maxRunningApps>
>   <maxAMShare>-1.0f</maxAMShare>
>   <weight>2.0</weight>
>   <schedulingPolicy>fair</schedulingPolicy>
> </queue>
> {code}
> Diagnostic message from the web UI: 
> {code:java}
> [Wed May 02 06:35:57 -0700 2018] Application is added to the scheduler and is 
> not yet activated. (Resource request:  exceeds current 
> queue or its parents maximum resource allowed).{code}






[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-16 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477674#comment-16477674
 ] 

Eric Payne commented on YARN-8292:
--

Thanks [~ssath...@hortonworks.com] and [~leftnoteasy].

[~leftnoteasy], Can you please clarify the following:
{noformat}
// - After preempt the container, the to-obtain should be either > 0
// OR any major resource equals to 0.
...
// * before: <30, 10, 5>, after <20, 10, -10>
{noformat}
In this proposal, should preemption continue after the above? After a container 
is subtracted in the above example, {{resToObtain == <20, 10, -10>}} and the 
next container request is {{<10, 15, 5>}}, would the process continue so we 
would have the following?

{noformat}
// * before: <20, 10, -10>, after: <10, -5, -15>
{noformat}

 

> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> 
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8292.001.patch
>
>
>  
>  This is an example of the problem: (Same if we have more than 2 resources)
>   
> Let's say we have 3 queues A/B/C. All containers with equal size <2,3>
>  
> ||Queue||Guaranteed||Used ||Pending||
> |A|<20, 10>|<20,30>| |
> |B|<20, 10>|0|0|
> |C|<20, 10>|0|<20, 30>|
>
> Under current logic, A's calculated to-preempt (how much resource other 
> queues can preempt) will be <0, 20>. The preemption will not happen. However, 
> under the context of DRC, queue A is using more resource than guaranteed, so 
> queue C will be starved.






[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-21 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482689#comment-16482689
 ] 

Eric Payne commented on YARN-8292:
--

Thanks [~leftnoteasy] for your work on this issue.

- I don't think this is necessary.
{code:title=AbstractPreemptableResourceCalculator#computeFixpointAllocation}
  Resource dupUnassignedForTheRound = Resources.clone(unassigned);
{code}


- I'm concerned about checking for {{any resource <= 0}} before preempting for 
intra-queue preemption. When extended resources are used, won't this prevent 
any preemption in a queue where none of the apps used the extended resource?
{code:title=CapacitySchedulerPreemptionUtils#tryPreemptContainerAndDeductResToObtain}
  if (conservativeDRF) {
doPreempt = !Resources.isAnyMajorResourceZeroOrNegative(rc,
toObtainByPartition);
  } else{
{code}
For example, if gpu is the extended resource, but no apps are currently using 
gpu in the queue, no intra-queue preemption will take place.
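
A worked trace of that concern (values are illustrative):

{code:java|title=Worked example: a zero extended resource blocks preemption}
// No app in the queue requested gpu, so the to-obtain vector is
//   toObtainByPartition = <memory: 8192, vcores: 4, gpu: 0>
// With conservativeDRF, the guard asks whether ANY major resource is <= 0:
//   isAnyMajorResourceZeroOrNegative(<8192, 4, 0>)  ->  true (gpu == 0)
// so doPreempt = !true = false, and nothing is preempted even though memory
// and vcores are still genuinely needed.
{code}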


> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> 
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8292.001.patch, YARN-8292.002.patch, 
> YARN-8292.003.patch, YARN-8292.004.patch, YARN-8292.005.patch
>
>
> This is an example of the problem: 
>   
> {code}
> //   guaranteed,  max,used,   pending
> "root(=[30:18:6  30:18:6 12:12:6 1:1:1]);" + //root
> "-a(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // a
> "-b(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // b
> "-c(=[10:6:2 10:6:2  0:0:0   1:1:1])"; // c
> {code}
> There're 3 resource types. The total resource of the cluster is 30:18:6.
> For both a and b, there're 3 containers running, and each container is 2:2:1.
> Queue c uses 0 resource and has 1:1:1 pending resource.
> Under existing logic, preemption cannot happen.






[jira] [Commented] (YARN-8179) Preemption does not happen due to natural_termination_factor when DRF is used

2018-05-21 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482841#comment-16482841
 ] 

Eric Payne commented on YARN-8179:
--

[~kyungwan nam], I am really sorry for the long delay, and I'm also very sorry 
that I do have one more request, even though I had previously approved the 
latest patch.

{code:title=PreemptableResourceCalculator#calculateResToObtainByPartitionForLeafQueues}
   if (Resources.greaterThan(rc, clusterResource, resToObtain,
 Resource.newInstance(0, 0)))
{code}

Can we please change the {{newInstance}} call to {{Resources.none()}}? This 
will accommodate extensible resources.

> Preemption does not happen due to natural_termination_factor when DRF is used
> -
>
> Key: YARN-8179
> URL: https://issues.apache.org/jira/browse/YARN-8179
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: kyungwan nam
>Assignee: kyungwan nam
>Priority: Major
> Attachments: YARN-8179.001.patch, YARN-8179.002.patch, 
> YARN-8179.003.patch
>
>
> Cluster:
> * DominantResourceCalculator
> * QueueA : 50 (capacity) ~ 100 (max capacity)
> * QueueB : 50 (capacity) ~ 50 (max capacity)
> All resources have been allocated to QueueA (in particular, all vcores are 
> allocated to QueueA).
> If App1 is submitted to QueueB, the over-utilized QueueA should be preempted. 
> But I’ve hit a problem where preemption does not happen, so the App1 AM 
> cannot be allocated.
> When App1 is submitted, the pending resource for the App1 AM ask would be 
> 
> So the vcores that need to be preempted from QueueA should be 1, but the 
> amount can become 0 due to the natural_termination_factor (default is 0.2).
> We should guarantee that the amount to preempt does not become 0 after 
> applying the natural_termination_factor.
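
As I understand the factor's application, the per-round preemption target is
scaled by it and then truncated to whole containers, which is where a 1-vcore
ask disappears (a sketch under that assumption):

{code:java|title=Why a 1-vcore target can become 0 (illustrative)}
float naturalTerminationFactor = 0.2f; // default
int vcoresToPreempt = 1;               // pending ask for the App1 AM

// Each round only pursues a fraction of the remaining target:
int vcoresThisRound = (int) (vcoresToPreempt * naturalTerminationFactor);
// 1 * 0.2f = 0.2 -> truncated to 0, so preemption never selects a container.
{code}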






[jira] [Commented] (YARN-8179) Preemption does not happen due to natural_termination_factor when DRF is used

2018-05-21 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482855#comment-16482855
 ] 

Eric Payne commented on YARN-8179:
--

{quote}Can we please change the {{newInstance}} call to {{Resources.none()}}? 
This will accommodate extensible resources.
{quote}
Nope. Sorry. Forget about that previous comment. It looks like 
{{Resource.newInstance(0, 0)}} already covers extensible resources, since the 
Resource it creates carries entries (initialized to zero) for all configured 
resource types.

 

I will commit this today or tomorrow.

 

> Preemption does not happen due to natural_termination_factor when DRF is used
> -
>
> Key: YARN-8179
> URL: https://issues.apache.org/jira/browse/YARN-8179
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: kyungwan nam
>Assignee: kyungwan nam
>Priority: Major
> Attachments: YARN-8179.001.patch, YARN-8179.002.patch, 
> YARN-8179.003.patch
>
>
> Cluster:
> * DominantResourceCalculator
> * QueueA : 50 (capacity) ~ 100 (max capacity)
> * QueueB : 50 (capacity) ~ 50 (max capacity)
> All resources have been allocated to QueueA (in particular, all vcores are 
> allocated to QueueA).
> If App1 is submitted to QueueB, the over-utilized QueueA should be preempted. 
> But I’ve hit a problem where preemption does not happen, so the App1 AM 
> cannot be allocated.
> When App1 is submitted, the pending resource for the App1 AM ask would be 
> 
> So the vcores that need to be preempted from QueueA should be 1, but the 
> amount can become 0 due to the natural_termination_factor (default is 0.2).
> We should guarantee that the amount to preempt does not become 0 after 
> applying the natural_termination_factor.






[jira] [Comment Edited] (YARN-8179) Preemption does not happen due to natural_termination_factor when DRF is used

2018-05-21 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482841#comment-16482841
 ] 

Eric Payne edited comment on YARN-8179 at 5/21/18 6:22 PM:
---

-[~kyungwan nam], I am really sorry for the long delay, and I'm also very sorry 
that I do have one more request, even though I had previously approved the 
latest patch.-
{code:java|title=PreemptableResourceCalculator#calculateResToObtainByPartitionForLeafQueues}
   if (Resources.greaterThan(rc, clusterResource, resToObtain,
 Resource.newInstance(0, 0)))
{code}
-Can we please change the {{newInstance}} call to {{Resources.none()}}? This 
will accommodate extensible resources.-


was (Author: eepayne):
[~kyungwan nam], I am really sorry for the long delay, and I'm also very sorry 
that I do have one more request, even though I had previously approved the 
latest patch.

{code:title=PreemptableResourceCalculator#calculateResToObtainByPartitionForLeafQueues}
   if (Resources.greaterThan(rc, clusterResource, resToObtain,
 Resource.newInstance(0, 0)))
{code}

Can we please change the {{newInstance}} call to {{Resources.none()}}? This 
will accommodate extensible resources.

> Preemption does not happen due to natural_termination_factor when DRF is used
> -
>
> Key: YARN-8179
> URL: https://issues.apache.org/jira/browse/YARN-8179
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: kyungwan nam
>Assignee: kyungwan nam
>Priority: Major
> Attachments: YARN-8179.001.patch, YARN-8179.002.patch, 
> YARN-8179.003.patch
>
>
> Cluster:
> * DominantResourceCalculator
> * QueueA : 50 (capacity) ~ 100 (max capacity)
> * QueueB : 50 (capacity) ~ 50 (max capacity)
> All resources have been allocated to QueueA (in particular, all vcores are 
> allocated to QueueA).
> If App1 is submitted to QueueB, the over-utilized QueueA should be preempted. 
> But I’ve hit a problem where preemption does not happen, so the App1 AM 
> cannot be allocated.
> When App1 is submitted, the pending resource for the App1 AM ask would be 
> 
> So the vcores that need to be preempted from QueueA should be 1, but the 
> amount can become 0 due to the natural_termination_factor (default is 0.2).
> We should guarantee that the amount to preempt does not become 0 after 
> applying the natural_termination_factor.






[jira] [Commented] (YARN-8179) Preemption does not happen due to natural_termination_factor when DRF is used

2018-05-21 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483046#comment-16483046
 ] 

Eric Payne commented on YARN-8179:
--

Committed to trunk, branch-3.1, and branch-3.0. Thanks for the good work, 
[~kyungwan nam]

> Preemption does not happen due to natural_termination_factor when DRF is used
> -
>
> Key: YARN-8179
> URL: https://issues.apache.org/jira/browse/YARN-8179
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: kyungwan nam
>Assignee: kyungwan nam
>Priority: Major
> Fix For: 3.2.0, 3.1.1, 3.0.3
>
> Attachments: YARN-8179.001.patch, YARN-8179.002.patch, 
> YARN-8179.003.patch
>
>
> Cluster:
> * DominantResourceCalculator
> * QueueA : 50 (capacity) ~ 100 (max capacity)
> * QueueB : 50 (capacity) ~ 50 (max capacity)
> All resources have been allocated to QueueA (in particular, all vcores are 
> allocated to QueueA).
> If App1 is submitted to QueueB, the over-utilized QueueA should be preempted. 
> But I’ve hit a problem where preemption does not happen, so the App1 AM 
> cannot be allocated.
> When App1 is submitted, the pending resource for the App1 AM ask would be 
> 
> So the vcores that need to be preempted from QueueA should be 1, but the 
> amount can become 0 due to the natural_termination_factor (default is 0.2).
> We should guarantee that the amount to preempt does not become 0 after 
> applying the natural_termination_factor.






[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-22 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484257#comment-16484257
 ] 

Eric Payne commented on YARN-8292:
--

{quote}Actually this is required after the change.
{quote}
Yes, I see now.
{quote}TestPreemptionForQueueWithPriorities
{quote}
{{TestPreemptionForQueueWithPriorities}} passes for me in my local environment.
{quote}doPreempt = Resources.lessThan(rc, clusterResource,
 Resources
 .componentwiseMin(toObtainAfterPreemption, Resources.none()),
 Resources.componentwiseMin(toObtainByPartition, Resources.none()));
{quote}
I don't think we want the above code to {{componentwiseMin}} the {{toObtain}} 
values with 0, since that will set _all_ positive resource entities to 0.
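
A worked trace of the objection ({{componentwiseMin}} takes the per-component
minimum; the values reuse the earlier example):

{code:java|title=componentwiseMin against zero erases the positive components}
// toObtainAfterPreemption = <20, 10, -10>
// componentwiseMin(<20, 10, -10>, <0, 0, 0>) = <0, 0, -10>
// Every positive component is clamped to 0, so the subsequent lessThan
// comparison no longer reflects how much memory and vcores still need to
// be obtained.
{code}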
{quote}Can we address this in a separate JIRA if we cannot come with some 
simple solution?
{quote}
In my tests, the current implementation of preemption does not seem to work 
anyway when extensible resources are enabled, so this seems to be a larger 
problem. You are right that it should be its own JIRA.

I give my +1 here. [~jlowe] / [~sunilg], do you have additional comments?

> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> 
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8292.001.patch, YARN-8292.002.patch, 
> YARN-8292.003.patch, YARN-8292.004.patch, YARN-8292.005.patch, 
> YARN-8292.006.patch
>
>
> This is an example of the problem: 
>   
> {code}
> //   guaranteed,  max,used,   pending
> "root(=[30:18:6  30:18:6 12:12:6 1:1:1]);" + //root
> "-a(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // a
> "-b(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // b
> "-c(=[10:6:2 10:6:2  0:0:0   1:1:1])"; // c
> {code}
> There're 3 resource types. The total resource of the cluster is 30:18:6.
> For both a and b, there're 3 containers running, and each container is 2:2:1.
> Queue c uses 0 resource and has 1:1:1 pending resource.
> Under existing logic, preemption cannot happen.






[jira] [Commented] (YARN-4781) Support intra-queue preemption for fairness ordering policy.

2018-05-23 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487902#comment-16487902
 ] 

Eric Payne commented on YARN-4781:
--

bq.  FairOrdering policy could be used with weights?
[~sunilg], the fair ordering preemption will generally select the 
smaller-weigted users first even when those containers are older. It's a 
hierarchy of priority ordering, though, and it does still try to be "fair," so 
you could have a situation where the youngest containers are selected even 
though they are owned by a more heavily-weighted user.

> Support intra-queue preemption for fairness ordering policy.
> 
>
> Key: YARN-4781
> URL: https://issues.apache.org/jira/browse/YARN-4781
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Wangda Tan
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-4781.001.patch, YARN-4781.002.patch, 
> YARN-4781.003.patch, YARN-4781.004.patch, YARN-4781.005.patch
>
>
> We introduced the fairness queue ordering policy in YARN-3319, which lets 
> large applications make progress without starving small applications. 
> However, if a large application takes the queue’s resources and the large 
> app's containers have long lifespans, small applications could still wait a 
> long time for resources, and SLAs cannot be guaranteed.
> Instead of waiting for applications to release resources on their own, we 
> need to preempt resources in queues with the fairness policy enabled.






[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-25 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16490886#comment-16490886
 ] 

Eric Payne commented on YARN-8292:
--

+1. Unless there are additional comments, I'll commit today.

> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> 
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8292.001.patch, YARN-8292.002.patch, 
> YARN-8292.003.patch, YARN-8292.004.patch, YARN-8292.005.patch, 
> YARN-8292.006.patch, YARN-8292.007.patch, YARN-8292.008.patch, 
> YARN-8292.009.patch
>
>
> This is an example of the problem: 
>   
> {code}
> //   guaranteed,  max,used,   pending
> "root(=[30:18:6  30:18:6 12:12:6 1:1:1]);" + //root
> "-a(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // a
> "-b(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // b
> "-c(=[10:6:2 10:6:2  0:0:0   1:1:1])"; // c
> {code}
> There're 3 resource types. The total resource of the cluster is 30:18:6.
> For both a and b, there're 3 containers running, and each container is 2:2:1.
> Queue c uses 0 resource and has 1:1:1 pending resource.
> Under existing logic, preemption cannot happen.






[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-24 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489070#comment-16489070
 ] 

Eric Payne commented on YARN-8292:
--

[~leftnoteasy], thanks for the new patch.
- In the following description, it should say "returns true if any resource is 
greater than zero."
{code:title=ResourceCalculator#isAnyMajorResourceAboveZero}
   * @return returns true if any resource is zero.
   */
  public abstract boolean isAnyMajorResourceAboveZero(Resource resource);
{code}

I'm sorry for asking for a comment change at this stage, but I feel that it's 
important to have the correct description in new methods.

> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> 
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8292.001.patch, YARN-8292.002.patch, 
> YARN-8292.003.patch, YARN-8292.004.patch, YARN-8292.005.patch, 
> YARN-8292.006.patch, YARN-8292.007.patch, YARN-8292.008.patch
>
>
> This is an example of the problem: 
>   
> {code}
> //   guaranteed,  max,used,   pending
> "root(=[30:18:6  30:18:6 12:12:6 1:1:1]);" + //root
> "-a(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // a
> "-b(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // b
> "-c(=[10:6:2 10:6:2  0:0:0   1:1:1])"; // c
> {code}
> There're 3 resource types. The total resource of the cluster is 30:18:6.
> For both a and b, there're 3 containers running, and each container is 2:2:1.
> Queue c uses 0 resource and has 1:1:1 pending resource.
> Under existing logic, preemption cannot happen.






[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-17 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479599#comment-16479599
 ] 

Eric Payne commented on YARN-8292:
--

Thanks for the updated patch, [~leftnoteasy].

The changes seem to be over-preempting. In the unit test, the request is for 1 
of each resource type, but it is preempting 2 containers.
{code:java|title=test3ResourceTypesInterQueuePreemption}
...
String queuesConfig =
// guaranteed,max,used,pending
"root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root
"-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a
"-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b
"-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c
...
verify(mDisp, times(1)).handle(argThat(
    new TestProportionalCapacityPreemptionPolicy
        .IsPreemptionRequestFor(getAppAttemptId(1))));
{code}
NOTE: if I add the following check to verify that a container was not preempted 
from app2, it fails:
{code:java}
verify(mDisp, times(0)).handle(argThat(
    new TestProportionalCapacityPreemptionPolicy
        .IsPreemptionRequestFor(getAppAttemptId(2))));
{code}
Since the resource request is {{<1,1,1>}}, I would expect only 1 container to 
be preempted. However, in the unit test logs, I see the following:
{code:java}
2018-05-17 19:01:00,372 DEBUG [main] 
capacity.ProportionalCapacityPreemptionPolicy 
(ProportionalCapacityPreemptionPolicy.java:preemptOrkillSelectedContainerAfterWait(314))
 - Starting to preempt containers for selectedCandidates and size:2
{code}
 

> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> 
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8292.001.patch, YARN-8292.002.patch, 
> YARN-8292.003.patch
>
>
> This is an example of the problem: 
>   
> {code}
> //   guaranteed,  max,used,   pending
> "root(=[30:18:6  30:18:6 12:12:6 1:1:1]);" + //root
> "-a(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // a
> "-b(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // b
> "-c(=[10:6:2 10:6:2  0:0:0   1:1:1])"; // c
> {code}
> There're 3 resource types. The total resource of the cluster is 30:18:6.
> For both a and b, there're 3 containers running, and each container is 2:2:1.
> Queue c uses 0 resource and has 1:1:1 pending resource.
> Under existing logic, preemption cannot happen.






[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-16 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16478157#comment-16478157
 ] 

Eric Payne commented on YARN-8292:
--

bq. you can check 
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.TestProportionalCapacityPreemptionPolicyInterQueueWithDRF#test3ResourceTypesInterQueuePreemption
 test for details.
This test is not actually enabling DRF. You need to add the 5th argument to 
{{buildEnv()}}:
{code}
-buildEnv(labelsConfig, nodesConfig, queuesConfig, appsConfig);
+buildEnv(labelsConfig, nodesConfig, queuesConfig, appsConfig, true);
{code}

> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> 
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-8292.001.patch
>
>
> This is an example of the problem: 
>   
> {code}
> //   guaranteed,  max,used,   pending
> "root(=[30:18:6  30:18:6 12:12:6 1:1:1]);" + //root
> "-a(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // a
> "-b(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // b
> "-c(=[10:6:2 10:6:2  0:0:0   1:1:1])"; // c
> {code}
> There're 3 resource types. The total resource of the cluster is 30:18:6.
> For both a and b, there're 3 containers running, and each container is 2:2:1.
> Queue c uses 0 resource and has 1:1:1 pending resource.
> Under existing logic, preemption cannot happen.






[jira] [Commented] (YARN-4781) Support intra-queue preemption for fairness ordering policy.

2018-05-16 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16478184#comment-16478184
 ] 

Eric Payne commented on YARN-4781:
--

Hi [~sunilg]. Will you have an opportunity to review the latest patch?

> Support intra-queue preemption for fairness ordering policy.
> 
>
> Key: YARN-4781
> URL: https://issues.apache.org/jira/browse/YARN-4781
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Wangda Tan
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-4781.001.patch, YARN-4781.002.patch, 
> YARN-4781.003.patch, YARN-4781.004.patch, YARN-4781.005.patch
>
>
> We introduced the fairness queue ordering policy in YARN-3319, which lets 
> large applications make progress without starving small applications. 
> However, if a large application takes the queue’s resources and the large 
> app's containers have long lifespans, small applications could still wait a 
> long time for resources, and SLAs cannot be guaranteed.
> Instead of waiting for applications to release resources on their own, we 
> need to preempt resources in queues with the fairness policy enabled.






[jira] [Commented] (YARN-8292) Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative

2018-05-16 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16478000#comment-16478000
 ] 

Eric Payne commented on YARN-8292:
--

{quote}
||Queue||Guaranteed||Used ||Pending||
|A|<20, 10>|<20,30>| |
|B|<20, 10>|0|0|
|C|<20, 10>|0|<20, 30>|
Under current logic, A's calculated to-preempt (how much resource other queues 
can preempt) will be <0, 20>. The preemption will not happen.
{quote}
I want to challenge the original example. The above does cause preemption. I 
have tested this scenario, and it does preempt.

In my tests, the first resource is memory and the second is vcores. I think the 
reason is that the dominant resource calculator will determine that vcores is a 
higher percentage of the available resources than memory, so vcores is dominant.
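
Roughly, assuming the cluster total is the sum of the three guarantees:

{code:java|title=Dominant-share arithmetic for the example}
// Cluster total (sum of guarantees): <memory: 60, vcores: 30>
// Queue A used:                      <memory: 20, vcores: 30>
//   memory share = 20 / 60 ~= 0.33
//   vcore  share = 30 / 30  = 1.00   <-- dominant resource
// A's dominant share (1.00) exceeds its guaranteed share (~0.33), so under
// DRC queue A is over its guarantee and preemption can proceed.
{code}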

> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> 
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Reporter: Sumana Sathish
>Assignee: Wangda Tan
>Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8292.001.patch
>
>
>  
>  This is an example of the problem: (Same if we have more than 2 resources)
>   
> Let's say we have 3 queues A/B/C. All containers with equal size <2,3>
>  
> ||Queue||Guaranteed||Used ||Pending||
> |A|<20, 10>|<20,30>| |
> |B|<20, 10>|0|0|
> |C|<20, 10>|0|<20, 30>|
>
> Under current logic, A's calculated to-preempt (how much resource other 
> queues can preempt) will be <0, 20>. The preemption will not happen. However, 
> under the context of DRC, queue A is using more resource than guaranteed, so 
> queue C will be starved.






[jira] [Commented] (YARN-8379) Add an option to allow Capacity Scheduler preemption to balance satisfied queues

2018-06-06 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503282#comment-16503282
 ] 

Eric Payne commented on YARN-8379:
--

Thanks [~Zian Chen] for working on this and providing an initial patch. The 
patch does not apply cleanly, so can you please provide an update?

> Add an option to allow Capacity Scheduler preemption to balance satisfied 
> queues
> 
>
> Key: YARN-8379
> URL: https://issues.apache.org/jira/browse/YARN-8379
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-8379.001.patch
>
>
> Existing capacity scheduler only supports preemption for an underutilized 
> queue to reach its guaranteed resource. In addition to that, there’s a 
> requirement to get a better balance between queues when all of them have 
> reached their guaranteed resource but their usage beyond the guarantee 
> differs.
> An example: 3 queues with capacities queue_a = 30%, queue_b = 30%, and 
> queue_c = 40%. At time T, queue_a is using 30% and queue_b is using 70%. 
> Existing scheduler preemption won't happen. But this is unfair to queue_a, 
> since queue_a has the same guaranteed resources as queue_b.
> Before YARN-5864, the capacity scheduler did additional preemption to balance 
> queues. We changed the logic since it could preempt too many containers 
> between queues when all queues are satisfied.






[jira] [Commented] (YARN-8379) Add an option to allow Capacity Scheduler preemption to balance satisfied queues

2018-06-07 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16504630#comment-16504630
 ] 

Eric Payne commented on YARN-8379:
--

{quote}
Why is this a preemption only concept? To avoid unnecessary thrash between the 
allocation doing one thing and the preemption doing another, we should also 
have a corresponding queue ordering-policy, right?
{quote}
This is only related to preemption because the capacity scheduler already 
balances if resources become available. However, currently, if preemption is 
enabled on all queues, preemption will stop freeing resources once all pending 
queues are over their queue capacity.

The example in this JIRA's description outlines the current behavior. In that 
example, if resources free up naturally from queue_b, the capacity scheduler 
will assign them to queue_a. However, the preemption monitor will not preempt 
them because queue_a is at its 30% queue capacity.

In 2.8 and prior releases, the preemption monitor does balance. As pointed out 
above, the balancing behavior was changed as part of YARN-5864.

> Add an option to allow Capacity Scheduler preemption to balance satisfied 
> queues
> 
>
> Key: YARN-8379
> URL: https://issues.apache.org/jira/browse/YARN-8379
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-8379.001.patch, YARN-8379.002.patch
>
>
> Existing capacity scheduler only supports preemption for an underutilized 
> queue to reach its guaranteed resource. In addition to that, there’s a 
> requirement to get a better balance between queues when all of them have reached 
> their guaranteed resource but have different fair shares.
> An example is 3 queues with capacities queue_a = 30%, queue_b = 30%, and queue_c 
> = 40%. At time T, queue_a is using 30% and queue_b is using 70%. Existing 
> scheduler preemption won't happen. But this is unfair to queue_a since 
> queue_a has the same guaranteed resources.
> Before YARN-5864, the capacity scheduler did additional preemption to balance 
> queues. We changed the logic since it could preempt too many containers 
> between queues when all queues are satisfied.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8379) Add an option to allow Capacity Scheduler preemption to balance satisfied queues

2018-06-12 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509992#comment-16509992
 ] 

Eric Payne commented on YARN-8379:
--

[~Zian Chen], I will review the patch but it may take a couple of days.

> Add an option to allow Capacity Scheduler preemption to balance satisfied 
> queues
> 
>
> Key: YARN-8379
> URL: https://issues.apache.org/jira/browse/YARN-8379
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-8379.001.patch, YARN-8379.002.patch
>
>
> Existing capacity scheduler only supports preemption for an underutilized 
> queue to reach its guaranteed resource. In addition to that, there’s a 
> requirement to get a better balance between queues when all of them have reached 
> their guaranteed resource but have different fair shares.
> An example is 3 queues with capacities queue_a = 30%, queue_b = 30%, and queue_c 
> = 40%. At time T, queue_a is using 30% and queue_b is using 70%. Existing 
> scheduler preemption won't happen. But this is unfair to queue_a since 
> queue_a has the same guaranteed resources.
> Before YARN-5864, the capacity scheduler did additional preemption to balance 
> queues. We changed the logic since it could preempt too many containers 
> between queues when all queues are satisfied.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8379) Add an option to allow Capacity Scheduler preemption to balance satisfied queues

2018-06-13 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511469#comment-16511469
 ] 

Eric Payne commented on YARN-8379:
--

[~Zian Chen], thank you for the work on this issue and for the new patch. I am 
still working through the patch, but I have the following concerns so far.
- Cross-queue preemption does not work when this patch is applied.
My test environment simulates a 6-node pseudo YARN cluster. I use the same 
queue configs with and without this patch. With this patch, no cross-queue 
preemption happens at all.
- Please address the failed unit tests, failed findbugs, and failed 
shadedclient warnings. I think they are related to this patch.
- {{ProportionalCapacityPreemptionPolicy#updateConfigIfNeeded}}
This code adds {{FifoCandidatesSelector}} to candidatesSelectionPolicies twice, 
which will cause it to be called twice since candidatesSelectionPolicies is an 
{{ArrayList}}. If I understand correctly, the reason for this is so that the 
first time {{FifoCandidatesSelector#selectCandidates}} is called, it will 
preempt up to the requesting queue's guarantee, and the second time it will not 
preempt until the requesting queue is above its guarantee AND the 
{{allowQueuesBalanceAfterAllQueuesSatisfied}} variable is set.
Why can't {{FifoCandidatesSelector}} just be added once and do all the 
processing it needs to based on whether or not 
{{allowQueuesBalanceAfterAllQueuesSatisfied}} is set?
- {{FifoCandidatesSelector#selectCandidates}}
If the skip logic is necessary (depending on the answer to my first question), I 
think the return needs to be moved up above the previous curly brace (}). As 
written, it returns after the first pass through the loop, whether containers is 
empty or not (see the sketch after this list).
{code:title=FifoCandidatesSelector#selectCandidates}
for (Set<RMContainer> containers : selectedCandidates.values()) {
  if (!containers.isEmpty()) {
    if (LOG.isDebugEnabled()) {
      LOG.debug(...);
    }
  }
  return selectedCandidates;
}
{code}
- {{FifoCandidatesSelector#selectCandidates}}
For the debug log statement, I would not use the word "Fairness" because the 
word "Fair" has a lot of different meanings when it comes to schedulers and 
policies. To make it more grammatically correct (and to remove the confusion 
surrounding "fairness"), I would say, "The preemption-to-balance feature is on. 
Some containers were chosen for preemption by previous selectors. Skipping 
container selection for FifoCandidatesSelector");
- General. For the same reason as above, I think we can just remove the word 
"Fair" or "Fairness" from all method and variable names and the meaning would 
remain the same.
- {{AbstractPreemptableResourceCalculator#getIdealPctOfGuaranteed}}
bq. Should we allow queues continue grow after queues satisfied?
This could be misinterpreted to mean that the capacity scheduler doesn't allow 
a queue to grow over its capacity guarantee. It may be better to modify this to 
make it clear that we are talking about preemption: "Should resources be 
preempted from an over-served queue when the requesting queues are all at or 
over their guarantees?"
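
To illustrate the return-placement point above, here is a minimal sketch of what I believe the intended control flow is (illustrative, not the actual patch):

{code:java}
// Return early only when a previous selector actually chose containers;
// otherwise fall through to normal Fifo candidate selection.
for (Set<RMContainer> containers : selectedCandidates.values()) {
  if (!containers.isEmpty()) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Previous selectors already chose containers;"
          + " skipping FifoCandidatesSelector");
    }
    return selectedCandidates;
  }
}
{code}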


> Add an option to allow Capacity Scheduler preemption to balance satisfied 
> queues
> 
>
> Key: YARN-8379
> URL: https://issues.apache.org/jira/browse/YARN-8379
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-8379.001.patch, YARN-8379.002.patch, 
> YARN-8379.003.patch
>
>
> Existing capacity scheduler only supports preemption for an underutilized 
> queue to reach its guaranteed resource. In addition to that, there’s a 
> requirement to get a better balance between queues when all of them have reached 
> their guaranteed resource but have different fair shares.
> An example is 3 queues with capacities queue_a = 30%, queue_b = 30%, and queue_c 
> = 40%. At time T, queue_a is using 30% and queue_b is using 70%. Existing 
> scheduler preemption won't happen. But this is unfair to queue_a since 
> queue_a has the same guaranteed resources.
> Before YARN-5864, the capacity scheduler did additional preemption to balance 
> queues. We changed the logic since it could preempt too many containers 
> between queues when all queues are satisfied.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-8379) Add an option to allow Capacity Scheduler preemption to balance satisfied queues

2018-06-13 Thread Eric Payne (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-8379:
-
Attachment: ericpayne.confs.tgz

> Add an option to allow Capacity Scheduler preemption to balance satisfied 
> queues
> 
>
> Key: YARN-8379
> URL: https://issues.apache.org/jira/browse/YARN-8379
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-8379.001.patch, YARN-8379.002.patch, 
> YARN-8379.003.patch, ericpayne.confs.tgz
>
>
> Existing capacity scheduler only supports preemption for an underutilized 
> queue to reach its guaranteed resource. In addition to that, there’s a 
> requirement to get a better balance between queues when all of them have reached 
> their guaranteed resource but have different fair shares.
> An example is 3 queues with capacities queue_a = 30%, queue_b = 30%, and queue_c 
> = 40%. At time T, queue_a is using 30% and queue_b is using 70%. Existing 
> scheduler preemption won't happen. But this is unfair to queue_a since 
> queue_a has the same guaranteed resources.
> Before YARN-5864, the capacity scheduler did additional preemption to balance 
> queues. We changed the logic since it could preempt too many containers 
> between queues when all queues are satisfied.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8379) Add an option to allow Capacity Scheduler preemption to balance satisfied queues

2018-06-13 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511694#comment-16511694
 ] 

Eric Payne commented on YARN-8379:
--

[~Zian Chen], I attached the confs I used to create my pseudo cluster. I was 
using patch 003.
{quote}3. The reason we add FifoCandidatesSelector to 
candidatesSelectionPolicies twice is that we want to make conservative 
preemption when we do the balance.
{quote}
I don't see why this is necessary. In 2.8 (and in 3.x releases prior to 
YARN-5864), the balancing was done all at once inside the 
{{FifoCandidatesSelector}} by properly adjusting the ideal assigned values per 
queue and the values of offered resources to each queue. Why can't we adjust 
these values to either 1) keep the same behavior or 2) balance queues, 
depending on the setting of the new property 
({{fairness-balance-queue-after-satisfied.enabled}})?
{quote}4. The reason for this code is explained in item 3.
{quote}
My question here is why is {{selectedCandidates}} always returned after the 
first time through the for loop? If this was the intention, a for loop is not 
necessary. It looks like the intention was to only return if containers exist 
in {{selectedCandidates}} (the for loop) AND {{if (!containers.isEmpty())}}. 
Did you want the return to be inside of the {{if (!containers.isEmpty())}}?

> Add an option to allow Capacity Scheduler preemption to balance satisfied 
> queues
> 
>
> Key: YARN-8379
> URL: https://issues.apache.org/jira/browse/YARN-8379
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-8379.001.patch, YARN-8379.002.patch, 
> YARN-8379.003.patch, ericpayne.confs.tgz
>
>
> Existing capacity scheduler only supports preemption for an underutilized 
> queue to reach its guaranteed resource. In addition to that, there’s a 
> requirement to get a better balance between queues when all of them have reached 
> their guaranteed resource but have different fair shares.
> An example is 3 queues with capacities queue_a = 30%, queue_b = 30%, and queue_c 
> = 40%. At time T, queue_a is using 30% and queue_b is using 70%. Existing 
> scheduler preemption won't happen. But this is unfair to queue_a since 
> queue_a has the same guaranteed resources.
> Before YARN-5864, the capacity scheduler did additional preemption to balance 
> queues. We changed the logic since it could preempt too many containers 
> between queues when all queues are satisfied.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-8425) Yarn container getting killed due to running beyond physical memory limits

2018-06-14 Thread Eric Payne (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-8425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne resolved YARN-8425.
--
Resolution: Not A Bug

> Yarn container getting killed due to running beyond physical memory limits
> --
>
> Key: YARN-8425
> URL: https://issues.apache.org/jira/browse/YARN-8425
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: applications, container-queuing, yarn
>Affects Versions: 2.7.6
>Reporter: Tapas Sen
>Priority: Major
> Attachments: yarn_configuration_1.PNG, yarn_configuration_2.PNG, 
> yarn_configuration_3.PNG
>
>
> Hi,
> Getting this error:
>  
> 2018-06-12 17:59:07,193 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics 
> report from attempt_1527758146858_45040_m_08_3: Container 
> [pid=15498,containerID=container_e60_1527758146858_45040_01_41] is 
> running beyond physical memory limits. Current usage: 8.1 GB of 8 GB physical 
> memory used; 12.2 GB of 16.8 GB virtual memory used. Killing container.
>  
> The YARN resource configuration is in the attachments. 
>  
>  Any lead would be appreciated.
>  
>  
>  
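
For anyone hitting the same message: the log shows actual usage (8.1 GB) exceeding the requested container size (8 GB), which is working as designed. Assuming a MapReduce job, the usual remedy is to request a larger container or shrink the heap, e.g.:

{noformat}
mapreduce.map.memory.mb = 10240    (container size above observed usage), or
mapreduce.map.java.opts = -Xmx6g   (smaller heap so the process stays under 8 GB)
{noformat}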



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-06-19 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517557#comment-16517557
 ] 

Eric Payne commented on YARN-4606:
--

I put this Jira into PATCH AVAILABLE mode so that it would kick the pre-commit 
build.

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Manikandan R
>Priority: Critical
> Attachments: YARN-4606.001.patch, YARN-4606.002.patch, 
> YARN-4606.003.patch, YARN-4606.004.patch, YARN-4606.1.poc.patch, 
> YARN-4606.POC.2.patch, YARN-4606.POC.patch
>
>
> Currently, if all of the applications belonging to the same user in a LeafQueue 
> are pending (caused by max-am-percent, etc.), ActiveUsersManager still considers 
> the user to be an active user. This could lead to starvation of active 
> applications, for example:
> - App1 (belongs to user1) and app2 (belongs to user2) are active; app3 (belongs 
> to user3) and app4 (belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, only two users (user1/user2) are able to allocate new 
> resources, so the computed user-limit-resource could be lower than expected.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8377) Javadoc build failed in hadoop-yarn-server-nodemanager

2018-05-30 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16495410#comment-16495410
 ] 

Eric Payne commented on YARN-8377:
--

Thanks a lot, [~tasanuma0829], for tracking this all down and providing the 
fixes.

+1. I will commit shortly.

> Javadoc build failed in hadoop-yarn-server-nodemanager
> --
>
> Key: YARN-8377
> URL: https://issues.apache.org/jira/browse/YARN-8377
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: build, docs
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Critical
> Attachments: YARN-8377.1.patch
>
>
> This is the same cause as YARN-8369.
> {code}
> [ERROR] 
> /hadoop/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/container/SlidingWindowRetryPolicy.java:88:
>  error: bad use of '>'
> [ERROR]* When failuresValidityInterval is > 0, it also removes time 
> entries from
> [ERROR]   ^
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-05-30 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16495656#comment-16495656
 ] 

Eric Payne commented on YARN-4606:
--

[~maniraj...@gmail.com], do you have a status on updating this patch? Do you 
need any help from the community?

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Manikandan R
>Priority: Critical
> Attachments: YARN-4606.001.patch, YARN-4606.002.patch, 
> YARN-4606.1.poc.patch, YARN-4606.POC.2.patch, YARN-4606.POC.patch
>
>
> Currently, if all of the applications belonging to the same user in a LeafQueue 
> are pending (caused by max-am-percent, etc.), ActiveUsersManager still considers 
> the user to be an active user. This could lead to starvation of active 
> applications, for example:
> - App1 (belongs to user1) and app2 (belongs to user2) are active; app3 (belongs 
> to user3) and app4 (belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, only two users (user1/user2) are able to allocate new 
> resources, so the computed user-limit-resource could be lower than expected.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-05-31 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16497232#comment-16497232
 ] 

Eric Payne commented on YARN-4606:
--

Thanks [~maniraj...@gmail.com] for the updated patch. Here are my comments so 
far:
- I am concerned that this implementation adds code that is specific to 
{{CapacityScheduler}} inside of {{AppSchedulingInfo}}. I feel that this sets a 
precedent that makes it hard to maintain a clean separation between abstract 
and specific scheduler code. Also, this only fixes the problem for the 
{{CapacityScheduler}}. The previous fix in patch 001 was relying on metrics and 
I realize that is risky, but it was a more generic fix. I would be interested 
to hear thoughts from [~sunilg] and [~leftnoteasy].
- Only the {{CapacityScheduler}} has been changed to handle the new 
{{AppAMAttemptsFailedSchedulerEvent}}. Should the other schedulers handle that 
as well? If they don't handle it, don't we risk them getting unhandled event 
exceptions?
- In all places where new {{LOG.debug(...)}} statements are added, please also 
enclose them with {{if (LOG.isDebugEnabled())}}. This is for the sake of 
performance, so that the strings are not built, passed to {{LOG.debug()}}, and 
then thrown away when debug logging is not enabled (see the example below).
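
A minimal example of the guard pattern (variable names are illustrative):

{code:java}
// Without the guard, the string concatenation runs even when DEBUG is off.
if (LOG.isDebugEnabled()) {
  LOG.debug("Activating application " + applicationId + " for user " + userName);
}
{code}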


> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Manikandan R
>Priority: Critical
> Attachments: YARN-4606.001.patch, YARN-4606.002.patch, 
> YARN-4606.003.patch, YARN-4606.1.poc.patch, YARN-4606.POC.2.patch, 
> YARN-4606.POC.patch
>
>
> Currently, if all of the applications belonging to the same user in a LeafQueue 
> are pending (caused by max-am-percent, etc.), ActiveUsersManager still considers 
> the user to be an active user. This could lead to starvation of active 
> applications, for example:
> - App1 (belongs to user1) and app2 (belongs to user2) are active; app3 (belongs 
> to user3) and app4 (belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, only two users (user1/user2) are able to allocate new 
> resources, so the computed user-limit-resource could be lower than expected.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4781) Support intra-queue preemption for fairness ordering policy.

2018-06-01 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498517#comment-16498517
 ] 

Eric Payne commented on YARN-4781:
--

bq. Thank you. I ll commit this to branch-2 shortly.
Thanks [~sunilg]. I would like to keep branch-2, branch-2.9 and branch-2.8 in 
sync with the 3.x branches as much as possible.

> Support intra-queue preemption for fairness ordering policy.
> 
>
> Key: YARN-4781
> URL: https://issues.apache.org/jira/browse/YARN-4781
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Wangda Tan
>Assignee: Eric Payne
>Priority: Major
> Fix For: 3.0.3
>
> Attachments: YARN-4781.001.patch, YARN-4781.002.patch, 
> YARN-4781.003.patch, YARN-4781.004.patch, YARN-4781.005.branch-2.patch, 
> YARN-4781.005.patch
>
>
> We introduced the fairness queue policy in YARN-3319, which lets large 
> applications make progress without starving small applications. However, if a 
> large application takes the queue’s resources, and the containers of the large 
> app have long lifespans, small applications could still wait a long time for 
> resources and SLAs cannot be guaranteed.
> Instead of waiting for applications to release resources on their own, we need 
> to preempt resources of queues with the fairness policy enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4781) Support intra-queue preemption for fairness ordering policy.

2018-05-29 Thread Eric Payne (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-4781:
-
Attachment: YARN-4781.005.branch-2.patch

> Support intra-queue preemption for fairness ordering policy.
> 
>
> Key: YARN-4781
> URL: https://issues.apache.org/jira/browse/YARN-4781
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Wangda Tan
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-4781.001.patch, YARN-4781.002.patch, 
> YARN-4781.003.patch, YARN-4781.004.patch, YARN-4781.005.branch-2.patch, 
> YARN-4781.005.patch
>
>
> We introduced the fairness queue policy in YARN-3319, which lets large 
> applications make progress without starving small applications. However, if a 
> large application takes the queue’s resources, and the containers of the large 
> app have long lifespans, small applications could still wait a long time for 
> resources and SLAs cannot be guaranteed.
> Instead of waiting for applications to release resources on their own, we need 
> to preempt resources of queues with the fairness policy enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4781) Support intra-queue preemption for fairness ordering policy.

2018-05-29 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16493761#comment-16493761
 ] 

Eric Payne commented on YARN-4781:
--

Thanks a lot [~sunilg]! I attached patch {{YARN-4781.005.branch-2.patch}}, 
which should apply cleanly to branch-2, branch-2.9 and branch-2.8.

> Support intra-queue preemption for fairness ordering policy.
> 
>
> Key: YARN-4781
> URL: https://issues.apache.org/jira/browse/YARN-4781
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Wangda Tan
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-4781.001.patch, YARN-4781.002.patch, 
> YARN-4781.003.patch, YARN-4781.004.patch, YARN-4781.005.branch-2.patch, 
> YARN-4781.005.patch
>
>
> We introduced the fairness queue policy in YARN-3319, which lets large 
> applications make progress without starving small applications. However, if a 
> large application takes the queue’s resources, and the containers of the large 
> app have long lifespans, small applications could still wait a long time for 
> resources and SLAs cannot be guaranteed.
> Instead of waiting for applications to release resources on their own, we need 
> to preempt resources of queues with the fairness policy enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8379) Add an option to allow Capacity Scheduler preemption to balance satisfied queues

2018-05-31 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16496715#comment-16496715
 ] 

Eric Payne commented on YARN-8379:
--

Thanks [~leftnoteasy] for bringing this up. Yes, our use case would benefit 
from this feature. We are currently running 2.8, which does the balancing, so 
this would help us in moving to 3.x in the future.

> Add an option to allow Capacity Scheduler preemption to balance satisfied 
> queues
> 
>
> Key: YARN-8379
> URL: https://issues.apache.org/jira/browse/YARN-8379
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Wangda Tan
>Priority: Major
>
> Existing capacity scheduler only supports preemption for an underutilized 
> queue to reach its guaranteed resource. In addition to that, there’s a 
> requirement to get a better balance between queues when all of them have reached 
> their guaranteed resource but have different fair shares.
> An example is 3 queues with capacities queue_a = 30%, queue_b = 30%, and queue_c 
> = 40%. At time T, queue_a is using 30% and queue_b is using 70%. Existing 
> scheduler preemption won't happen. But this is unfair to queue_a since 
> queue_a has the same guaranteed resources.
> Before YARN-5864, the capacity scheduler did additional preemption to balance 
> queues. We changed the logic since it could preempt too many containers 
> between queues when all queues are satisfied.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8444) NodeResourceMonitor crashes on bad swapFree value

2018-06-22 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16520597#comment-16520597
 ] 

Eric Payne commented on YARN-8444:
--

Thanks [~Jim_Brennan] for the work on this JIRA.

+1. I will commit soon.

> NodeResourceMonitor crashes on bad swapFree value
> -
>
> Key: YARN-8444
> URL: https://issues.apache.org/jira/browse/YARN-8444
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.8.3, 3.0.2
>Reporter: Jim Brennan
>Assignee: Jim Brennan
>Priority: Major
> Attachments: YARN-8444.001.patch
>
>
> Saw this on a node that was running out of memory. Can't have 
> NodeResourceMonitor exiting. System was above 99% memory used at the time, so 
> this is not a common occurrence, but we should fix it since this is a critical 
> monitor for the health of the node.
>  
> {noformat}
> 2018-06-04 14:28:08,539 [Container Monitor] DEBUG 
> ContainersMonitorImpl.audit: Memory usage of ProcessTree 110564 for 
> container-id container_e24_1526662705797_129647_01_004791: 2.1 GB of 3.5 GB 
> physical memory used; 5.0 GB of 7.3 GB virtual memory used
> 2018-06-04 14:28:10,622 [Node Resource Monitor] ERROR 
> yarn.YarnUncaughtExceptionHandler: Thread Thread[Node Resource 
> Monitor,5,main] threw an Exception.
> java.lang.NumberFormatException: For input string: "18446744073709551596"
>  at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>  at java.lang.Long.parseLong(Long.java:592)
>  at java.lang.Long.parseLong(Long.java:631)
>  at 
> org.apache.hadoop.util.SysInfoLinux.readProcMemInfoFile(SysInfoLinux.java:257)
>  at 
> org.apache.hadoop.util.SysInfoLinux.getAvailablePhysicalMemorySize(SysInfoLinux.java:591)
>  at 
> org.apache.hadoop.util.SysInfoLinux.getAvailableVirtualMemorySize(SysInfoLinux.java:601)
>  at 
> org.apache.hadoop.yarn.util.ResourceCalculatorPlugin.getAvailableVirtualMemorySize(ResourceCalculatorPlugin.java:74)
>  at 
> org.apache.hadoop.yarn.server.nodemanager.NodeResourceMonitorImpl$MonitoringThread.run(NodeResourceMonitorImpl.java:193)
> 2018-06-04 14:28:30,747 
> [org.apache.hadoop.util.JvmPauseMonitor$Monitor@226eba67] INFO 
> util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of 
> approximately 9330ms
> {noformat}
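
For context, 18446744073709551596 is 2^64 - 20: the kernel's swapFree counter underflowed, and the resulting unsigned 64-bit string overflows {{Long.parseLong}}. A sketch of a defensive parse (illustrative; not necessarily the committed fix):

{code:java}
String swapFree = "18446744073709551596"; // larger than Long.MAX_VALUE
long value;
try {
  value = Long.parseLong(swapFree);
} catch (NumberFormatException e) {
  // Treat an underflowed unsigned counter as no free swap rather than crashing.
  value = 0L;
}
{code}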



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-06-22 Thread Eric Payne (JIRA)


 [ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-4606:
-
Attachment: YARN-4606.POC.3.patch

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Manikandan R
>Priority: Critical
> Attachments: YARN-4606.001.patch, YARN-4606.002.patch, 
> YARN-4606.003.patch, YARN-4606.004.patch, YARN-4606.1.poc.patch, 
> YARN-4606.POC.2.patch, YARN-4606.POC.3.patch, YARN-4606.POC.patch
>
>
> Currently, if all of the applications belonging to the same user in a LeafQueue 
> are pending (caused by max-am-percent, etc.), ActiveUsersManager still considers 
> the user to be an active user. This could lead to starvation of active 
> applications, for example:
> - App1 (belongs to user1) and app2 (belongs to user2) are active; app3 (belongs 
> to user3) and app4 (belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, only two users (user1/user2) are able to allocate new 
> resources, so the computed user-limit-resource could be lower than expected.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-06-22 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16520822#comment-16520822
 ] 

Eric Payne commented on YARN-4606:
--

[~maniraj...@gmail.com], we can fix the queue application starvation problem by 
making most of the changes in the scheduler-specific users managers. For 
{{CapacityScheduler}}, all the changes can be done in the {{UsersManager}} 
class. For the other schedulers (Fifo, Fair, etc.), I think there needs to be 
some amount of change in the scheduler infrastructure classes to support 
retrieving information such as the number of pending and active apps per user, the 
amount of a queue's AM limit resources, the amount of a user's used AM resources, 
etc. But I think that most of the changes can be done in {{ActiveUsersManager}} 
for other schedulers as well.

I am attaching a POC patch that only modifies {{UsersManager}}. The 
{{UsersManager}} already keeps track of all users in the queue. Each user 
object keeps the number of active apps and the number of pending apps. Here is 
the sequence of events, plus the proposed change (see the sketch after this list):
 - When an application is submitted, the user object's pending apps count is 
incremented
 - If limits are not exceeded, {{LeafQueue}} activates the app
 -- {{Leafqueue#activateApplications}} already checks whether or not activation 
of an application will go over the queue's AM limit.
 -- If activating the application will not go over the queue's AM limit, 
{{Leafqueue#activateApplications}} will increment the user object's active app 
count and decrement the pending app count.
 -- However, if activating the application will go over the queue's AM limit, 
the user's pending app count remains the same.
 - The change made in {{YARN-4606.POC.3.patch}} is that 
{{UsersManager#activateApplication}} will check whether or not the user object 
has any active apps. If not, it will not continue (thus not putting the user in 
the {{activeUsers}} list).
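
A rough sketch of that check (class and member names here are illustrative, not the exact POC code):

{code:java}
// Only count a user as "active" for user-limit purposes once it has at
// least one activated (non-pending) application.
void activateApplication(String userName) {
  User user = users.computeIfAbsent(userName, k -> new User());
  if (user.getActiveApps() > 0) {
    activeUsers.add(userName);                    // used for user-limit math
    activeUsersWithOnlyPendingApps.remove(userName);
  } else {
    activeUsersWithOnlyPendingApps.add(userName); // used only for AM-limit math
  }
}
{code}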

I have not yet analyzed the problem you pointed out above regarding moving apps 
to different queues.

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Manikandan R
>Priority: Critical
> Attachments: YARN-4606.001.patch, YARN-4606.002.patch, 
> YARN-4606.003.patch, YARN-4606.004.patch, YARN-4606.1.poc.patch, 
> YARN-4606.POC.2.patch, YARN-4606.POC.3.patch, YARN-4606.POC.patch
>
>
> Currently, if all of the applications belonging to the same user in a LeafQueue 
> are pending (caused by max-am-percent, etc.), ActiveUsersManager still considers 
> the user to be an active user. This could lead to starvation of active 
> applications, for example:
> - App1 (belongs to user1) and app2 (belongs to user2) are active; app3 (belongs 
> to user3) and app4 (belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, only two users (user1/user2) are able to allocate new 
> resources, so the computed user-limit-resource could be lower than expected.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-06-25 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16522811#comment-16522811
 ] 

Eric Payne commented on YARN-4606:
--

{quote}At the same time, this patch is less "strict" in terms of updates 
(specifically on when? ) compared to approaches discussed in our earlier 
patches.
{quote}
The value for number of active apps per user used to be calculated every time 
through the scheduler loop, which was a performance problem. In order to avoid 
this heavy calculation, YARN-5889 created the {{UsersManager}}. Instead of 
doing the calculation every time through the loop, YARN-5889 only recalculates 
these values when events occurs that could affect this count like new 
application, app completes, new container request, completed container, etc. In 
the latest POC patch, {{activeUsersWithOnlyPendingApps}} is part of this flow, 
so it will always be updated whenever anything happens that could affect this 
value.
{quote}Also, based on our earlier discussions, We need to depend on 
activeUsers.get() only in certain context and sum of activeUsers.get() and 
activeUsersWithOnlyPendingApps.get() in some other places. But POC patch always 
depends on later value. I didn't understand this part.
{quote}
I think you are referencing this comment from above:
{quote}My understanding is that user limit would use activeUsers and things 
like max AM limit per user, we'd use activeUsers + activeUsersOfPendingApps
{quote}
{{LeafQueue#activateApplications}} is the only thing that calls 
{{UsersManager#getNumActiveUsers}}, which it uses to calculate the 
user-specific AM limit, so it's the one that needs both activeUsers + 
{{activeUsersWithOnlyPendingApps}}.
 {{UsersManager#computeUserLimit}} uses only activeUsers to calculate the 
headroom and user limit, which is what we decided in the comment above. Is that 
your understanding of these comments?
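
In other words, the intended split would be something like this (illustrative, assumed names):

{code:java}
// LeafQueue#activateApplications: the per-user AM limit counts both sets.
int usersForAmLimit = activeUsers.size() + activeUsersWithOnlyPendingApps.size();

// UsersManager#computeUserLimit: headroom/user limit counts only active users.
int usersForUserLimit = activeUsers.size();
{code}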

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Manikandan R
>Priority: Critical
> Attachments: YARN-4606.001.patch, YARN-4606.002.patch, 
> YARN-4606.003.patch, YARN-4606.004.patch, YARN-4606.1.poc.patch, 
> YARN-4606.POC.2.patch, YARN-4606.POC.3.patch, YARN-4606.POC.patch
>
>
> Currently, if all of the applications belonging to the same user in a LeafQueue 
> are pending (caused by max-am-percent, etc.), ActiveUsersManager still considers 
> the user to be an active user. This could lead to starvation of active 
> applications, for example:
> - App1 (belongs to user1) and app2 (belongs to user2) are active; app3 (belongs 
> to user3) and app4 (belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, only two users (user1/user2) are able to allocate new 
> resources, so the computed user-limit-resource could be lower than expected.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8379) Add an option to allow Capacity Scheduler preemption to balance satisfied queues

2018-06-26 Thread Eric Payne (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16523691#comment-16523691
 ] 

Eric Payne commented on YARN-8379:
--

[~Zian Chen], I probably won't be able to get to this for a couple of weeks.

> Add an option to allow Capacity Scheduler preemption to balance satisfied 
> queues
> 
>
> Key: YARN-8379
> URL: https://issues.apache.org/jira/browse/YARN-8379
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-8379.001.patch, YARN-8379.002.patch, 
> YARN-8379.003.patch, YARN-8379.004.patch, ericpayne.confs.tgz
>
>
> Existing capacity scheduler only supports preemption for an underutilized 
> queue to reach its guaranteed resource. In addition to that, there’s a 
> requirement to get a better balance between queues when all of them have reached 
> their guaranteed resource but have different fair shares.
> An example is 3 queues with capacities queue_a = 30%, queue_b = 30%, and queue_c 
> = 40%. At time T, queue_a is using 30% and queue_b is using 70%. Existing 
> scheduler preemption won't happen. But this is unfair to queue_a since 
> queue_a has the same guaranteed resources.
> Before YARN-5864, the capacity scheduler did additional preemption to balance 
> queues. We changed the logic since it could preempt too many containers 
> between queues when all queues are satisfied.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4781) Support intra-queue preemption for fairness ordering policy.

2018-05-01 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-4781:
-
Attachment: YARN-4781.005.patch

> Support intra-queue preemption for fairness ordering policy.
> 
>
> Key: YARN-4781
> URL: https://issues.apache.org/jira/browse/YARN-4781
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Wangda Tan
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-4781.001.patch, YARN-4781.002.patch, 
> YARN-4781.003.patch, YARN-4781.004.patch, YARN-4781.005.patch
>
>
> We introduced the fairness queue policy in YARN-3319, which lets large 
> applications make progress without starving small applications. However, if a 
> large application takes the queue’s resources, and the containers of the large 
> app have long lifespans, small applications could still wait a long time for 
> resources and SLAs cannot be guaranteed.
> Instead of waiting for applications to release resources on their own, we need 
> to preempt resources of queues with the fairness policy enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4781) Support intra-queue preemption for fairness ordering policy.

2018-05-01 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16459919#comment-16459919
 ] 

Eric Payne commented on YARN-4781:
--

I missed one of the two places that needed a check for {{USERLIMIT_FIRST}, so I 
uploaded a new patch 005.
{quote}
If more apps are from same user, older one's will be preempted in a fair manner 
(which means to satisfy a younger app, older apps may give some container 
each). Am I correct?
{quote}
Yes. Basically, youngest containers are preempted from largest apps of the most 
over-served users
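
Spelled out, the selection order is something like this (a sketch with assumed types and getters, not the exact comparators in the patch):

{code:java}
// Youngest containers, from the largest apps, of the most over-served users.
users.sort(Comparator.comparingLong(User::resourcesAboveUserLimit).reversed());
apps.sort(Comparator.comparingLong(App::usedResources).reversed());
containers.sort(Comparator.comparingLong(Container::creationTime).reversed());
{code}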

> Support intra-queue preemption for fairness ordering policy.
> 
>
> Key: YARN-4781
> URL: https://issues.apache.org/jira/browse/YARN-4781
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Wangda Tan
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-4781.001.patch, YARN-4781.002.patch, 
> YARN-4781.003.patch, YARN-4781.004.patch, YARN-4781.005.patch
>
>
> We introduced the fairness queue policy in YARN-3319, which lets large 
> applications make progress without starving small applications. However, if a 
> large application takes the queue’s resources, and the containers of the large 
> app have long lifespans, small applications could still wait a long time for 
> resources and SLAs cannot be guaranteed.
> Instead of waiting for applications to release resources on their own, we need 
> to preempt resources of queues with the fairness policy enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-4781) Support intra-queue preemption for fairness ordering policy.

2018-05-01 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16459919#comment-16459919
 ] 

Eric Payne edited comment on YARN-4781 at 5/1/18 6:04 PM:
--

I missed one of the two places that needed a check for {{USERLIMIT_FIRST}}, so 
I uploaded a new patch 005.
{quote}If more apps are from same user, older one's will be preempted in a fair 
manner (which means to satisfy a younger app, older apps may give some 
container each). Am I correct?
{quote}
Yes. Basically, youngest containers are preempted from largest apps of the most 
over-served users


was (Author: eepayne):
I missed one of the two places that needed a check for {{USERLIMIT_FIRST}, so I 
uploaded a new patch 005.
{quote}
If more apps are from same user, older one's will be preempted in a fair manner 
(which means to satisfy a younger app, older apps may give some container 
each). Am I correct?
{quote}
Yes. Basically, youngest containers are preempted from largest apps of the most 
over-served users

> Support intra-queue preemption for fairness ordering policy.
> 
>
> Key: YARN-4781
> URL: https://issues.apache.org/jira/browse/YARN-4781
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: scheduler
>Reporter: Wangda Tan
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-4781.001.patch, YARN-4781.002.patch, 
> YARN-4781.003.patch, YARN-4781.004.patch, YARN-4781.005.patch
>
>
> We introduced the fairness queue policy in YARN-3319, which lets large 
> applications make progress without starving small applications. However, if a 
> large application takes the queue’s resources, and the containers of the large 
> app have long lifespans, small applications could still wait a long time for 
> resources and SLAs cannot be guaranteed.
> Instead of waiting for applications to release resources on their own, we need 
> to preempt resources of queues with the fairness policy enabled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8179) Preemption does not happen due to natural_termination_factor when DRF is used

2018-04-30 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16458804#comment-16458804
 ] 

Eric Payne commented on YARN-8179:
--

[~kyungwan nam], thanks for the new patch.

+1

> Preemption does not happen due to natural_termination_factor when DRF is used
> -
>
> Key: YARN-8179
> URL: https://issues.apache.org/jira/browse/YARN-8179
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: kyungwan nam
>Assignee: kyungwan nam
>Priority: Major
> Attachments: YARN-8179.001.patch, YARN-8179.002.patch, 
> YARN-8179.003.patch
>
>
> Cluster:
> * DominantResourceCalculator
> * QueueA : 50 (capacity) ~ 100 (max capacity)
> * QueueB : 50 (capacity) ~ 50 (max capacity)
> All of the resources have been allocated to QueueA (all Vcores are allocated to 
> QueueA).
> If App1 is submitted to QueueB, the over-utilized QueueA should be preempted.
> But I’ve hit a problem where preemption does not happen, which means that 
> App1's AM cannot be allocated.
> When App1 is submitted, the pending resources requested for App1's AM would be 
> 
> so the Vcores that need to be preempted to serve QueueB should be 1.
> But that amount can become 0 due to the natural_termination_factor (default is 0.2).
> We should guarantee that the resources to preempt do not become 0 even when 
> applying the natural_termination_factor.
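
To make the rounding problem concrete (illustrative arithmetic only, assuming the AM ask is 1 vcore):

{code:java}
// naturalTerminationFactor scales how much is preempted in each round.
double naturalTerminationFactor = 0.2;
int pendingVcores = 1;  // App1's AM needs 1 vcore
int toPreempt = (int) Math.floor(pendingVcores * naturalTerminationFactor);
// toPreempt == 0, so nothing is ever preempted and the AM never starts.
{code}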



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8179) Preemption does not happen due to natural_termination_factor when DRF is used

2018-05-03 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463041#comment-16463041
 ] 

Eric Payne commented on YARN-8179:
--

[~sunilg], if there is no objection, I'll commit this tomorrow.

> Preemption does not happen due to natural_termination_factor when DRF is used
> -
>
> Key: YARN-8179
> URL: https://issues.apache.org/jira/browse/YARN-8179
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: kyungwan nam
>Assignee: kyungwan nam
>Priority: Major
> Attachments: YARN-8179.001.patch, YARN-8179.002.patch, 
> YARN-8179.003.patch
>
>
> Cluster:
> * DominantResourceCalculator
> * QueueA : 50 (capacity) ~ 100 (max capacity)
> * QueueB : 50 (capacity) ~ 50 (max capacity)
> All of the resources have been allocated to QueueA (all Vcores are allocated to 
> QueueA).
> If App1 is submitted to QueueB, the over-utilized QueueA should be preempted.
> But I’ve hit a problem where preemption does not happen, which means that 
> App1's AM cannot be allocated.
> When App1 is submitted, the pending resources requested for App1's AM would be 
> 
> so the Vcores that need to be preempted to serve QueueB should be 1.
> But that amount can become 0 due to the natural_termination_factor (default is 0.2).
> We should guarantee that the resources to preempt do not become 0 even when 
> applying the natural_termination_factor.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7370) Intra-queue preemption properties should be refreshable

2017-10-20 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16212646#comment-16212646
 ] 

Eric Payne commented on YARN-7370:
--

[~GergelyNovak], thank you for your interest. Please go ahead and take this 
JIRA.
bq. 2) Do you mean to add a new rmadmin command like -refreshSchedulingMonitors 
or make this part of -refreshQueues?
My opinion is to include these as part of the {{-refreshQueues}} option. The 
queue-specific disable preemption option is refreshable under 
{{-refreshQueues}}, so I think it makes sense to refresh the others in the same 
way.
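
For reference, the admin entry point being discussed is the existing command:

{noformat}
$ yarn rmadmin -refreshQueues
{noformat}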

> Intra-queue preemption properties should be refreshable
> ---
>
> Key: YARN-7370
> URL: https://issues.apache.org/jira/browse/YARN-7370
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, scheduler preemption
>Affects Versions: 2.8.0, 3.0.0-alpha3
>Reporter: Eric Payne
>
> At least the properties for {{max-allowable-limit}} and {{minimum-threshold}} 
> should be refreshable. It would also be nice to make 
> {{intra-queue-preemption.enabled}} and {{preemption-order-policy}} 
> refreshable.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6124) Make SchedulingEditPolicy can be enabled / disabled / updated with RMAdmin -refreshQueues

2017-10-26 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220844#comment-16220844
 ] 

Eric Payne commented on YARN-6124:
--

Thanks [~leftnoteasy]. The proof of concept looks good, but in this version the 
{{ProportionalCapacityPreemptionPolicy}} is NPE-ing during {{init}} because 
{{scheduler.getConfiguration()}} is returning null.

> Make SchedulingEditPolicy can be enabled / disabled / updated with RMAdmin 
> -refreshQueues
> -
>
> Key: YARN-6124
> URL: https://issues.apache.org/jira/browse/YARN-6124
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6124.wip.1.patch, YARN-6124.wip.2.patch
>
>
> Currently, enabling / disabling / updating a SchedulingEditPolicy config requires 
> restarting the RM. This is inconvenient when an admin wants to make changes to 
> SchedulingEditPolicies.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6124) Make SchedulingEditPolicy can be enabled / disabled / updated with RMAdmin -refreshQueues

2017-10-26 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221064#comment-16221064
 ] 

Eric Payne commented on YARN-6124:
--

Thanks [~leftnoteasy]. I will document my findings and you can work on it when 
you get to it. YARN-7370 doesn't depend on this JIRA, does it?

I got it to move past the NPE, but the changes I made may not be the best (it 
may have other side effects):
{code}
 public void serviceInit(Configuration conf) throws Exception {
   Configuration configuration = new Configuration(conf);
-  super.serviceInit(conf);
   initScheduler(configuration);
+  super.serviceInit(conf);
 }
{code}
Also, a quick test didn't seem to work. I started the RM with 
{{yarn.resourcemanager.scheduler.monitor.enable}} set to {{true}}, changed it 
to {{false}}, and then did {{-refreshQueues}}. It's going through the 
{{updateSchedulingMonitors}} code but it doesn't change the value.

> Make SchedulingEditPolicy can be enabled / disabled / updated with RMAdmin 
> -refreshQueues
> -
>
> Key: YARN-6124
> URL: https://issues.apache.org/jira/browse/YARN-6124
> Project: Hadoop YARN
>  Issue Type: Task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6124.wip.1.patch, YARN-6124.wip.2.patch
>
>
> Currently, enabling / disabling / updating a SchedulingEditPolicy config requires 
> restarting the RM. This is inconvenient when an admin wants to make changes to 
> SchedulingEditPolicies.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4163) Audit getQueueInfo and getApplications calls

2017-10-23 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16215426#comment-16215426
 ] 

Eric Payne commented on YARN-4163:
--

Thanks [~jlowe] and [~lichangleo]. I will commit this to trunk, branch-3.0, 
branch-2, and branch-2.8.

> Audit getQueueInfo and getApplications calls
> 
>
> Key: YARN-4163
> URL: https://issues.apache.org/jira/browse/YARN-4163
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4163.004.patch, YARN-4163.005.patch, 
> YARN-4163.006.branch-2.8.patch, YARN-4163.006.patch, 
> YARN-4163.007.branch-2.8.patch, YARN-4163.007.patch, YARN-4163.2.patch, 
> YARN-4163.2.patch, YARN-4163.3.patch, YARN-4163.patch
>
>
> getQueueInfo and getApplications sometimes seem to cause spikes of load, but 
> we are not able to confirm this because they are not audit logged. This patch 
> proposes to add them to the audit log.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-4163) Audit getQueueInfo and getApplications calls

2017-10-23 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-4163:
-
Attachment: YARN-4163.007.branch-2.8.patch

Attaching the branch-2.8-specific patch.

> Audit getQueueInfo and getApplications calls
> 
>
> Key: YARN-4163
> URL: https://issues.apache.org/jira/browse/YARN-4163
> Project: Hadoop YARN
>  Issue Type: Improvement
>Reporter: Chang Li
>Assignee: Chang Li
> Attachments: YARN-4163.004.patch, YARN-4163.005.patch, 
> YARN-4163.006.branch-2.8.patch, YARN-4163.006.patch, 
> YARN-4163.007.branch-2.8.patch, YARN-4163.007.patch, YARN-4163.2.patch, 
> YARN-4163.2.patch, YARN-4163.3.patch, YARN-4163.patch
>
>
> getQueueInfo and getApplications sometimes seem to cause spikes of load, but 
> we are not able to confirm this because they are not audit logged. This patch 
> proposes to add them to the audit log.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7370) Preemption properties should be refreshable

2017-10-27 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16222830#comment-16222830
 ] 

Eric Payne commented on YARN-7370:
--

Thanks [~GergelyNovak] for the work on this patch. I just have a couple of 
small issues and one suggestion.

- {{ProportionalCapacityPreemptionPolicy}} has an unused import of 
{{YarnConfiguration}}.
- In {{ProportionalCapacityPreemptionPolicy#updateConfigIfNeeded}}, can we 
switch the names of the local {{csConfig}} variable and the global class 
instance variable {{config}}? My opinion is that a class instance variable 
should have the more descriptive name.
- It would be nice if {{updateConfigIfNeeded}} would LOG the values of all of 
the properties so that we have a record in the RM syslog whenever the values 
are refreshed; a sketch of this follows below.
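
To illustrate that last suggestion, here is a minimal sketch of a 
refresh-and-log {{updateConfigIfNeeded}}. The standalone class, property keys, 
defaults, and slf4j logging are all illustrative assumptions, not the actual 
patch; note the class instance variable carries the more descriptive 
{{csConfig}} name, per the second point above:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class PreemptionConfigRefreshSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(PreemptionConfigRefreshSketch.class);

  // class instance variable gets the more descriptive name
  private Configuration csConfig;

  private double maxAllowableLimit;
  private double minimumThreshold;
  private boolean intraQueuePreemptionEnabled;
  private String preemptionOrderPolicy;

  void updateConfigIfNeeded(Configuration config) {
    if (config == csConfig) {
      return; // nothing to refresh
    }
    csConfig = config;
    // placeholder keys and defaults; the real ones live in
    // CapacitySchedulerConfiguration
    maxAllowableLimit =
        csConfig.getDouble("preemption.max-allowable-limit", 0.1);
    minimumThreshold =
        csConfig.getDouble("preemption.minimum-threshold", 0.5);
    intraQueuePreemptionEnabled =
        csConfig.getBoolean("intra-queue-preemption.enabled", false);
    preemptionOrderPolicy =
        csConfig.get("preemption-order-policy", "userlimit_first");
    // log every value so each refresh leaves a record in the RM syslog
    LOG.info("Preemption properties refreshed: maxAllowableLimit="
        + maxAllowableLimit + ", minimumThreshold=" + minimumThreshold
        + ", intraQueuePreemptionEnabled=" + intraQueuePreemptionEnabled
        + ", preemptionOrderPolicy=" + preemptionOrderPolicy);
  }
}
{code}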

> Preemption properties should be refreshable
> ---
>
> Key: YARN-7370
> URL: https://issues.apache.org/jira/browse/YARN-7370
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, scheduler preemption
>Affects Versions: 2.8.0, 3.0.0-alpha3
>Reporter: Eric Payne
>Assignee: Gergely Novák
> Attachments: YARN-7370.001.patch
>
>
> At least the properties for {{max-allowable-limit}} and {{minimum-threshold}} 
> should be refreshable. It would also be nice to make 
> {{intra-queue-preemption.enabled}} and {{preemption-order-policy}} 
> refreshable.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7619) Max AM Resource value in CS UI is different for every user

2018-01-04 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16312057#comment-16312057
 ] 

Eric Payne commented on YARN-7619:
--

Hi [~sunilg]. Thanks again for the reviews and suggestions. FYI, these patches 
still apply to their respective branches:
- YARN-7619.005.patch:
-- applies only to 3.1
- YARN-7619.005.branch-3.0.patch:
-- applies to branch-3.0 and cherry-picks cleanly to branch-2 and branch-2.9
- YARN-7619.005.branch-2.8.patch:
-- applies to branch-2.8

{quote}
+1 to latest patch. Thanks Eric Payne
I could commit later tomorrow if no objections.
{quote}

Will you get a chance to commit this? Thanks.

> Max AM Resource value in CS UI is different for every user
> --
>
> Key: YARN-7619
> URL: https://issues.apache.org/jira/browse/YARN-7619
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, yarn
>Affects Versions: 2.9.0, 3.0.0-beta1, 2.8.2, 3.1.0
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: Max AM Resources is Different for Each User.png, 
> YARN-7619.001.patch, YARN-7619.002.patch, YARN-7619.003.patch, 
> YARN-7619.004.branch-2.8.patch, YARN-7619.004.branch-3.0.patch, 
> YARN-7619.004.patch, YARN-7619.005.branch-2.8.patch, 
> YARN-7619.005.branch-3.0.patch, YARN-7619.005.patch
>
>
> YARN-7245 addressed the problem that the {{Max AM Resource}} in the capacity 
> scheduler UI used to contain the queue-level AM limit instead of the 
> user-level AM limit. It fixed this by using the user-specific AM limit that 
> is calculated in {{LeafQueue#activateApplications}}, stored in each user's 
> {{LeafQueue#User}} object, and retrieved via 
> {{UserInfo#getResourceUsageInfo}}.
> The problem is that this user-specific AM limit depends on the activity of 
> other users and other applications in a queue, and it is only calculated and 
> updated when a user's application is activated. So, when 
> {{CapacitySchedulerPage}} retrieves the user-specific AM limit, it is a stale 
> value unless an application was recently activated for a particular user.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7619) Max AM Resource value in Capacity Scheduler UI has to be refreshed for every user

2018-01-05 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16313197#comment-16313197
 ] 

Eric Payne commented on YARN-7619:
--

Thanks very much [~sunilg].

> Max AM Resource value in Capacity Scheduler UI has to be refreshed for every 
> user
> -
>
> Key: YARN-7619
> URL: https://issues.apache.org/jira/browse/YARN-7619
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, yarn
>Affects Versions: 2.9.0, 3.0.0-beta1, 2.8.2, 3.1.0
>Reporter: Eric Payne
>Assignee: Eric Payne
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4
>
> Attachments: Max AM Resources is Different for Each User.png, 
> YARN-7619.001.patch, YARN-7619.002.patch, YARN-7619.003.patch, 
> YARN-7619.004.branch-2.8.patch, YARN-7619.004.branch-3.0.patch, 
> YARN-7619.004.patch, YARN-7619.005.branch-2.8.patch, 
> YARN-7619.005.branch-3.0.patch, YARN-7619.005.patch
>
>
> YARN-7245 addressed the problem that the {{Max AM Resource}} in the capacity 
> scheduler UI used to contain the queue-level AM limit instead of the 
> user-level AM limit. It fixed this by using the user-specific AM limit that 
> is calculated in {{LeafQueue#activateApplications}}, stored in each user's 
> {{LeafQueue#User}} object, and retrieved via 
> {{UserInfo#getResourceUsageInfo}}.
> The problem is that this user-specific AM limit depends on the activity of 
> other users and other applications in a queue, and it is only calculated and 
> updated when a user's application is activated. So, when 
> {{CapacitySchedulerPage}} retrieves the user-specific AM limit, it is a stale 
> value unless an application was recently activated for a particular user.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-7728) Expose and expand container preemptions in Capacity Scheduler queue metrics

2018-01-10 Thread Eric Payne (JIRA)
Eric Payne created YARN-7728:


 Summary: Expose and expand container preemptions in Capacity 
Scheduler queue metrics
 Key: YARN-7728
 URL: https://issues.apache.org/jira/browse/YARN-7728
 Project: Hadoop YARN
  Issue Type: Improvement
Affects Versions: 3.0.0, 2.8.3, 2.9.0
Reporter: Eric Payne
Assignee: Eric Payne


YARN-1047 exposed queue metrics for the number of preempted containers to the 
fair scheduler. I would like to also expose these to the capacity scheduler and 
add metrics for the amount of lost memory seconds and vcore seconds.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Resolved] (YARN-7424) Capacity Scheduler Intra-queue preemption: add property to only preempt up to configured MULP

2018-01-10 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne resolved YARN-7424.
--
Resolution: Invalid

bq. In order to create the "desired" behavior, we would have to fundamentally 
change the way the capacity scheduler works,
Closing this as Invalid.

> Capacity Scheduler Intra-queue preemption: add property to only preempt up to 
> configured MULP
> -
>
> Key: YARN-7424
> URL: https://issues.apache.org/jira/browse/YARN-7424
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, scheduler preemption
>Affects Versions: 3.0.0-beta1, 2.8.2
>Reporter: Eric Payne
>Assignee: Eric Payne
>
> If the queue's configured minimum user limit percent (MULP) is something 
> small like 1%, all users will max out well over their MULP until 100 users 
> have apps in the queue. Since the intra-queue preemption monitor tries to 
> balance the resource among the users, most of the time in this use case it 
> will be preempting containers on behalf of users that are already over their 
> MULP guarantee.
> This JIRA proposes that a property should be provided so that a queue can be 
> configured to only preempt on behalf of a user until that user has reached 
> its MULP.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7728) Expose and expand container preemptions in Capacity Scheduler queue metrics

2018-01-16 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327877#comment-16327877
 ] 

Eric Payne commented on YARN-7728:
--

Hi [~sunilg]. Did you have a chance to review my comments and the new patch? I 
would be interested in your feedback.

> Expose and expand container preemptions in Capacity Scheduler queue metrics
> ---
>
> Key: YARN-7728
> URL: https://issues.apache.org/jira/browse/YARN-7728
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.9.0, 2.8.3, 3.0.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-7728.001.patch, YARN-7728.002.patch
>
>
> YARN-1047 exposed queue metrics for the number of preempted containers to the 
> fair scheduler. I would like to also expose these to the capacity scheduler 
> and add metrics for the amount of lost memory seconds and vcore seconds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7728) Expose and expand container preemptions in Capacity Scheduler queue metrics

2018-01-24 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16338202#comment-16338202
 ] 

Eric Payne commented on YARN-7728:
--

{quote}I am fine with latest patch. If no issues, I could commit this patch.
{quote}
Hi [~sunilg]. Any update?

> Expose and expand container preemptions in Capacity Scheduler queue metrics
> ---
>
> Key: YARN-7728
> URL: https://issues.apache.org/jira/browse/YARN-7728
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.9.0, 2.8.3, 3.0.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-7728.001.patch, YARN-7728.002.patch
>
>
> YARN-1047 exposed queue metrics for the number of preempted containers to the 
> fair scheduler. I would like to also expose these to the capacity scheduler 
> and add metrics for the amount of lost memory seconds and vcore seconds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7813) Capacity Scheduler Intra-queue Preemption should be configurable for each queue

2018-01-24 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16338321#comment-16338321
 ] 

Eric Payne commented on YARN-7813:
--

A grid administrator should be able to configure queues as follows. Currently, 
there is no way to configure a grid cluster like this:
{panel:title=VALID CONFIGURATION}
|Queue Name|cross-queue preemption enabled|in-queue preemption enabled|
|QueueA|true|true|
|QueueB|true|false|
{panel}
Currently, if system-wide in-queue preemption is enabled, it is enabled for 
both {{QueueA}} and {{QueueB}}. In the above use case, the administrator wants 
to turn in-queue preemption on for {{QueueA}} but off for {{QueueB}}.

A nuance of this feature is that in order for in-queue preemption to be 
enabled, cross-queue preemption must also be enabled. That is, the following 
should not be allowed:
{panel:title=INVALID CONFIGURATION}
|Queue Name|cross-queue preemption|in-queue preemption|
|QueueA|false|true|
|QueueB|false|true|
{panel}
The reason is that there is no guarantee that the scheduler will give a 
preempted container back to the queue that originally held it. So, if 
{{QueueA}} in the invalid configuration example is over its capacity, there is 
no way to tell the scheduler that a container was preempted in order to balance 
{{QueueA}}'s priority or user-limit inversions. Instead, the capacity scheduler 
would first try to balance each queue's capacity guarantees by giving the 
container to {{QueueB}}.
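
A minimal sketch of that dependency rule, with illustrative names (not from 
the actual patch):
{code:java}
class InQueuePreemptionRuleSketch {
  // A queue's requested in-queue setting only takes effect when cross-queue
  // preemption is enabled for that queue; the INVALID combination
  // (cross=false, in-queue=true) degrades to "off", since a preempted
  // container may be handed to a different queue entirely.
  static boolean effectiveInQueuePreemption(boolean crossQueueEnabled,
      boolean inQueueRequested) {
    return crossQueueEnabled && inQueueRequested;
  }
  // QueueA(true, true)   -> true   (valid: both enabled)
  // QueueB(true, false)  -> false  (valid: in-queue off)
  // QueueX(false, true)  -> false  (invalid request is forced off)
}
{code}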

> Capacity Scheduler Intra-queue Preemption should be configurable for each 
> queue
> ---
>
> Key: YARN-7813
> URL: https://issues.apache.org/jira/browse/YARN-7813
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, scheduler preemption
>Affects Versions: 2.9.0, 2.8.3, 3.0.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
>
> Just as inter-queue (a.k.a. cross-queue) preemption is configurable per 
> queue, intra-queue (a.k.a. in-queue) preemption should be configurable per 
> queue. If a queue does not have a setting for intra-queue preemption, it 
> should inherit its parent's value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-7813) Capacity Scheduler Intra-queue Preemption should be configurable for each queue

2018-01-24 Thread Eric Payne (JIRA)
Eric Payne created YARN-7813:


 Summary: Capacity Scheduler Intra-queue Preemption should be 
configurable for each queue
 Key: YARN-7813
 URL: https://issues.apache.org/jira/browse/YARN-7813
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: capacity scheduler, scheduler preemption
Affects Versions: 3.0.0, 2.8.3, 2.9.0
Reporter: Eric Payne
Assignee: Eric Payne


Just as inter-queue (a.k.a. cross-queue) preemption is configurable per queue, 
intra-queue (a.k.a. in-queue) preemption should be configurable per queue. If a 
queue does not have a setting for intra-queue preemption, it should inherit its 
parent's value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7728) Expose container preemptions related information in Capacity Scheduler queue metrics

2018-01-26 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16341223#comment-16341223
 ] 

Eric Payne commented on YARN-7728:
--

Regarding test failures:

{{TestRMWebServiceAppsNodelabel}} fails in branch-2.8 as well. The other two 
succeed for me in my local repo.

> Expose container preemptions related information in Capacity Scheduler queue 
> metrics
> 
>
> Key: YARN-7728
> URL: https://issues.apache.org/jira/browse/YARN-7728
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.9.0, 2.8.3, 3.0.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1
>
> Attachments: YARN-7728.001.patch, YARN-7728.002.patch, 
> YARN-7728.branch-2.8.002.patch
>
>
> YARN-1047 exposed queue metrics for the number of preempted containers to the 
> fair scheduler. I would like to also expose these to the capacity scheduler 
> and add metrics for the amount of lost memory seconds and vcore seconds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7728) Expose container preemptions related information in Capacity Scheduler queue metrics

2018-01-25 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-7728:
-
Attachment: YARN-7728.branch-2.8.002.patch

> Expose container preemptions related information in Capacity Scheduler queue 
> metrics
> 
>
> Key: YARN-7728
> URL: https://issues.apache.org/jira/browse/YARN-7728
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.9.0, 2.8.3, 3.0.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1
>
> Attachments: YARN-7728.001.patch, YARN-7728.002.patch, 
> YARN-7728.branch-2.8.002.patch
>
>
> YARN-1047 exposed queue metrics for the number of preempted containers to the 
> fair scheduler. I would like to also expose these to the capacity scheduler 
> and add metrics for the amount of lost memory seconds and vcore seconds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7728) Expose and expand container preemptions in Capacity Scheduler queue metrics

2018-01-12 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16324165#comment-16324165
 ] 

Eric Payne commented on YARN-7728:
--

Thanks a lot for the comments, [~sunilg].

bq. In 3.0, we support multiple resource types, and this covers only cpu and 
memory. So could we cover preemption metrics also in the case of multiple 
resources?
I agree with this in principle. However, I made a conscious decision not to do 
this. There are a couple of difficulties that I see. First, this is not done 
for other resource metrics in QueueMetrics (or any of the other system metrics 
I could find). The resource metrics only cover memory and vcores. Second, 
making the metric names match the resource names is a little difficult if the 
resource names can be dynamic. Because of these two things, I feel this should 
all be solved at the same time in a more general JIRA.

{quote}
One more doubt is with aggregateVcoreSecondsPreempted. MutableCounterLong is 
used for this. But under one queue, we'll have multiple containers getting 
preempted, and each container's resource size can vary drastically. So are we 
looking for an aggregate resource among all preempted containers in a given 
time?
{quote}
I don't think I understand the question. The metrics are updated when each 
container is preempted, and the value keeps increasing over time. Similar to 
memory, it's basically a metric of the total (virtual) CPU cycles lost to 
preemption since the RM was started.
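
A rough sketch of those semantics, using illustrative counter names and a 
standalone {{MetricsRegistry}} (this is not the actual QueueMetrics change):
{code:java}
import org.apache.hadoop.metrics2.lib.MetricsRegistry;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;

class PreemptionMetricsSketch {
  private final MetricsRegistry registry =
      new MetricsRegistry("QueueMetricsSketch");
  private final MutableCounterLong aggregateVcoreSecondsPreempted =
      registry.newCounter("AggregateVcoreSecondsPreempted",
          "vcore-seconds lost to preemption", 0L);
  private final MutableCounterLong aggregateMemoryMBSecondsPreempted =
      registry.newCounter("AggregateMemoryMBSecondsPreempted",
          "MB-seconds lost to preemption", 0L);

  // Called once per preempted container. The counters only ever grow, so
  // they measure total resources lost to preemption since RM start.
  void containerPreempted(int vcores, long memoryMb, long lifetimeMs) {
    long seconds = lifetimeMs / 1000;
    aggregateVcoreSecondsPreempted.incr(vcores * seconds);
    aggregateMemoryMBSecondsPreempted.incr(memoryMb * seconds);
  }
}
{code}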

{quote}
 aggregateMegabyteSecondsPreempted: MegaByte seems a bit confusing, MemoryMB is 
used in other places as well. Could we use something similar (like prepending 
memory)
{quote}
Good point. I will update a new patch.

> Expose and expand container preemptions in Capacity Scheduler queue metrics
> ---
>
> Key: YARN-7728
> URL: https://issues.apache.org/jira/browse/YARN-7728
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.9.0, 2.8.3, 3.0.0
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: YARN-7728.001.patch
>
>
> YARN-1047 exposed queue metrics for the number of preempted containers to the 
> fair scheduler. I would like to also expose these to the capacity scheduler 
> and add metrics for the amount of lost memory seconds and vcore seconds.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7728) Expose and expand container preemptions in Capacity Scheduler queue metrics

2018-01-12 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-7728:
-
Attachment: YARN-7728.002.patch

> Expose and expand container preemptions in Capacity Scheduler queue metrics
> ---
>
> Key: YARN-7728
> URL: https://issues.apache.org/jira/browse/YARN-7728
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.9.0, 2.8.3, 3.0.0
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: YARN-7728.001.patch, YARN-7728.002.patch
>
>
> YARN-1047 exposed queue metrics for the number of preempted containers to the 
> fair scheduler. I would like to also expose these to the capacity scheduler 
> and add metrics for the amount of lost memory seconds and vcore seconds.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7728) Expose and expand container preemptions in Capacity Scheduler queue metrics

2018-01-11 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-7728:
-
Attachment: YARN-7728.001.patch

> Expose and expand container preemptions in Capacity Scheduler queue metrics
> ---
>
> Key: YARN-7728
> URL: https://issues.apache.org/jira/browse/YARN-7728
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.9.0, 2.8.3, 3.0.0
>Reporter: Eric Payne
>Assignee: Eric Payne
> Attachments: YARN-7728.001.patch
>
>
> YARN-1047 exposed queue metrics for the number of preempted containers to the 
> fair scheduler. I would like to also expose these to the capacity scheduler 
> and add metrics for the amount of lost memory seconds and vcore seconds.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7728) Expose container preemptions related information in Capacity Scheduler queue metrics

2018-01-30 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16345383#comment-16345383
 ] 

Eric Payne commented on YARN-7728:
--

Hi [~sunilg]. Will you have a chance to look at the 2.8 backport in the next 
couple of days? Thanks!

> Expose container preemptions related information in Capacity Scheduler queue 
> metrics
> 
>
> Key: YARN-7728
> URL: https://issues.apache.org/jira/browse/YARN-7728
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.9.0, 2.8.3, 3.0.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1
>
> Attachments: YARN-7728.001.patch, YARN-7728.002.patch, 
> YARN-7728.branch-2.8.002.patch
>
>
> YARN-1047 exposed queue metrics for the number of preempted containers to the 
> fair scheduler. I would like to also expose these to the capacity scheduler 
> and add metrics for the amount of lost memory seconds and vcore seconds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-01-31 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347209#comment-16347209
 ] 

Eric Payne commented on YARN-4606:
--

Thanks everyone for the thoughtful analysis.

I am still analyzing in more depth, but I have a couple of thoughts:
{quote}this is a (known) potential issue of fair ordering policy.
{quote}
This can happen with the fifo ordering policy as well.
{quote}have {{activeUsersOfPendingApps}} along with {{activeUsers}}. Hence in 
case of scheduling we can depend only on {{activeUsers}}
{quote}
We need to be careful with these counts because a user can have both active and 
pending apps. I think the definitions should be (sketched below):
 - {{activeUsers}}: users that have at least one active app
 - {{activeUsersOfPendingApps}}: users that have only pending apps.
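
A small counting sketch with illustrative types (per the follow-up comment, 
the user limit would then be computed from {{activeUsers}}, while per-user AM 
limits would use {{activeUsers}} + {{activeUsersOfPendingApps}}):
{code:java}
class ActiveUsersSketch {
  static class User {
    int activeApps;
    int pendingApps;
  }

  // A user with at least one active app counts as active; a user whose apps
  // are all pending counts toward activeUsersOfPendingApps instead.
  static int[] countUsers(Iterable<User> users) {
    int activeUsers = 0;
    int activeUsersOfPendingApps = 0;
    for (User u : users) {
      if (u.activeApps > 0) {
        activeUsers++;
      } else if (u.pendingApps > 0) {
        activeUsersOfPendingApps++;
      }
    }
    return new int[] { activeUsers, activeUsersOfPendingApps };
  }
}
{code}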

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-4606.1.poc.patch
>
>
> Currently, if all applications belonging to the same user in a LeafQueue are 
> pending (caused by max-am-percent, etc.), ActiveUsersManager still considers 
> that user an active user. This could lead to starvation of active 
> applications, for example:
> - App1(belongs to user1)/app2(belongs to user2) are active, app3(belongs to 
> user3)/app4(belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, only two users (user1/user2) are able to allocate new resources, 
> so the computed user-limit-resource could be lower than expected.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

2018-01-31 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347329#comment-16347329
 ] 

Eric Payne commented on YARN-4606:
--

My understanding is that the user limit would use {{activeUsers}}, and for 
things like the max AM limit per user, we'd use {{activeUsers}} + 
{{activeUsersOfPendingApps}}.

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -
>
> Key: YARN-4606
> URL: https://issues.apache.org/jira/browse/YARN-4606
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
>Affects Versions: 2.8.0, 2.7.1
>Reporter: Karam Singh
>Assignee: Wangda Tan
>Priority: Critical
> Attachments: YARN-4606.1.poc.patch
>
>
> Currently, if all applications belonging to the same user in a LeafQueue are 
> pending (caused by max-am-percent, etc.), ActiveUsersManager still considers 
> that user an active user. This could lead to starvation of active 
> applications, for example:
> - App1(belongs to user1)/app2(belongs to user2) are active, app3(belongs to 
> user3)/app4(belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, only two users (user1/user2) are able to allocate new resources, 
> so the computed user-limit-resource could be lower than expected.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7813) Capacity Scheduler Intra-queue Preemption should be configurable for each queue

2018-02-08 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-7813:
-
Attachment: YARN-7813.002.patch

> Capacity Scheduler Intra-queue Preemption should be configurable for each 
> queue
> ---
>
> Key: YARN-7813
> URL: https://issues.apache.org/jira/browse/YARN-7813
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, scheduler preemption
>Affects Versions: 2.9.0, 2.8.3, 3.0.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-7813.001.patch, YARN-7813.002.patch
>
>
> Just as inter-queue (a.k.a. cross-queue) preemption is configurable per 
> queue, intra-queue (a.k.a. in-queue) preemption should be configurable per 
> queue. If a queue does not have a setting for intra-queue preemption, it 
> should inherit its parent's value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7813) Capacity Scheduler Intra-queue Preemption should be configurable for each queue

2018-02-08 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357353#comment-16357353
 ] 

Eric Payne commented on YARN-7813:
--

Thanks [~jlowe] for the comments!

I'm attaching a new patch that addresses your comments. It does not cherry-pick 
cleanly to some of the previous branches. I am working on patches for those 
other branches.

{quote}
Will this cause queues to start performing intra-queue preemption that did not 
previously with the same configs or vice-versa?
{quote}

No. Previously, in-queue preemption was enabled and disabled at a particular 
queue level via the cross-queue preemption config property. If nothing changes 
in the configs, this behavior will remain the same. For example, if cross-queue 
preemption is disabled at the root queue and then enabled at root.QueueA, all 
children of QueueA will have both cross-queue and in-queue preemption enabled. 
Intra-queue preemption will be disabled at a particular level (and below) only 
if the new property is set.

bq. "intreQueuePreemptionDisabled" s/b "intraQueuePreemptionDisabled"
Done.

{quote}
Why does CapacitySchedulerLeafQueueInfo have extra logic for getting 
intra-queue preemption disabled status? I don't see this similar logic 
elsewhere in the code.
{quote}
Yeah, I missed that. The desired behavior outlined above (if the configs don't 
change, intra-queue enablement doesn't change) was not working quite right, so 
I moved some of the logic above {{getIntraQueuePreemption}}, but I didn't make 
the change where it was really important ({{IntraQueueCandidatesSelector}}). I 
failed to do my usual rigorous testing, so I missed it. I rectified the problem 
by adding {{AbstractCSQueue#getIntraQueuePreemptionDisabledInHierarchy}} and 
then putting the cross-queue <-> in-queue dependency logic in 
{{AbstractCSQueue#getIntraQueuePreemptionDisabled}}, sketched below.
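
A sketch of that hierarchy and dependency logic, with illustrative names (the 
real code lives in {{AbstractCSQueue}}; this is not the exact patch):
{code:java}
class QueueSketch {
  QueueSketch parent;
  boolean crossQueuePreemptionDisabled;
  // null means "no explicit setting; inherit from the parent"
  Boolean intraQueuePreemptionDisabledSetting;

  // Walk up the hierarchy until an explicit setting is found; the root
  // defaults to "not disabled".
  boolean intraQueuePreemptionDisabledInHierarchy() {
    if (intraQueuePreemptionDisabledSetting != null) {
      return intraQueuePreemptionDisabledSetting;
    }
    return parent != null && parent.intraQueuePreemptionDisabledInHierarchy();
  }

  // Cross-queue <-> in-queue dependency: in-queue preemption is effectively
  // disabled whenever cross-queue preemption is disabled for this queue.
  boolean intraQueuePreemptionDisabled() {
    return crossQueuePreemptionDisabled
        || intraQueuePreemptionDisabledInHierarchy();
  }
}
{code}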

bq. Technically the queue CLI output changes are incompatible per ...
Yup. I changed the output strings for preemption back to what they were before. 
I like that better anyway ;-)

> Capacity Scheduler Intra-queue Preemption should be configurable for each 
> queue
> ---
>
> Key: YARN-7813
> URL: https://issues.apache.org/jira/browse/YARN-7813
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, scheduler preemption
>Affects Versions: 2.9.0, 2.8.3, 3.0.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-7813.001.patch, YARN-7813.002.patch
>
>
> Just as inter-queue (a.k.a. cross-queue) preemption is configurable per 
> queue, intra-queue (a.k.a. in-queue) preemption should be configurable per 
> queue. If a queue does not have a setting for intra-queue preemption, it 
> should inherit its parent's value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7813) Capacity Scheduler Intra-queue Preemption should be configurable for each queue

2018-02-06 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-7813:
-
Attachment: YARN-7813.001.patch

> Capacity Scheduler Intra-queue Preemption should be configurable for each 
> queue
> ---
>
> Key: YARN-7813
> URL: https://issues.apache.org/jira/browse/YARN-7813
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, scheduler preemption
>Affects Versions: 2.9.0, 2.8.3, 3.0.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-7813.001.patch
>
>
> Just as inter-queue (a.k.a. cross-queue) preemption is configurable per 
> queue, intra-queue (a.k.a. in-queue) preemption should be configurable per 
> queue. If a queue does not have a setting for intra-queue preemption, it 
> should inherit its parent's value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7728) Expose container preemptions related information in Capacity Scheduler queue metrics

2018-02-06 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353988#comment-16353988
 ] 

Eric Payne commented on YARN-7728:
--

[~sunilg], are you okay with the branch-2.8 patch? If so, I can commit it 
myself if you don't have time.

> Expose container preemptions related information in Capacity Scheduler queue 
> metrics
> 
>
> Key: YARN-7728
> URL: https://issues.apache.org/jira/browse/YARN-7728
> Project: Hadoop YARN
>  Issue Type: Improvement
>Affects Versions: 2.9.0, 2.8.3, 3.0.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1
>
> Attachments: YARN-7728.001.patch, YARN-7728.002.patch, 
> YARN-7728.branch-2.8.002.patch
>
>
> YARN-1047 exposed queue metrics for the number of preempted containers to the 
> fair scheduler. I would like to also expose these to the capacity scheduler 
> and add metrics for the amount of lost memory seconds and vcore seconds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7813) Capacity Scheduler Intra-queue Preemption should be configurable for each queue

2018-02-13 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362674#comment-16362674
 ] 

Eric Payne commented on YARN-7813:
--

Thanks [~jlowe]. I committed to trunk and branch-3.1. The patch does not 
cleanly backport to 3.0 or prior, so I am attaching patches for those.

> Capacity Scheduler Intra-queue Preemption should be configurable for each 
> queue
> ---
>
> Key: YARN-7813
> URL: https://issues.apache.org/jira/browse/YARN-7813
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, scheduler preemption
>Affects Versions: 2.9.0, 2.8.3, 3.0.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-7813.001.patch, YARN-7813.002.branch-3.0.patch, 
> YARN-7813.002.patch
>
>
> Just as inter-queue (a.k.a. cross-queue) preemption is configurable per 
> queue, intra-queue (a.k.a. in-queue) preemption should be configurable per 
> queue. If a queue does not have a setting for intra-queue preemption, it 
> should inherit its parent's value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7813) Capacity Scheduler Intra-queue Preemption should be configurable for each queue

2018-02-13 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-7813:
-
Attachment: YARN-7813.002.branch-3.0.patch

> Capacity Scheduler Intra-queue Preemption should be configurable for each 
> queue
> ---
>
> Key: YARN-7813
> URL: https://issues.apache.org/jira/browse/YARN-7813
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, scheduler preemption
>Affects Versions: 2.9.0, 2.8.3, 3.0.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-7813.001.patch, YARN-7813.002.branch-3.0.patch, 
> YARN-7813.002.patch
>
>
> Just as inter-queue (a.k.a. cross-queue) preemption is configurable per 
> queue, intra-queue (a.k.a. in-queue) preemption should be configurable per 
> queue. If a queue does not have a setting for intra-queue preemption, it 
> should inherit its parent's value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7813) Capacity Scheduler Intra-queue Preemption should be configurable for each queue

2018-02-13 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-7813:
-
Attachment: YARN-7813.003.branch-2.patch

> Capacity Scheduler Intra-queue Preemption should be configurable for each 
> queue
> ---
>
> Key: YARN-7813
> URL: https://issues.apache.org/jira/browse/YARN-7813
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, scheduler preemption
>Affects Versions: 2.9.0, 2.8.3, 3.0.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-7813.001.patch, YARN-7813.002.branch-3.0.patch, 
> YARN-7813.002.patch, YARN-7813.003.branch-2.patch, 
> YARN-7813.003.branch-3.0.patch
>
>
> Just as inter-queue (a.k.a. cross-queue) preemption is configurable per 
> queue, intra-queue (a.k.a. in-queue) preemption should be configurable per 
> queue. If a queue does not have a setting for intra-queue preemption, it 
> should inherit its parent's value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Created] (YARN-7927) YARN-7813 caused test failure in TestRMWebServicesSchedulerActivities

2018-02-13 Thread Eric Payne (JIRA)
Eric Payne created YARN-7927:


 Summary: YARN-7813 caused test failure in 
TestRMWebServicesSchedulerActivities 
 Key: YARN-7927
 URL: https://issues.apache.org/jira/browse/YARN-7927
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Eric Payne






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7813) Capacity Scheduler Intra-queue Preemption should be configurable for each queue

2018-02-13 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-7813:
-
Attachment: YARN-7813.003.branch-3.0.patch

> Capacity Scheduler Intra-queue Preemption should be configurable for each 
> queue
> ---
>
> Key: YARN-7813
> URL: https://issues.apache.org/jira/browse/YARN-7813
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, scheduler preemption
>Affects Versions: 2.9.0, 2.8.3, 3.0.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-7813.001.patch, YARN-7813.002.branch-3.0.patch, 
> YARN-7813.002.patch, YARN-7813.003.branch-3.0.patch
>
>
> Just as inter-queue (a.k.a. cross-queue) preemption is configurable per 
> queue, intra-queue (a.k.a. in-queue) preemption should be configurable per 
> queue. If a queue does not have a setting for intra-queue preemption, it 
> should inherit its parent's value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7813) Capacity Scheduler Intra-queue Preemption should be configurable for each queue

2018-02-13 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363207#comment-16363207
 ] 

Eric Payne commented on YARN-7813:
--

I was checking failed unit tests for {{YARN-7813.002.branch-3.0.patch}} and 
noticed that the {{TestRMWebServicesSchedulerActivities}} failures are caused 
by this patch. The others are not failing for me in my local repo.

I have uploaded a new branch-3.0 patch (003). I will open a JIRA to fix it in 
trunk and 3.1.

> Capacity Scheduler Intra-queue Preemption should be configurable for each 
> queue
> ---
>
> Key: YARN-7813
> URL: https://issues.apache.org/jira/browse/YARN-7813
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, scheduler preemption
>Affects Versions: 2.9.0, 2.8.3, 3.0.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-7813.001.patch, YARN-7813.002.branch-3.0.patch, 
> YARN-7813.002.patch, YARN-7813.003.branch-3.0.patch
>
>
> Just as inter-queue (a.k.a. cross-queue) preemption is configurable per 
> queue, intra-queue (a.k.a. in-queue) preemption should be configurable per 
> queue. If a queue does not have a setting for intra-queue preemption, it 
> should inherit its parent's value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7927) TestRMWebServicesSchedulerActivities is failing

2018-02-14 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-7927:
-
Attachment: YARN-7927.001.patch

> TestRMWebServicesSchedulerActivities is failing
> ---
>
> Key: YARN-7927
> URL: https://issues.apache.org/jira/browse/YARN-7927
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-7927.001.patch
>
>
> YARN-7813 broke TestRMWebServicesSchedulerActivities.  The test needs to be 
> updated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7813) Capacity Scheduler Intra-queue Preemption should be configurable for each queue

2018-02-14 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364217#comment-16364217
 ] 

Eric Payne commented on YARN-7813:
--

{code}
Removing intermediate container 0bc484afac12
Step 14/33 : RUN mkdir -p /opt/findbugs && curl -L -s -S  
https://sourceforge.net/projects/findbugs/files/findbugs/3.0.1/findbugs-noUpdateChecks-3.0.1.tar.gz/download
  -o /opt/findbugs.tar.gz && tar xzf /opt/findbugs.tar.gz 
--strip-components 1 -C /opt/findbugs
 ---> Running in 1f76e2c688d5

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
The command '/bin/sh -c mkdir -p /opt/findbugs && curl -L -s -S  
https://sourceforge.net/projects/findbugs/files/findbugs/3.0.1/findbugs-noUpdateChecks-3.0.1.tar.gz/download
  -o /opt/findbugs.tar.gz && tar xzf /opt/findbugs.tar.gz 
--strip-components 1 -C /opt/findbugs' returned a non-zero code: 2
{code}
The pre-commit build failed for {{YARN-7813.003.branch-2.patch}} with {{gzip: 
stdin: not in gzip format}}. I don't think that is related to the patch itself, 
so I'm restarting the build.

> Capacity Scheduler Intra-queue Preemption should be configurable for each 
> queue
> ---
>
> Key: YARN-7813
> URL: https://issues.apache.org/jira/browse/YARN-7813
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, scheduler preemption
>Affects Versions: 2.9.0, 2.8.3, 3.0.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-7813.001.patch, YARN-7813.002.branch-3.0.patch, 
> YARN-7813.002.patch, YARN-7813.003.branch-2.patch, 
> YARN-7813.003.branch-3.0.patch
>
>
> Just as inter-queue (a.k.a. cross-queue) preemption is configurable per 
> queue, intra-queue (a.k.a. in-queue) preemption should be configurable per 
> queue. If a queue does not have a setting for intra-queue preemption, it 
> should inherit its parent's value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7813) Capacity Scheduler Intra-queue Preemption should be configurable for each queue

2018-02-14 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364766#comment-16364766
 ] 

Eric Payne commented on YARN-7813:
--

I reverted the check-ins for trunk, branch-3.1, and branch-3.0 because this 
patch affects the behavior of auto queue creation in 
{{TestCapacitySchedulerAutoCreatedQueuePreemption}}.

> Capacity Scheduler Intra-queue Preemption should be configurable for each 
> queue
> ---
>
> Key: YARN-7813
> URL: https://issues.apache.org/jira/browse/YARN-7813
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, scheduler preemption
>Affects Versions: 2.9.0, 2.8.3, 3.0.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-7813.001.patch, YARN-7813.002.branch-3.0.patch, 
> YARN-7813.002.patch, YARN-7813.003.branch-2.patch, 
> YARN-7813.003.branch-3.0.patch
>
>
> Just as inter-queue (a.k.a. cross-queue) preemption is configurable per 
> queue, intra-queue (a.k.a. in-queue) preemption should be configurable per 
> queue. If a queue does not have a setting for intra-queue preemption, it 
> should inherit its parent's value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7927) TestRMWebServicesSchedulerActivities is failing

2018-02-14 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364556#comment-16364556
 ] 

Eric Payne commented on YARN-7927:
--

{{TestCapacitySchedulerAutoCreatedQueuePreemption}} may also be related.

> TestRMWebServicesSchedulerActivities is failing
> ---
>
> Key: YARN-7927
> URL: https://issues.apache.org/jira/browse/YARN-7927
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.1.0, 3.2.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-7927.001.patch
>
>
> YARN-7813 broke TestRMWebServicesSchedulerActivities.  The test needs to be 
> updated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7813) Capacity Scheduler Intra-queue Preemption should be configurable for each queue

2018-02-14 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-7813:
-
Attachment: YARN-7813.004.patch

> Capacity Scheduler Intra-queue Preemption should be configurable for each 
> queue
> ---
>
> Key: YARN-7813
> URL: https://issues.apache.org/jira/browse/YARN-7813
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, scheduler preemption
>Affects Versions: 2.9.0, 2.8.3, 3.0.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-7813.001.patch, YARN-7813.002.branch-3.0.patch, 
> YARN-7813.002.patch, YARN-7813.003.branch-2.patch, 
> YARN-7813.003.branch-3.0.patch, YARN-7813.004.patch
>
>
> Just as inter-queue (a.k.a. cross-queue) preemption is configurable per 
> queue, intra-queue (a.k.a. in-queue) preemption should be configurable per 
> queue. If a queue does not have a setting for intra-queue preemption, it 
> should inherit its parent's value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7813) Capacity Scheduler Intra-queue Preemption should be configurable for each queue

2018-02-14 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16365044#comment-16365044
 ] 

Eric Payne commented on YARN-7813:
--

{{YARN-7813.004.patch}} should address the failures.

> Capacity Scheduler Intra-queue Preemption should be configurable for each 
> queue
> ---
>
> Key: YARN-7813
> URL: https://issues.apache.org/jira/browse/YARN-7813
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, scheduler preemption
>Affects Versions: 2.9.0, 2.8.3, 3.0.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-7813.001.patch, YARN-7813.002.branch-3.0.patch, 
> YARN-7813.002.patch, YARN-7813.003.branch-2.patch, 
> YARN-7813.003.branch-3.0.patch, YARN-7813.004.patch
>
>
> Just as inter-queue (a.k.a. cross-queue) preemption is configurable per 
> queue, intra-queue (a.k.a. in-queue) preemption should be configurable per 
> queue. If a queue does not have a setting for intra-queue preemption, it 
> should inherit its parent's value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7813) Capacity Scheduler Intra-queue Preemption should be configurable for each queue

2018-02-15 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-7813:
-
Attachment: YARN-7813.005.patch

> Capacity Scheduler Intra-queue Preemption should be configurable for each 
> queue
> ---
>
> Key: YARN-7813
> URL: https://issues.apache.org/jira/browse/YARN-7813
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, scheduler preemption
>Affects Versions: 2.9.0, 2.8.3, 3.0.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-7813.001.patch, YARN-7813.002.branch-3.0.patch, 
> YARN-7813.002.patch, YARN-7813.003.branch-2.patch, 
> YARN-7813.003.branch-3.0.patch, YARN-7813.004.patch, YARN-7813.005.patch
>
>
> Just as inter-queue (a.k.a. cross-queue) preemption is configurable per 
> queue, intra-queue (a.k.a. in-queue) preemption should be configurable per 
> queue. If a queue does not have a setting for intra-queue preemption, it 
> should inherit its parent's value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7947) Capacity Scheduler intra-queue preemption can NPE for non-schedulable apps

2018-02-21 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16371520#comment-16371520
 ] 

Eric Payne commented on YARN-7947:
--

Great! Thanks a lot, [~sunilg]

> Capacity Scheduler intra-queue preemption can NPE for non-schedulable apps
> --
>
> Key: YARN-7947
> URL: https://issues.apache.org/jira/browse/YARN-7947
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, scheduler preemption
>Affects Versions: 2.9.0, 2.8.3, 3.0.0, 3.1.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 3.0.2, 3.2.0
>
> Attachments: YARN-7947.001.patch
>
>
> Intra-queue preemption policy can cause NPE for pending users with no 
> schedulable apps.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7813) Capacity Scheduler Intra-queue Preemption should be configurable for each queue

2018-02-21 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16371523#comment-16371523
 ] 

Eric Payne commented on YARN-7813:
--

Awesome! Thanks [~jlowe]

> Capacity Scheduler Intra-queue Preemption should be configurable for each 
> queue
> ---
>
> Key: YARN-7813
> URL: https://issues.apache.org/jira/browse/YARN-7813
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, scheduler preemption
>Affects Versions: 2.9.0, 2.8.3, 3.0.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 3.0.2
>
> Attachments: YARN-7813.001.patch, YARN-7813.002.branch-3.0.patch, 
> YARN-7813.002.patch, YARN-7813.003.branch-2.patch, 
> YARN-7813.003.branch-3.0.patch, YARN-7813.004.patch, 
> YARN-7813.005.branch-2.8.patch, YARN-7813.005.branch-3.0.patch, 
> YARN-7813.005.patch
>
>
> Just as inter-queue (a.k.a. cross-queue) preemption is configurable per 
> queue, intra-queue (a.k.a. in-queue) preemption should be configurable per 
> queue. If a queue does not have a setting for intra-queue preemption, it 
> should inherit its parent's value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7813) Capacity Scheduler Intra-queue Preemption should be configurable for each queue

2018-02-15 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-7813:
-
Attachment: YARN-7813.005.branch-3.0.patch

> Capacity Scheduler Intra-queue Preemption should be configurable for each 
> queue
> ---
>
> Key: YARN-7813
> URL: https://issues.apache.org/jira/browse/YARN-7813
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, scheduler preemption
>Affects Versions: 2.9.0, 2.8.3, 3.0.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-7813.001.patch, YARN-7813.002.branch-3.0.patch, 
> YARN-7813.002.patch, YARN-7813.003.branch-2.patch, 
> YARN-7813.003.branch-3.0.patch, YARN-7813.004.patch, 
> YARN-7813.005.branch-3.0.patch, YARN-7813.005.patch
>
>
> Just as inter-queue (a.k.a. cross-queue) preemption is configurable per 
> queue, intra-queue (a.k.a. in-queue) preemption should be configurable per 
> queue. If a queue does not have a setting for intra-queue preemption, it 
> should inherit its parent's value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7813) Capacity Scheduler Intra-queue Preemption should be configurable for each queue

2018-02-15 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366151#comment-16366151
 ] 

Eric Payne commented on YARN-7813:
--

Attached {{YARN-7813.005.patch}}. If the pre-commit passes, this should be the 
one.

> Capacity Scheduler Intra-queue Preemption should be configurable for each 
> queue
> ---
>
> Key: YARN-7813
> URL: https://issues.apache.org/jira/browse/YARN-7813
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, scheduler preemption
>Affects Versions: 2.9.0, 2.8.3, 3.0.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-7813.001.patch, YARN-7813.002.branch-3.0.patch, 
> YARN-7813.002.patch, YARN-7813.003.branch-2.patch, 
> YARN-7813.003.branch-3.0.patch, YARN-7813.004.patch, YARN-7813.005.patch
>
>
> Just as inter-queue (a.k.a. cross-queue) preemption is configurable per 
> queue, intra-queue (a.k.a. in-queue) preemption should be configurable per 
> queue. If a queue does not have a setting for intra-queue preemption, it 
> should inherit its parent's value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-7813) Capacity Scheduler Intra-queue Preemption should be configurable for each queue

2018-02-16 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367867#comment-16367867
 ] 

Eric Payne commented on YARN-7813:
--

{{YARN-7813.005.patch}}
 - This patch backports to 3.1 cleanly
 - {{TestAMRMClientPlacementConstraints}} failure was not caused by this patch. 
It fails in trunk without the patch.

{{YARN-7813.005.branch-3.0.patch}}
 - This patch backports to branch-2 and branch-2.9 with one minor merge 
conflict.
 - {{TestNodeLabelContainerAllocation}} failure was not caused by this patch. 
It fails in branch-3.0 without the patch.

{{YARN-7813.005.branch-2.8.patch}}
 - The only unit test that fails for me in my local repo's build is 
{{TestRMWebServiceAppsNodelabel}}, which also fails in branch-2.8 without the 
patch.

> Capacity Scheduler Intra-queue Preemption should be configurable for each 
> queue
> ---
>
> Key: YARN-7813
> URL: https://issues.apache.org/jira/browse/YARN-7813
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, scheduler preemption
>Affects Versions: 2.9.0, 2.8.3, 3.0.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-7813.001.patch, YARN-7813.002.branch-3.0.patch, 
> YARN-7813.002.patch, YARN-7813.003.branch-2.patch, 
> YARN-7813.003.branch-3.0.patch, YARN-7813.004.patch, 
> YARN-7813.005.branch-2.8.patch, YARN-7813.005.branch-3.0.patch, 
> YARN-7813.005.patch
>
>
> Just as inter-queue (a.k.a. cross-queue) preemption is configurable per 
> queue, intra-queue (a.k.a. in-queue) preemption should be configurable per 
> queue. If a queue does not have a setting for intra-queue preemption, it 
> should inherit its parent's value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-7813) Capacity Scheduler Intra-queue Preemption should be configurable for each queue

2018-02-15 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne updated YARN-7813:
-
Attachment: YARN-7813.005.branch-2.8.patch

> Capacity Scheduler Intra-queue Preemption should be configurable for each 
> queue
> ---
>
> Key: YARN-7813
> URL: https://issues.apache.org/jira/browse/YARN-7813
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: capacity scheduler, scheduler preemption
>Affects Versions: 2.9.0, 2.8.3, 3.0.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
> Attachments: YARN-7813.001.patch, YARN-7813.002.branch-3.0.patch, 
> YARN-7813.002.patch, YARN-7813.003.branch-2.patch, 
> YARN-7813.003.branch-3.0.patch, YARN-7813.004.patch, 
> YARN-7813.005.branch-2.8.patch, YARN-7813.005.branch-3.0.patch, 
> YARN-7813.005.patch
>
>
> Just as inter-queue (a.k.a. cross-queue) preemption is configurable per 
> queue, intra-queue (a.k.a. in-queue) preemption should be configurable per 
> queue. If a queue does not have a setting for intra-queue preemption, it 
> should inherit its parent's value.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-7947) Capacity Scheduler intra-queue preemption can NPE for non-schedulable apps

2018-02-19 Thread Eric Payne (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369531#comment-16369531
 ] 

Eric Payne edited comment on YARN-7947 at 2/19/18 9:29 PM:
---

YARN-7051 added the following code. Instead of passing {{amUsed}} to the 
{{TempUserPerPartition}} constructor, it passes the intermediate variable 
{{userSpecificAmUsed}}, which can sometimes be null.

{{userSpecificAmUsed}} is null only when a user has no schedulable apps but 
does have non-schedulable apps, for example when the AM limit has been reached.
{code:title=FifoIntraQueuePreemptionPlugin#createTempAppForResCalculation}
Resource userSpecificAmUsed = perUserAMUsed.get(userName);
amUsed = (userSpecificAmUsed == null)
    ? Resources.none() : userSpecificAmUsed;

TempUserPerPartition tmpUser = new TempUserPerPartition(
    tq.leafQueue.getUser(userName), tq.queueName,
    Resources.clone(userResourceUsage.getUsed(partition)),
    // NPE here when userSpecificAmUsed is null; the null-guarded
    // amUsed computed above is never used.
    Resources.clone(userSpecificAmUsed),
    Resources.clone(userResourceUsage.getReserved(partition)),
    Resources.none());
{code}
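
Given the description above, the straightforward fix would be to pass the 
null-guarded {{amUsed}} to the constructor instead of the raw 
{{userSpecificAmUsed}}. A sketch of that change (not necessarily the committed 
patch):

{code:title=Sketch of the likely fix}
Resource userSpecificAmUsed = perUserAMUsed.get(userName);
amUsed = (userSpecificAmUsed == null)
    ? Resources.none() : userSpecificAmUsed;

TempUserPerPartition tmpUser = new TempUserPerPartition(
    tq.leafQueue.getUser(userName), tq.queueName,
    Resources.clone(userResourceUsage.getUsed(partition)),
    Resources.clone(amUsed), // null-safe: defaults to Resources.none()
    Resources.clone(userResourceUsage.getReserved(partition)),
    Resources.none());
{code}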


was (Author: eepayne):
YARN-7051 added the following code. Instead of passing {{amUsed}} to the 
{{TempUserPerPartition}} constructor, it passes the intermediate variable 
{{userSpecificAmUsed}}, which can sometimes be null.

{{userSpecificAmUsed}} is null only when a user has no schedulable apps but 
does have non-schedulable apps, for example when the AM limit has been reached.
{code:title=FifoIntraQueuePreemptionPlugin#getAlreadySelectedPreemptionCandidatesResource}
Resource userSpecificAmUsed = perUserAMUsed.get(userName);
amUsed = (userSpecificAmUsed == null)
    ? Resources.none() : userSpecificAmUsed;

TempUserPerPartition tmpUser = new TempUserPerPartition(
    tq.leafQueue.getUser(userName), tq.queueName,
    Resources.clone(userResourceUsage.getUsed(partition)),
    Resources.clone(userSpecificAmUsed),
    Resources.clone(userResourceUsage.getReserved(partition)),
    Resources.none());
{code}

> Capacity Scheduler intra-queue preemption can NPE for non-schedulable apps
> --
>
> Key: YARN-7947
> URL: https://issues.apache.org/jira/browse/YARN-7947
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, scheduler preemption
>Reporter: Eric Payne
>Priority: Major
>
> Intra-queue preemption policy can cause NPE for pending users with no 
> schedulable apps.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Assigned] (YARN-7947) Capacity Scheduler intra-queue preemption can NPE for non-schedulable apps

2018-02-19 Thread Eric Payne (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne reassigned YARN-7947:


Assignee: Eric Payne

> Capacity Scheduler intra-queue preemption can NPE for non-schedulable apps
> --
>
> Key: YARN-7947
> URL: https://issues.apache.org/jira/browse/YARN-7947
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, scheduler preemption
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Major
>
> Intra-queue preemption policy can cause NPE for pending users with no 
> schedulable apps.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org


