[ https://issues.apache.org/jira/browse/YARN-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16155474#comment-16155474 ]

Jason Lowe commented on YARN-7149:
----------------------------------

Thanks for the report and analysis, Eric!  So it appears YARN-5889's change to 
balance the growth of users invalidated the preemption monitor's forecasting 
of resource assignments.  One way to fix this is to change the preemption 
monitor's forecasting calculations to use the old user limit calculations; 
however, I'm wondering if we should revisit the decision to change the user 
limit calculations in YARN-5889.
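
To make the contrast concrete, here's a rough sketch of the two calculation 
styles as I understand them (my simplification with made-up method names, not 
the actual CapacityScheduler code):

{code:java}
// Rough sketch of the two user-limit styles -- my simplification,
// not the actual CapacityScheduler code.
class UserLimitSketch {
  // Pre-YARN-5889: every active user is immediately entitled to an
  // equal share of the queue's current capacity (floored at the
  // minimum user limit), so a user can run straight up to that share.
  static long oldStyleUserLimit(long queueCapacity, int activeUsers,
                                long minUserLimit) {
    return Math.max(queueCapacity / activeUsers, minUserLimit);
  }

  // Post-YARN-5889 balanced growth: the limit is only a minimal
  // increment above what the user currently holds, so all users are
  // forced to climb toward their shares in lockstep.
  static long newStyleUserLimit(long currentUsage, long minAllocation) {
    return currentUsage + minAllocation;
  }
}
{code}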

I understand the desire to balance user growth, but it seems like this will 
significantly slow container assignment when there are multiple active users, 
all to solve a problem that I'm not sure is a real problem in practice.  If I 
understand the concern properly, we want to avoid a situation where one user 
quickly rushes ahead to their full user limit, well ahead of the other users, 
and then something happens before the other users reach the same limit (e.g.: 
more users become active, the cluster loses capacity, etc.).  That window 
should be very small in practice (i.e.: a few seconds to a few tens of 
seconds) because the user limit should reflect capacity that is available 
right now.  The speed at which the user limit is reached should only be 
limited by the heartbeat rate of the nodes and how picky the container 
requests are.

I'm concerned about the new approach because it looks like it will 
significantly slow down container assignments.  For example, suppose there 
are two users, A and B, each with a single active application that is asking 
for many more containers than the queue can provide.  User A's app is ahead 
of user B's app in the queue, and the queue is initially almost empty.  
Before the user limit change, the user limit for each user would be 50% 
since they are the only two active users in the queue.  As nodes heartbeat 
into the scheduler, the scheduler would aggressively assign containers, 
likely more than one per heartbeat, to user A until the 50% user limit is 
reached.  At that point it would switch to assigning containers to user B, 
again likely more than one per node heartbeat.  Unless the container 
requests are very picky, it should only take two rounds or so of node 
heartbeats to satisfy both users, which should only be a small number of 
seconds.  With the new limit calculation, the user limits for A and B are 
going to be only the minimal increment over what they're using.  Therefore 
each node heartbeat will only assign one container to each user rather than 
multiple, since the scheduler keeps running into the user limit before it 
grows.  The end result is that it will take many more node heartbeats to 
get everything assigned, which users will perceive as a slow scheduler.  Do 
we really need to keep the assignments balanced as users grow to their 
limit?  It looks like doing so will be a significant performance hit, since 
we will keep hitting the limit on each node heartbeat, cutting short the 
number of containers we would normally assign per heartbeat.
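
To put rough numbers on it, here's a toy simulation of that scenario (my 
sketch, not scheduler code; the per-heartbeat batch size and the demand are 
made-up):

{code:java}
// Toy simulation of the two-user scenario above -- my sketch, not
// scheduler code.  Assumptions: each node heartbeat can place up to
// BATCH containers for whichever user is under its limit, and both
// users want WANT containers, which the queue can hold in total.
public class HeartbeatSim {
  static final int BATCH = 5, WANT = 100;

  // Old behavior: each user's limit is the full 50% share from the
  // start, so the scheduler batches assignments to A, then to B.
  static int oldLimitHeartbeats() {
    int a = 0, b = 0, beats = 0;
    while (a < WANT || b < WANT) {
      beats++;
      if (a < WANT)      a = Math.min(WANT, a + BATCH); // A fills first
      else if (b < WANT) b = Math.min(WANT, b + BATCH); // then B
    }
    return beats;
  }

  // New behavior: the limit is only one container above current usage,
  // so each heartbeat places a single container per user.
  static int newLimitHeartbeats() {
    int a = 0, b = 0, beats = 0;
    while (a < WANT || b < WANT) {
      beats++;
      if (a < WANT) a++;  // hits the limit after one container
      if (b < WANT) b++;
    }
    return beats;
  }

  public static void main(String[] args) {
    System.out.println("old: " + oldLimitHeartbeats() + " heartbeats");
    System.out.println("new: " + newLimitHeartbeats() + " heartbeats");
  }
}
{code}

With those made-up numbers the old calculation finishes in 40 heartbeats 
while the new one needs 100, and the gap widens as the per-heartbeat batch 
size grows.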

> Cross-queue preemption sometimes starves an underserved queue
> -------------------------------------------------------------
>
>                 Key: YARN-7149
>                 URL: https://issues.apache.org/jira/browse/YARN-7149
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>    Affects Versions: 2.9.0, 3.0.0-alpha3
>            Reporter: Eric Payne
>            Assignee: Eric Payne
>
> In branch-2 and trunk, I am consistently seeing some use cases where 
> cross-queue preemption does not happen when it should. I do not see this in 
> branch-2.8.
> Use Case:
> | | *Size* | *Minimum Container Size* |
> |MyCluster | 20 GB | 0.5 GB |
> | *Queue Name* | *Capacity* | *Absolute Capacity* | *Minimum User Limit Percent (MULP)* | *User Limit Factor (ULF)* |
> |Q1 | 50% = 10 GB | 100% = 20 GB | 10% = 1 GB | 2.0 |
> |Q2 | 50% = 10 GB | 100% = 20 GB | 10% = 1 GB | 2.0 |
> - {{User1}} launches {{App1}} in {{Q1}} and consumes all resources (20 GB)
> - {{User2}} launches {{App2}} in {{Q2}} and requests 10 GB
> - _Note: containers are 0.5 GB._
> - Preemption monitor kills 2 containers (equals 1 GB) from {{App1}} in {{Q1}}.
> - Capacity Scheduler assigns 2 containers (equals 1 GB) to {{App2}} in {{Q2}}.
> - _No more containers are ever preempted, even though {{Q2}} is far underserved_


