[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

Manikandan R (JIRA) Thu, 28 Jun 2018 07:02:47 -0700


    [ https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16526328#comment-16526328 ]


Manikandan R commented on YARN-4606:
------------------------------------

[~eepayne] Thank you for the great explanation. I understand the flow much
better now.

I revisited the "move apps" problem I raised earlier against the new patch, and
I don't think it requires any changes, because the variables needed to
calculate numActiveUsersWithOnlyPendingApps are already set through the
submitApplication, finishApplication, etc. calls. However, I am seeing a minor
update issue, described below:

Let's say we want to move all apps from queue A1 to queue B1. A1 has 4 apps, of
which only 2 were activated because of the max-AM limit; the remaining 2 are not
yet activated. The 4 apps were submitted by different users, u1 to u4 (app1 by
u1, and so on). Only app1 and app2 have allocate requests in the pipeline. At
this point, {{numActiveUsers}} is 4 and {{numActiveUsersWithOnlyPendingApps}}
is 2 in queue A1.

Now the move is triggered. Because app1 and app2 had running containers, app3
and app4 were activated in queue B1 before app1 and app2, which were still busy
detaching and attaching containers. After the move operation and a 5s thread
sleep, I pulled these counts expecting u1 and u2 to be counted as
ActiveUsersWithOnlyPendingApps, but they were not: in queue B1,
{{numActiveUsers}} is 2, since u3 and u4 had become active users, and
{{numActiveUsersWithOnlyPendingApps}} is 0. I then introduced a NodeUpdate
event after the move operation, just to force the user limit computation and
see its impact on these counts. Now I see ActiveUsersWithOnlyPendingApps as 2
and ActiveUsers as 0 (by this time both u3 and u4 had become non-active users,
as they had no pending allocate requests).

So, after a move operation, if no event that triggers the user limit
computation arrives for a short while, I see an incorrect
{{numActiveUsersWithOnlyPendingApps}} count. Is this acceptable, or should we
trigger the user limit computation after the move operation, as we already do
in other places? Please share your thoughts, and correct my understanding if
you see a gap.
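The staleness described above can be reproduced with a toy model in which the
two counters are cached and only recomputed when the user-limit computation
runs. Everything here (ToyLeafQueue, App, compute_user_limit, on_node_update,
and the simplified definitions of "active" and "pending-only") is hypothetical.
The definitions were chosen so that the B1 sequence reproduces the reported
numbers; they do not reproduce the A1 counts (where numActiveUsers was 4), so
they are not the patch's actual logic.

```python
from dataclasses import dataclass


@dataclass
class App:
    user: str
    activated: bool            # admitted under the max-AM-percent limit
    has_pending_request: bool  # still has outstanding resource asks


class ToyLeafQueue:
    """Hypothetical queue whose user counters are cached and refreshed only
    when compute_user_limit() runs. Not YARN's LeafQueue/UsersManager."""

    def __init__(self):
        self.apps = []
        self.num_active_users = 0
        self.num_active_users_with_only_pending_apps = 0

    def attach(self, app):
        # Attaching an app (e.g. during a move) leaves the counters untouched.
        self.apps.append(app)

    def compute_user_limit(self):
        # Toy definitions: "active" = some activated app still has asks;
        # "pending-only" = none of the user's apps has been activated yet.
        users = {a.user for a in self.apps}
        self.num_active_users = len(
            {a.user for a in self.apps if a.activated and a.has_pending_request})
        self.num_active_users_with_only_pending_apps = sum(
            1 for u in users
            if not any(a.activated for a in self.apps if a.user == u))

    def on_node_update(self):
        # Stand-in for the NodeUpdate event that forced a recomputation.
        self.compute_user_limit()

    def counts(self):
        return (self.num_active_users,
                self.num_active_users_with_only_pending_apps)


b1 = ToyLeafQueue()
app3, app4 = App("u3", True, True), App("u4", True, True)
b1.attach(app3)
b1.attach(app4)
b1.compute_user_limit()            # app3/app4 are activated first in B1

b1.attach(App("u1", False, True))  # app1/app2 arrive later, not activated
b1.attach(App("u2", False, True))
app3.has_pending_request = False   # u3/u4 asks have been satisfied
app4.has_pending_request = False

stale = b1.counts()  # (2, 0): no event has recomputed the counters yet
b1.on_node_update()
fresh = b1.counts()  # (0, 2): what the forced NodeUpdate revealed
```

In this model, calling compute_user_limit() at the end of a move (or from
attach()) closes the stale window, at the cost of one extra recomputation per
moved app.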

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4606
>                 URL: https://issues.apache.org/jira/browse/YARN-4606
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, capacityscheduler
>    Affects Versions: 2.8.0, 2.7.1
>            Reporter: Karam Singh
>            Assignee: Manikandan R
>            Priority: Critical
>         Attachments: YARN-4606.001.patch, YARN-4606.002.patch, 
> YARN-4606.003.patch, YARN-4606.004.patch, YARN-4606.1.poc.patch, 
> YARN-4606.POC.2.patch, YARN-4606.POC.3.patch, YARN-4606.POC.patch
>
>
> Currently, if all applications belonging to the same user in a LeafQueue are
> pending (caused by max-am-percent, etc.), ActiveUsersManager still considers
> the user an active user. This could lead to starvation of active applications,
> for example:
> - App1 (belongs to user1) and app2 (belongs to user2) are active; app3 (belongs
> to user3) and app4 (belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, only two users (user1/user2) are able to allocate new resources, so
> the computed user-limit-resource could be lower than expected.
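To make the starvation above concrete, here is a simplified, single-resource
sketch of how a per-user limit shrinks as #active-users grows. The helper
user_limit and its parameters are hypothetical. It only loosely follows the
CapacityScheduler idea that the limit is the larger of an even share across
active users and the minimum-user-limit-percent floor, capped by the
user-limit-factor, and it ignores partitions, rounding to the minimum
allocation, and multi-resource calculators. Subtracting pending-only users from
the active count is an assumption based on the counter names used in this
thread; the capacity figure is made up.

```python
import math


def user_limit(capacity_gb, active_users, min_user_limit_percent,
               user_limit_factor=1.0):
    """Hypothetical, simplified per-user limit for one resource type (GB)."""
    active_users = max(active_users, 1)  # avoid division by zero
    even_share = math.ceil(capacity_gb / active_users)
    floor_share = math.ceil(capacity_gb * min_user_limit_percent / 100)
    cap = math.floor(capacity_gb * user_limit_factor)
    return min(max(even_share, floor_share), cap)


capacity = 100          # queue capacity in GB (illustrative number)
active_users = 4        # user1..user4, as ActiveUsersManager reports
pending_only_users = 2  # user3 and user4 have only pending apps

before = user_limit(capacity, active_users, min_user_limit_percent=25)
after = user_limit(capacity, active_users - pending_only_users,
                   min_user_limit_percent=25)

# Only user1 and user2 can allocate, so usable capacity is 2 * limit.
print(before, 2 * before)  # 25 50 -> half of the queue sits idle
print(after, 2 * after)    # 50 100
```

With all four users counted, user1 and user2 are each capped at 25 GB while the
other 50 GB cannot be used by anyone, which is the starvation the description
refers to.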



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
