[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16520822#comment-16520822
 ] 

Eric Payne commented on YARN-4606:
----------------------------------

[~maniraj...@gmail.com], we can fix the queue application starvation problem by 
making most of the changes in the scheduler-specific user manager classes. For 
{{CapacityScheduler}}, all the changes can be done in the {{UsersManager}} 
class. For the other schedulers (FIFO, Fair, etc.), I think some changes are 
needed in the scheduler infrastructure classes to support retrieving 
information such as the number of pending and active apps per user, the 
queue's AM resource limit, the amount of AM resources a user has used, 
etc. But I think that most of the changes can be done in {{ActiveUsersManager}} 
for other schedulers as well.

I am attaching a POC patch that only modifies {{UsersManager}}. The 
{{UsersManager}} already keeps track of all users in the queue. Each user 
object keeps the number of active apps and the number of pending apps. Here is 
the sequence of events plus the proposed change:
 - When an application is submitted, the user object's pending apps count is 
incremented
 - If limits are not exceeded, {{LeafQueue}} activates the app
 -- {{LeafQueue#activateApplications}} already checks whether or not activation 
of an application will go over the queue's AM limit.
 -- If activating the application will not go over the queue's AM limit, 
{{LeafQueue#activateApplications}} will increment the user object's active app 
count and decrement the pending app count.
 -- However, if activating the application will go over the queue's AM limit, 
the user's pending app count remains the same.
 - The change made in {{YARN-4606.POC.3.patch}} is that 
{{UsersManager#activateApplication}} will check whether or not the user object 
has any active apps. If not, it will not continue (thus not putting the user in 
the {{activeUsers}} list).
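
To make the sequence above concrete, here is a minimal sketch of the bookkeeping. It is not the code in the POC patch and not the real {{UsersManager}} API: the class {{UsersManagerSketch}}, the method {{onApplicationActivated}}, and the single-argument signatures are hypothetical simplifications, and the queue's AM-limit check is assumed to have already happened before {{onApplicationActivated}} is called.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified stand-in for UsersManager; not the real Hadoop class. */
class UsersManagerSketch {
  /** Per-user counters, mirroring the "user object" described above. */
  static final class User {
    int pendingApps;
    int activeApps;
  }

  private final Map<String, User> users = new HashMap<>();
  private final Set<String> activeUsers = new HashSet<>();

  /** Submission: the user's pending app count is incremented. */
  void submitApplication(String userName) {
    users.computeIfAbsent(userName, k -> new User()).pendingApps++;
  }

  /** Assumed to be called by the queue only after the AM-limit check passed. */
  void onApplicationActivated(String userName) {
    User u = users.get(userName);
    if (u == null || u.pendingApps == 0) {
      return; // nothing pending to activate
    }
    u.pendingApps--;
    u.activeApps++;
  }

  /** Proposed change: a user with no active apps is not added to activeUsers. */
  void activateApplication(String userName) {
    User u = users.get(userName);
    if (u == null || u.activeApps == 0) {
      return; // all of this user's apps are still pending
    }
    activeUsers.add(userName);
  }

  int getNumActiveUsers() {
    return activeUsers.size();
  }

  public static void main(String[] args) {
    UsersManagerSketch m = new UsersManagerSketch();
    for (String u : new String[] {"user1", "user2", "user3", "user4"}) {
      m.submitApplication(u);
    }
    // Only user1 and user2 fit under the queue's AM limit in this scenario.
    m.onApplicationActivated("user1");
    m.onApplicationActivated("user2");
    for (String u : new String[] {"user1", "user2", "user3", "user4"}) {
      m.activateApplication(u);
    }
    System.out.println(m.getNumActiveUsers()); // prints 2
  }
}
```

In this sketch, user3 and user4 stay out of {{activeUsers}} until one of their apps is actually activated, which is the behavior the change is aiming for.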

I have not yet analyzed the problem you pointed out above regarding moving apps 
to different queues.

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4606
>                 URL: https://issues.apache.org/jira/browse/YARN-4606
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, capacityscheduler
>    Affects Versions: 2.8.0, 2.7.1
>            Reporter: Karam Singh
>            Assignee: Manikandan R
>            Priority: Critical
>         Attachments: YARN-4606.001.patch, YARN-4606.002.patch, 
> YARN-4606.003.patch, YARN-4606.004.patch, YARN-4606.1.poc.patch, 
> YARN-4606.POC.2.patch, YARN-4606.POC.3.patch, YARN-4606.POC.patch
>
>
> Currently, if all applications belonging to the same user in a LeafQueue are 
> pending (caused by max-am-percent, etc.), ActiveUsersManager still considers 
> the user to be an active user. This could lead to starvation of active 
> applications, for example:
> - App1 (belongs to user1) and app2 (belongs to user2) are active; app3 
> (belongs to user3) and app4 (belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, only two users (user1/user2) are able to allocate new 
> resources, so the computed user-limit-resource could be lower than expected.
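
As a rough illustration of the example in the description, the sketch below splits a queue's resource evenly among the counted active users. This is a deliberate simplification and an assumption: the real CapacityScheduler user-limit computation also takes other settings into account, and the class {{UserLimitSketch}}, the method {{evenShareMb}}, and the 100 GB queue size are made up for illustration.

```java
/** Hypothetical, simplified model of a per-user limit; not the CapacityScheduler formula. */
class UserLimitSketch {
  /** Even split of the queue's resource among the users counted as active. */
  static long evenShareMb(long queueResourceMb, int activeUsers) {
    return queueResourceMb / Math.max(1, activeUsers);
  }

  public static void main(String[] args) {
    long queueMb = 102_400L; // assumed 100 GB queue
    // Pending-only users (user3, user4) are counted: each share is smaller.
    System.out.println(evenShareMb(queueMb, 4)); // prints 25600
    // Only users with active apps (user1, user2) are counted.
    System.out.println(evenShareMb(queueMb, 2)); // prints 51200
  }
}
```

Under this simplified model, counting the two pending-only users halves the share available to user1 and user2, even though user3 and user4 cannot use theirs.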


