[ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16513874#comment-16513874
 ] 

Manikandan R commented on YARN-4606:
------------------------------------

Thanks [~eepayne] for your reviews. In addition to your review comments, I was 
also trying to address the "move app" flow, but I got stuck on it and it took 
more time than expected. Sorry for the delay. 

I am stuck on one case: an admin trying to move an app (still waiting for its 
AM container) from Queue A to Queue B. As part of this, control reaches 
{{AppScheduling#move}} through {{CapacityScheduler#moveApplication}}. As a 
first step, we will need to adjust the activeUsersWithOnlyPendingApps count for 
both queues. For example, after submitting the app to the destination queue 
inside {{CapacityScheduler#moveApplication}}, we will need to do something like:

{quote}
        // Handle activeUsersWithOnlyPendingApps count appropriately
        if (app.isPending()) {
          this.getQueue(sourceQueueName).getAbstractUsersManager()
              .decrNumActiveUsersWithOnlyPendingApps(user);
          this.getQueue(destQueueName).getAbstractUsersManager()
              .incrNumActiveUsersWithOnlyPendingApps(user);
        }
{quote}
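
To make the intended bookkeeping concrete, here is a minimal, self-contained 
sketch of transferring the pending-user count between two queues. Everything in 
it ({{PendingUsersSketch}}, {{addPendingApp}}, {{movePendingApp}}, and so on) is 
hypothetical and only models the idea; it is not the real {{UsersManager}} API 
and not what the patch does line for line. It also assumes a user is counted 
while at least one of their apps is still pending, and it ignores active apps.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a leaf queue's users manager; not the real YARN class.
class PendingUsersSketch {
  // user -> number of that user's apps still waiting for an AM container here
  private final Map<String, Integer> pendingAppsByUser = new HashMap<>();

  void addPendingApp(String user) {
    pendingAppsByUser.merge(user, 1, Integer::sum);
  }

  void removePendingApp(String user) {
    // Returning null drops the entry once the user's last pending app leaves.
    pendingAppsByUser.computeIfPresent(user, (u, n) -> n > 1 ? n - 1 : null);
  }

  int numUsersWithPendingApps() {
    return pendingAppsByUser.size();
  }

  // Move one pending app's bookkeeping from the source to the destination queue.
  static void movePendingApp(PendingUsersSketch source, PendingUsersSketch dest,
      String user) {
    source.removePendingApp(user);
    dest.addPendingApp(user);
  }

  public static void main(String[] args) {
    PendingUsersSketch queueA = new PendingUsersSketch();
    PendingUsersSketch queueB = new PendingUsersSketch();
    queueA.addPendingApp("user1");
    queueA.addPendingApp("user1");

    movePendingApp(queueA, queueB, "user1");
    // user1 still has one pending app in A, so both queues count the user.
    System.out.println(queueA.numUsersWithPendingApps() + " "
        + queueB.numUsersWithPendingApps()); // prints "1 1"
  }
}
```

Tracking a per-user app count rather than a bare counter avoids decrementing 
the source queue while the same user still has other pending apps there.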

Then, inside {{AppScheduling#move}}, we will need to follow logic similar to 
the changes in {{AppScheduling#updatePendingResources}} to call 
{{UsersManager#activateApplications}}. The call to 
{{AppScheduling#updatePendingResources}} happens periodically as part of the 
allocate flow, but there are no such periodic calls for move app. In the normal 
app flow, waitingForAMContainer becomes false for a given app at some point, 
the call to {{UsersManager#activateApplications}} happens, and the user gets 
activated. We will need to handle the same in the move app flow. I was thinking 
of waiting for some duration (possibly based on the average AM container 
allocation time?) so that the AM container is likely to have been allocated by 
then, but I am not sure. The attached patch contains this change as well. 
Please advise. 
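
For discussion, one possible alternative to waiting for a fixed duration is to 
let the move only carry over the user's current state, and to do the activation 
when the AM container is actually allocated, against whichever queue the app 
belongs to at that moment. The sketch below is a hypothetical model of that 
idea: {{MoveActivationSketch}}, {{QueueUsers}}, {{App}} and 
{{onAMContainerAllocated}} are made-up names, it assumes one app per user per 
queue, and it is not what the attached patch does.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model; names are illustrative and not the real YARN classes.
class MoveActivationSketch {
  static class QueueUsers {
    final Set<String> activeUsers = new HashSet<>();
    final Set<String> pendingOnlyUsers = new HashSet<>();
  }

  static class App {
    final String user;
    boolean waitingForAMContainer = true;
    QueueUsers queue;

    App(String user, QueueUsers queue) {
      this.user = user;
      this.queue = queue;
      // A freshly submitted app has no AM container yet.
      queue.pendingOnlyUsers.add(user);
    }
  }

  // Move flow: carry the user's current state over to the destination queue.
  static void move(App app, QueueUsers dest) {
    QueueUsers source = app.queue;
    if (app.waitingForAMContainer) {
      source.pendingOnlyUsers.remove(app.user);
      dest.pendingOnlyUsers.add(app.user);
    } else {
      source.activeUsers.remove(app.user);
      dest.activeUsers.add(app.user);
    }
    app.queue = dest;
  }

  // Allocation event: activate the user in the app's current queue.
  static void onAMContainerAllocated(App app) {
    app.waitingForAMContainer = false;
    app.queue.pendingOnlyUsers.remove(app.user);
    app.queue.activeUsers.add(app.user);
  }

  public static void main(String[] args) {
    QueueUsers queueA = new QueueUsers();
    QueueUsers queueB = new QueueUsers();
    App app = new App("user1", queueA);

    move(app, queueB);           // still pending, now tracked by queue B
    onAMContainerAllocated(app); // activation lands on queue B
    System.out.println(queueB.activeUsers.contains("user1")); // prints "true"
  }
}
```

Whether the real allocate flow gives a suitable hook for this after a move is 
exactly the open question, so treat this only as a shape for the discussion.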

Now, coming back to review comments:

1. Yes, it is scheduler specific. [~leftnoteasy] and [~sunilg], please share 
your views.
2. For the first cut, I was thinking of fixing this JIRA for CS end to end. 
Once the fix has been verified for CS, we can apply similar changes to FS as 
well, either in this JIRA or in a different one. If we address the FS-related 
changes in a different JIRA, is it OK to carry the risk you mentioned earlier? 
Please advise. I can either take help from folks who are familiar with the FS 
flow or hand it over to them. Either option is fine with me.
3. Addressed.

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4606
>                 URL: https://issues.apache.org/jira/browse/YARN-4606
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, capacityscheduler
>    Affects Versions: 2.8.0, 2.7.1
>            Reporter: Karam Singh
>            Assignee: Manikandan R
>            Priority: Critical
>         Attachments: YARN-4606.001.patch, YARN-4606.002.patch, 
> YARN-4606.003.patch, YARN-4606.1.poc.patch, YARN-4606.POC.2.patch, 
> YARN-4606.POC.patch
>
>
> Currently, if all applications belong to same user in LeafQueue are pending 
> (caused by max-am-percent, etc.), ActiveUsersManager still considers the user 
> is an active user. This could lead to starvation of active applications, for 
> example:
> - App1(belongs to user1)/app2(belongs to user2) are active, app3(belongs to 
> user3)/app4(belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, there're only two users (user1/user2) are able to allocate new 
> resources. So computed user-limit-resource could be lower than expected.
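
The effect described in the issue summary above can be illustrated with a 
deliberately simplified calculation. The sketch assumes the user limit is just 
an equal split of the queue's resource among active users; the real 
CapacityScheduler computation also involves settings such as the minimum user 
limit percent and the user limit factor. {{UserLimitSketch}}, {{userLimit}} and 
the figure of 100 resource units are hypothetical.

```java
// Simplified illustration only; not the actual CapacityScheduler formula.
class UserLimitSketch {
  // Equal split of the queue's resource among the users counted as active.
  static int userLimit(int queueResource, int activeUsers) {
    return queueResource / Math.max(1, activeUsers);
  }

  public static void main(String[] args) {
    int queueResource = 100; // hypothetical resource units

    // All four users counted, although user3/user4 only have pending apps.
    System.out.println(userLimit(queueResource, 4)); // prints 25

    // Only user1/user2, who can actually allocate, are counted.
    System.out.println(userLimit(queueResource, 2)); // prints 50
  }
}
```

Under this simplification, counting the two pending-only users halves the 
limit available to the two users who can actually use resources.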



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
