[jira] [Commented] (YARN-4606) CapacityScheduler: applications could get starved because computation of #activeUsers considers pending apps

Manikandan R (JIRA) Mon, 25 Jun 2018 10:49:10 -0700


    [ 
https://issues.apache.org/jira/browse/YARN-4606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16522599#comment-16522599
 ]


Manikandan R commented on YARN-4606:
------------------------------------

[~eepayne] Thanks for the patch.

At a high level, the POC is very simple from an implementation perspective, and
the changes would be minimal with this approach. At the same time, this patch
is less "strict" about when updates happen compared to the approaches discussed
in our earlier patches. For example, in the earlier approach,
numActiveUsersWithOnlyPendingApps is incremented as soon as an app gets
activated and decremented as soon as its AM container gets allocated. In
addition, all of these updates happen immediately, and only after the dependent
steps have definitely completed. The new POC patch, in contrast, depends on
values (pendingApplications, activeApplications, etc. of the User object) and
on conditions evaluated before the actual work (for example, it assumes the AM
container will be allocated successfully based on the checks in
LeafQueue#activateApplications), and it updates
numActiveUsersWithOnlyPendingApps as part of the regular computeUserLimits
flow. All of this creates slight discomfort and leads to questions such as:

What is the time frame between accepting the app and updating
numActiveUsersWithOnlyPendingApps? Is this time frame acceptable? Aren't we
doing the updates a little slowly? Is there any chance that the AM container
fails to allocate? Even if the AM container allocation goes through
successfully, would there be any delay in allocating it? During that delay, we
are treating the user as an active user rather than as one of the
"activeUsersWithOnlyPendingApps". Is this acceptable? I am interested in
understanding your thoughts behind this tradeoff.
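
To make the timing concern concrete, here is a toy Java sketch. It is not taken
from any of the attached patches: the class, the User fields (pendingApps,
runningApps) and the method names are all hypothetical, and the "only pending"
test is an assumed simplification. It only contrasts a counter that is adjusted
inside each event handler with one that is refreshed later by a periodic
recompute step, and shows the window in which the two disagree.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only; names are hypothetical and do not match the real scheduler.
class CounterUpdateSketch {
    static class User {
        int pendingApps;   // apps whose AM container is not allocated yet
        int runningApps;   // apps whose AM container has been allocated
    }

    final Map<String, User> users = new HashMap<>();
    int eagerCount;  // adjusted immediately inside each event handler
    int lazyCount;   // refreshed only when recompute() runs

    // Assumed definition: a user with pending apps and nothing running.
    static boolean onlyPending(User u) {
        return u.pendingApps > 0 && u.runningApps == 0;
    }

    void appActivated(String userName) {
        User u = users.computeIfAbsent(userName, k -> new User());
        boolean before = onlyPending(u);
        u.pendingApps++;
        if (!before && onlyPending(u)) {
            eagerCount++;
        }
    }

    // Assumes appActivated() was called for this user earlier.
    void amContainerAllocated(String userName) {
        User u = users.get(userName);
        boolean before = onlyPending(u);
        u.pendingApps--;
        u.runningApps++;
        if (before && !onlyPending(u)) {
            eagerCount--;
        }
    }

    // Stand-in for a periodic recomputation derived from per-user state.
    void recompute() {
        lazyCount = (int) users.values().stream()
                .filter(CounterUpdateSketch::onlyPending)
                .count();
    }

    public static void main(String[] args) {
        CounterUpdateSketch s = new CounterUpdateSketch();
        s.appActivated("user3");
        System.out.println(s.eagerCount + " vs " + s.lazyCount); // 1 vs 0
        s.recompute();
        System.out.println(s.eagerCount + " vs " + s.lazyCount); // 1 vs 1
        s.amContainerAllocated("user3");
        System.out.println(s.eagerCount + " vs " + s.lazyCount); // 0 vs 1
    }
}
```

In this toy model the lazily derived value lags behind by however long it takes
for the next recompute to run, which is the window the questions above are
asking about.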

Also, based on our earlier discussions, we need to depend on
{{activeUsers.get()}} alone in certain contexts and on the sum of
{{activeUsers.get()}} and {{activeUsersWithOnlyPendingApps.get()}} in others.
But the POC patch always depends on the latter value. I didn't understand this
part.

On the other hand, we can avoid the {{AppAMAttemptsFailedSchedulerEvent}}
related changes completely with this new patch, since
{{User.finishApplication()}} is called in any case, even when the max AM
attempts limit has been reached.

Please share your thoughts.

> CapacityScheduler: applications could get starved because computation of 
> #activeUsers considers pending apps 
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4606
>                 URL: https://issues.apache.org/jira/browse/YARN-4606
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, capacityscheduler
>    Affects Versions: 2.8.0, 2.7.1
>            Reporter: Karam Singh
>            Assignee: Manikandan R
>            Priority: Critical
>         Attachments: YARN-4606.001.patch, YARN-4606.002.patch, 
> YARN-4606.003.patch, YARN-4606.004.patch, YARN-4606.1.poc.patch, 
> YARN-4606.POC.2.patch, YARN-4606.POC.3.patch, YARN-4606.POC.patch
>
>
> Currently, if all applications belonging to the same user in a LeafQueue are
> pending (caused by max-am-percent, etc.), ActiveUsersManager still considers
> the user an active user. This could lead to starvation of active
> applications, for example:
> - App1 (belongs to user1) and app2 (belongs to user2) are active; app3
> (belongs to user3) and app4 (belongs to user4) are pending
> - ActiveUsersManager returns #active-users=4
> - However, only two users (user1/user2) are able to allocate new
> resources, so the computed user-limit-resource could be lower than expected.
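
The arithmetic behind the example in the description can be sketched as
follows. This is a deliberately simplified model, not the actual
CapacityScheduler computation: the userLimit helper, the 100 GB queue size and
the 10 percent minimum user limit are assumptions chosen only to show how the
per-user share shrinks when pending-only users are counted as active.

```java
// Simplified model only; the real user-limit computation has more inputs.
class UserLimitSketch {
    // Hypothetical helper: each active user gets an equal share of the queue,
    // but never less than the configured minimum percentage.
    static long userLimit(long queueResourceMb, int activeUsers,
                          int minUserLimitPercent) {
        long equalShare = queueResourceMb / Math.max(activeUsers, 1);
        long minShare = queueResourceMb * minUserLimitPercent / 100;
        return Math.max(equalShare, minShare);
    }

    public static void main(String[] args) {
        long queueMb = 100 * 1024; // assumed queue size: 100 GB

        // user3 and user4 counted although their apps are only pending
        System.out.println(userLimit(queueMb, 4, 10)); // 25600

        // only user1 and user2, who can actually allocate resources
        System.out.println(userLimit(queueMb, 2, 10)); // 51200
    }
}
```

Under these assumptions user1 and user2 are each capped at a quarter of the
queue instead of half, even though no other user can make use of the rest.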



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
