[ 
https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815917#comment-15815917
 ] 

Wangda Tan commented on YARN-5889:
----------------------------------

Thanks [~sunilg],

Here are my comments on the overall code structure and call flow:

1) LeafQueue:
- There are several unused members; could you check? 
- Can we move the users map to UsersManager? Ideally, all operations on users 
should be redirected to UsersManager (UM).
- recalculateULCount is an implementation detail of the user-limit calculation, 
so it is better moved to UM.
- Can we move all user-limit related configuration parameters (like ULF) to UM? 
Ideally, UM should be more self-contained, to reduce dependencies and the risk 
of deadlock.

2) UsersManager
- Better to move it to the capacity package, since it handles CS-only 
functionality like user limit.
- Add a method like {{userLimitNeedsRecompute}} to handle the original logic 
of LQ#recalculateULCount.
- User#setCachedCount: should we invalidate the UL only for the user who 
allocates/releases containers, or should we invalidate all user limits? I think 
the latter is safer. If you agree, I suggest that LQ call 
UM#userLimitNeedsRecompute to notify UM. 
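
To illustrate the "invalidate all user limits" option, here is a minimal 
sketch of how LQ could notify UM. Everything in it is hypothetical: the class 
name {{UserLimitInvalidator}}, the version counter, and the helper methods are 
assumptions for illustration and are not taken from the patch. Only the method 
name {{userLimitNeedsRecompute}} comes from the suggestion above.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not the UsersManager from the patch.
// A single version counter invalidates the cached limits of ALL users at once.
class UserLimitInvalidator {
  // Bumped whenever any user's usage changes (container allocated/released).
  private final AtomicLong latestVersion = new AtomicLong(0);
  // Version at which the cached limits were last recomputed; -1 means never.
  private volatile long cachedVersion = -1;

  // Called by LeafQueue; cheap, and needs no lock on the scheduling path.
  void userLimitNeedsRecompute() {
    latestVersion.incrementAndGet();
  }

  // Version a recompute should record once it finishes.
  long currentVersion() {
    return latestVersion.get();
  }

  // True if usage has changed since the last recompute.
  boolean isStale() {
    return cachedVersion != latestVersion.get();
  }

  // Called after a recompute that started at the given version.
  void markRecomputed(long version) {
    cachedVersion = version;
  }
}
```

With this shape, LQ never touches the cached values directly; it only signals 
that something changed, and UM decides when to recompute.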

3) UM, logic to compute UL
First, the UL is classified by user-name, active state, scheduling-mode, and 
partition. However, I think we don't need user-name: the existing UL will be 
identical for users in the active set and for users in the all-set.
Second, the existing logic automatically computes the UL for every 
schedulingMode, which may not be necessary. Ignore-exclusivity is not commonly 
used, so we can compute it only when necessary.

If you agree with the above, we can simplify the API a little bit: we only need 
userName (to check whether it's an active user), clusterResource, and 
partition. ResourceCalculator can be stored inside UM, so we don't need to pass 
it as a parameter every time.

And the call flow may look like:
{code}
UM#getActiveUserLimit(userName, clusterResource, partition, schedulingMode) {
        if (needRecompute) {
                return recompute(userName, clusterResource, partition,
                                schedulingMode);
        }
        return getCachedActiveUserLimit(userName, clusterResource, partition,
                        schedulingMode);
}
{code}
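
A slightly fuller sketch of the same flow, with the UL cached per partition 
and scheduling mode and each mode computed only on first request. This is an 
illustration under assumptions, not the patch's code: {{CachedUserLimits}} is 
a hypothetical class, the limit is a plain {{long}} instead of a Resource, the 
enum is a local stand-in for the scheduler's SchedulingMode, and the actual 
limit computation is passed in as a function.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToLongBiFunction;

// Local stand-in for the scheduler's SchedulingMode enum.
enum SchedulingMode {
  RESPECT_PARTITION_EXCLUSIVITY,
  IGNORE_PARTITION_EXCLUSIVITY
}

// Hypothetical sketch: user limits cached by (partition, schedulingMode).
class CachedUserLimits {
  private final Map<String, Map<SchedulingMode, Long>> cache = new HashMap<>();
  // Assumed hook that performs the real user-limit computation.
  private final ToLongBiFunction<String, SchedulingMode> computeLimit;
  private boolean needRecompute = true;

  CachedUserLimits(ToLongBiFunction<String, SchedulingMode> computeLimit) {
    this.computeLimit = computeLimit;
  }

  // Called by the queue when any user's usage changes.
  synchronized void userLimitNeedsRecompute() {
    needRecompute = true;
  }

  synchronized long getUserLimit(String partition, SchedulingMode mode) {
    if (needRecompute) {
      // Drop every cached value; entries are refilled lazily below.
      cache.clear();
      needRecompute = false;
    }
    // Only the requested mode is computed, so ignore-exclusivity costs
    // nothing until somebody actually asks for it.
    return cache
        .computeIfAbsent(partition, p -> new HashMap<>())
        .computeIfAbsent(mode, m -> computeLimit.applyAsLong(partition, m));
  }
}
```

Note that userName does not appear in the cache key, in line with the point 
above that the UL does not need to be classified by user-name.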

4) ActiveUserManager
- I think we don't need to use this class in CS. Add a {{Set<ApplicationId>}} 
to UM#User, and add the other fields to UM. This could duplicate some code, but 
the code structure will be cleaner.
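
A rough sketch of what tracking active users inside UM could look like. The 
names here ({{ActiveUserTracker}}, {{activateApplication}}, 
{{deactivateApplication}}) are hypothetical, and application IDs are plain 
strings standing in for ApplicationId; a user is assumed to be active while it 
has at least one application in its set.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: active-user bookkeeping kept inside the users manager.
class ActiveUserTracker {
  static class User {
    // Stand-in for Set<ApplicationId>: applications that make the user active.
    final Set<String> activeApplications = new HashSet<>();
  }

  private final Map<String, User> users = new HashMap<>();
  private int activeUsers = 0;

  synchronized void activateApplication(String userName, String appId) {
    User user = users.computeIfAbsent(userName, n -> new User());
    boolean wasInactive = user.activeApplications.isEmpty();
    // Count the user once, when its first application becomes active.
    if (user.activeApplications.add(appId) && wasInactive) {
      activeUsers++;
    }
  }

  synchronized void deactivateApplication(String userName, String appId) {
    User user = users.get(userName);
    // The user stops being active when its last application is removed.
    if (user != null && user.activeApplications.remove(appId)
        && user.activeApplications.isEmpty()) {
      activeUsers--;
    }
  }

  synchronized boolean isActive(String userName) {
    User user = users.get(userName);
    return user != null && !user.activeApplications.isEmpty();
  }

  synchronized int getNumActiveUsers() {
    return activeUsers;
  }
}
```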

> Improve user-limit calculation in capacity scheduler
> ----------------------------------------------------
>
>                 Key: YARN-5889
>                 URL: https://issues.apache.org/jira/browse/YARN-5889
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>            Reporter: Sunil G
>            Assignee: Sunil G
>         Attachments: YARN-5889.0001.patch, 
> YARN-5889.0001.suggested.patchnotes, YARN-5889.0002.patch, 
> YARN-5889.v0.patch, YARN-5889.v1.patch, YARN-5889.v2.patch
>
>
> Currently, user-limit is computed during every heartbeat allocation cycle 
> with a write lock. To improve performance, this ticket focuses on moving the 
> user-limit calculation out of the heartbeat allocation flow.



