[ https://issues.apache.org/jira/browse/YARN-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815917#comment-15815917 ]
Wangda Tan commented on YARN-5889:
----------------------------------

Thanks [~sunilg],

Following are my comments on the overall code structure and call flow:

1) LeafQueue:
- Several members are unused, could you check?
- Can we move the users map to UsersManager? Ideally all operations on users should be redirected to UsersManager (UM).
- recalculateULCount is an implementation detail of the user limit calculation, better to move it to UM.
- Move all user-limit related configuration parameters (like ULF) to UM? Ideally UM should be more self-contained, to have fewer dependencies and less risk of deadlock.

2) UsersManager:
- Better to move it to the capacity package, since it handles CS-only functionality like user limit.
- Add a method like {{userLimitNeedsRecompute}} to handle the original logic of LQ#recalculateULCount.
- User#setCachedCount: should we invalidate the UL only for the user who allocates/releases containers, or should we invalidate all user limits? I think the latter is safer. If you agree, I suggest LQ call UM#userLimitNeedsRecompute to notify UM.

3) UM, logic to compute UL:

First, the UL is classified by user-name, active state, scheduling-mode and partition. However, I think we don't need user-name. The existing UL will be identical for users in the active set and for users in the all-set.

Second, the existing logic automatically computes the UL for every schedulingMode, which may not be necessary. The ignore-exclusivity mode is not commonly used, so we can compute it only when necessary.

If you agree with the above, we can simplify the API a little bit: we only need userName (to check whether it's an active user), clusterResource and partition. ResourceCalculator can be stored inside UM, so we don't need to pass it as a parameter every time.
And the call flow may look like:

{code}
UM#getActiveUserLimit(userName, clusterResource, partition, schedulingMode) {
  if (needRecompute) {
    return recompute(userName, clusterResource, partition, schedulingMode);
  }
  return getCachedActiveUserLimit(userName, clusterResource, partition, schedulingMode);
}
{code}

4) ActiveUserManager:
- I think we don't need to use this class in CS. Add a {{Set<ApplicationId>}} to UM#User, and add the other fields to UM. It could lead to some duplicated code, but the code structure will be cleaner.

> Improve user-limit calculation in capacity scheduler
> ----------------------------------------------------
>
>                 Key: YARN-5889
>                 URL: https://issues.apache.org/jira/browse/YARN-5889
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>            Reporter: Sunil G
>            Assignee: Sunil G
>         Attachments: YARN-5889.0001.patch, YARN-5889.0001.suggested.patchnotes, YARN-5889.0002.patch, YARN-5889.v0.patch, YARN-5889.v1.patch, YARN-5889.v2.patch
>
>
> Currently user-limit is computed during every heartbeat allocation cycle with a write lock. To improve performance, this ticket focuses on moving the user-limit calculation out of the heartbeat allocation flow.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
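
For illustration, the call flow in 3) together with the invalidation hook from 2) could be sketched in plain Java roughly as below. This is only a minimal sketch under stated assumptions, not the patch and not the real UsersManager: the class name {{UsersManagerSketch}}, the local {{SchedulingMode}} enum, modelling clusterResource as a single long, and the "equal share" formula in {{recompute}} are all hypothetical stand-ins. The point it tries to show is that the cache key does not contain the user name, that each scheduling mode is computed lazily, and that {{userLimitNeedsRecompute}} invalidates every cached limit.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only; not the real YARN UsersManager.
class UsersManagerSketch {
  // Local stand-in for the scheduler's scheduling modes.
  enum SchedulingMode { RESPECT_PARTITION_EXCLUSIVITY, IGNORE_PARTITION_EXCLUSIVITY }

  private final Set<String> allUsers = new HashSet<>();
  private final Set<String> activeUsers = new HashSet<>();
  // Key is partition + mode + active flag; the user name is deliberately left out.
  private final Map<String, Long> cachedLimits = new HashMap<>();
  private int recomputeCount = 0;

  synchronized void addUser(String userName, boolean active) {
    allUsers.add(userName);
    if (active) {
      activeUsers.add(userName);
    }
    userLimitNeedsRecompute();
  }

  // Called by the queue whenever containers are allocated or released:
  // drops all cached limits instead of only the affected user's.
  synchronized void userLimitNeedsRecompute() {
    cachedLimits.clear();
  }

  // The user name is only used to decide between the active set and the all-set.
  synchronized long getUserLimit(String userName, long clusterResource,
      String partition, SchedulingMode mode) {
    boolean active = activeUsers.contains(userName);
    String key = partition + "|" + mode + "|" + active;
    Long limit = cachedLimits.get(key);
    if (limit == null) {
      // Computed lazily, so a rarely used mode costs nothing until requested.
      limit = recompute(clusterResource, active);
      cachedLimits.put(key, limit);
    }
    return limit;
  }

  // Placeholder formula (assumption): equal share among the relevant user set.
  private long recompute(long clusterResource, boolean active) {
    recomputeCount++;
    int users = Math.max(1, active ? activeUsers.size() : allUsers.size());
    return clusterResource / users;
  }

  synchronized int getRecomputeCount() {
    return recomputeCount;
  }

  public static void main(String[] args) {
    UsersManagerSketch um = new UsersManagerSketch();
    um.addUser("alice", true);
    um.addUser("bob", true);
    System.out.println(um.getUserLimit("alice", 1200L, "",
        SchedulingMode.RESPECT_PARTITION_EXCLUSIVITY));
    System.out.println("recomputes: " + um.getRecomputeCount());
  }
}
```

In this sketch a change of clusterResource is not detected automatically, so the caller would also have to call {{userLimitNeedsRecompute}} when the cluster or partition resource changes. The coarse synchronized methods are only there to keep the example short; the locking strategy in the actual patch is a separate question.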