[ https://issues.apache.org/jira/browse/YARN-3945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649531#comment-14649531 ]
Wangda Tan commented on YARN-3945:
----------------------------------

Thanks for the comments, [~nroberts]!

bq. I don't think we can change it in any significant way at this point without a major configuration switch that clearly indicates you're getting different behavior. I'm sure admins have built up clusters with this tuned in very specific ways, a significant change wouldn't be compatible with their expectations.

I agree that we cannot change the behavior of this option itself, but I think we can add a new option instead.

bq. I don't really agree with this. It may not be doing an ideal job but I think the intent is to introduce fairness between users. It's a progression from 0 being the most fair, and 100+ being more fifo. In your example it's trying to get everyone 50% which isn't likely to happen so in this case it's going to operate mostly fifo. If the intent is to be much more fair across the 10 users, then a much smaller value would be appropriate.

The problem I see is that it uses #active-users to compute the user-limit, which can lead to unfairness. An example:

A queue has 100 guaranteed resources (capacity=max-capacity=100), and minimum-user-limit=25. There are 4 users in the queue, using u1=40, u2=30, u3=20, u4=10 resources. After a while, u3 finishes its application, so there are 20 available resources. Only u1 and u2 are asking for resources, so #active-users=2 and user-limit = max(1/#active-users, 25/100) = 50% of the queue, i.e. 50 resources. It is therefore possible that u2 gets all of the available resources, and usage becomes u1=40, u2=50, u3=0, u4=10. This seems very unfair to me, and I don't think we can currently relieve this issue by tuning minimum-user-limit.

bq. Since the scheduler can't predict what an application is going to request in the future, I don't see how a predictable formula is even possible (ignoring the possibility of taking away resources via in-queue preemption). It's not great, but being fair to currently requesting users makes some bit of sense.
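The arithmetic in the four-user example earlier in this comment can be checked with a minimal Java sketch. It assumes only the simplified formula quoted in the comment, not the actual CapacityScheduler code; the class and method names (UserLimitSketch, userLimit, headroom) are hypothetical and exist only for illustration.

```java
final class UserLimitSketch {

    // Simplified user limit in resource units (an assumption taken from the
    // comment's formula): the larger of an equal split among active users
    // and the configured minimum-user-limit percentage of the queue.
    public static int userLimit(int queueCapacity, int minUserLimitPercent, int activeUsers) {
        int equalShare = queueCapacity / activeUsers;
        int minShare = queueCapacity * minUserLimitPercent / 100;
        return Math.max(equalShare, minShare);
    }

    // Additional resources a user could still receive: bounded by the gap to
    // the user limit and by what is currently free in the queue.
    public static int headroom(int used, int limit, int available) {
        return Math.max(0, Math.min(limit - used, available));
    }

    public static void main(String[] args) {
        int capacity = 100;
        int available = 20;  // freed when u3 finished
        int limit = userLimit(capacity, 25, 2);  // only u1 and u2 are active

        System.out.println("user-limit = " + limit);
        System.out.println("u1 (using 40) may take up to " + headroom(40, limit, available));
        System.out.println("u2 (using 30) may take up to " + headroom(30, limit, available));
    }
}
```

Under these assumptions u2's headroom equals the entire free amount, which is the u1=40, u2=50, u3=0, u4=10 outcome described in the example.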
The definition of predictable in my mind is: given the resource requests of each user and the queue's guaranteed/available/used resources, we can compute how much resource each user will get. The example above shows that we cannot compute this today. To be more fair, when there are available resources we should give them to users that have demand while also respecting their current usage (e.g. we should give the 20 available resources to u4 so that usage becomes u1=40, u2=30, u3=0, u4=30).

bq. user-limit-factor is the max-resource-limit of each user today, right? The second one seems very hard to track. It seems like one of the initial users can stay in the "guaranteed" set as long as he keeps requesting resources. This doesn't seem very fair to the users only getting idle shares.

You're correct, it is not good. How about computing a fair share (the same way the fair scheduler computes fair share) for users within a queue? It would be a new option (e.g. enable-user-fair-share), and the user could choose to use either minimum-user-limit OR enable-user-fair-share.

> maxApplicationsPerUser is wrongly calculated
> --------------------------------------------
>
>                 Key: YARN-3945
>                 URL: https://issues.apache.org/jira/browse/YARN-3945
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 2.7.1
>            Reporter: Naganarasimha G R
>            Assignee: Naganarasimha G R
>         Attachments: YARN-3945.20150728-1.patch, YARN-3945.20150729-1.patch
>
>
> maxApplicationsPerUser is currently calculated based on the formula
> {{maxApplicationsPerUser = (int)(maxApplications * (userLimit / 100.0f) *
> userLimitFactor)}} but the description of userlimit is
> {quote}
> Each queue enforces a limit on the percentage of resources allocated to a
> user at any given time, if there is demand for resources.
> The user limit can vary between a minimum and maximum value.{color:red} The
> former (the minimum value) is set to this property value {color} and the
> latter (the maximum value) depends on the number of users who have submitted
> applications. For e.g., suppose the value of this property is 25. If two
> users have submitted applications to a queue, no single user can use more
> than 50% of the queue resources. If a third user submits an application, no
> single user can use more than 33% of the queue resources. With 4 or more
> users, no user can use more than 25% of the queues resources. A value of 100
> implies no user limits are imposed. The default is 100. Value is specified as
> a integer.
> {quote}
> Configuration related to the minimum limit should not be used in a formula
> to calculate max applications for a user.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)