[ https://issues.apache.org/jira/browse/YARN-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16157189#comment-16157189 ]
Wangda Tan commented on YARN-7149:
----------------------------------

[~jlowe], [~eepayne], [~sunilg], apologies for my late response.

I just checked the behavior, and it is not the same as what Jason and Eric described. I wrote a unit test:
1) A super fat node has 1000G memory.
2) Submit app1 to the queue under user1, and submit app2 to the same queue under user2. Each of them asks for 1000 * 5G containers with locality.

When UL=100, a single node heartbeat can allocate: 200 containers to app1 (no capacity left for app2).
When UL=50, a single node heartbeat can allocate: 100 containers to app1, 100 containers to app2.

The userLimit calculation formula mentioned by [~eepayne] is not correct. It should be:
{code}
userLimitResource = max{
    ceil(queueResourceUsedByActiveUsers / #activeUsers),
    ceil(queueConfiguredCapacity * userLimit%)
}
{code}

Because of the {{ceil}} operation, we can get a new UL after each container allocation. And because the user limit validation is a >= check rather than a strict > check, it won't slow down container allocation.

But I can see one issue here: instead of doing a {{max}} operation in the userLimitResource calculation, we should do {{min}}. Otherwise:
- When we have two active users in the queue and userLimit is set to 100, the first user will always be preferred until the queue reaches maxCapacity.

> Cross-queue preemption sometimes starves an underserved queue
> -------------------------------------------------------------
>
>                 Key: YARN-7149
>                 URL: https://issues.apache.org/jira/browse/YARN-7149
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>    Affects Versions: 2.9.0, 3.0.0-alpha3
>            Reporter: Eric Payne
>            Assignee: Eric Payne
>         Attachments: YARN-7149.demo.unit-test.patch
>
>
> In branch 2 and trunk, I am consistently seeing some use cases where
> cross-queue preemption does not happen when it should. I do not see this in
> branch-2.8.
> Use Case:
> | | *Size* | *Minimum Container Size* |
> |MyCluster | 20 GB | 0.5 GB |
> | *Queue Name* | *Capacity* | *Absolute Capacity* | *Minimum User Limit Percent (MULP)* | *User Limit Factor (ULF)* |
> |Q1 | 50% = 10 GB | 100% = 20 GB | 10% = 1 GB | 2.0 |
> |Q2 | 50% = 10 GB | 100% = 20 GB | 10% = 1 GB | 2.0 |
> - {{User1}} launches {{App1}} in {{Q1}} and consumes all resources (20 GB)
> - {{User2}} launches {{App2}} in {{Q2}} and requests 10 GB
> - _Note: containers are 0.5 GB._
> - Preemption monitor kills 2 containers (equals 1 GB) from {{App1}} in {{Q1}}.
> - Capacity Scheduler assigns 2 containers (equals 1 GB) to {{App2}} in {{Q2}}.
> - _No more containers are ever preempted, even though {{Q2}} is far underserved_

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
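
For reference, the user-limit formula and the two-user unit-test scenario from the comment above can be sketched as a small Python model. This is an illustrative toy, not CapacityScheduler code. The function names, the choice of a queue capacity equal to the 1000G node, and the rule that a container is granted only while the user's usage after the allocation stays within the limit are all assumptions made for the sketch.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for non-negative a and positive b."""
    return -(-a // b)


def user_limit(queue_used_gb, active_users, queue_capacity_gb, user_limit_pct):
    """User limit as stated in the comment:
    max(ceil(used / #activeUsers), ceil(capacity * userLimit%)).
    """
    by_usage = ceil_div(queue_used_gb, active_users)
    by_config = ceil_div(queue_capacity_gb * user_limit_pct, 100)
    return max(by_usage, by_config)


def simulate_heartbeat(node_gb, container_gb, user_limit_pct,
                       users=("user1", "user2"), asks_per_user=1000):
    """Toy model of one node heartbeat (hypothetical helper).

    Assumptions: queue capacity equals the node size, users are served
    in order, and the limit is recomputed before every allocation.
    """
    queue_capacity_gb = node_gb
    used = {u: 0 for u in users}
    granted = {u: 0 for u in users}
    free = node_gb
    for u in users:
        while granted[u] < asks_per_user and free >= container_gb:
            limit = user_limit(sum(used.values()), len(users),
                               queue_capacity_gb, user_limit_pct)
            # Assumed check: stop once the next container would exceed the limit.
            if used[u] + container_gb > limit:
                break
            used[u] += container_gb
            granted[u] += 1
            free -= container_gb
    return granted


print(simulate_heartbeat(1000, 5, 100))  # {'user1': 200, 'user2': 0}
print(simulate_heartbeat(1000, 5, 50))   # {'user1': 100, 'user2': 100}
```

Under these assumptions the model reproduces the numbers reported for UL=100 and UL=50. It says nothing about preemption or about the proposed {{min}} variant, which would need the real scheduler's handling of an empty queue.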