[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544404#comment-14544404 ] Eric Payne commented on YARN-2069: -- Hi [~mayank_bansal]. Thanks for working through the details related to this issue. I have one small nit. In {{LeafQueue#computeTargetedUserLimit}}, it does not look like the {{MIN}} and {{MAX}} variables are ever used. > CS queue level preemption should respect user-limits > > > Key: YARN-2069 > URL: https://issues.apache.org/jira/browse/YARN-2069 > Project: Hadoop YARN > Issue Type: Sub-task > Components: capacityscheduler >Reporter: Vinod Kumar Vavilapalli >Assignee: Mayank Bansal > Labels: BB2015-05-TBR > Attachments: YARN-2069-trunk-1.patch, YARN-2069-trunk-10.patch, > YARN-2069-trunk-2.patch, YARN-2069-trunk-3.patch, YARN-2069-trunk-4.patch, > YARN-2069-trunk-5.patch, YARN-2069-trunk-6.patch, YARN-2069-trunk-7.patch, > YARN-2069-trunk-8.patch, YARN-2069-trunk-9.patch > > > This is different from (even if related to, and likely share code with) > YARN-2113. > YARN-2113 focuses on making sure that even if queue has its guaranteed > capacity, it's individual users are treated in-line with their limits > irrespective of when they join in. > This JIRA is about respecting user-limits while preempting containers to > balance queue capacities. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343867#comment-14343867 ] Wangda Tan commented on YARN-2069: --
This comment makes sense to me; YARN-2113 may be the right place to implement the logic for respecting user share.

[~mayank_bansal], could you rebase the patch against the latest trunk? I can take another look.

Thanks, Wangda
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343739#comment-14343739 ] Hadoop QA commented on YARN-2069: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12662100/YARN-2069-trunk-10.patch against trunk revision ca1c00b. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6805//console This message is automatically generated.
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343718#comment-14343718 ] Vinod Kumar Vavilapalli commented on YARN-2069: ---
In the above example, there are two choices.
# Respecting app order: {15%, 6%, 6%, 6%, 6%} (total is 39%)
# Respecting user share: {9%, 9%, 9%, 6%, 6%} (total is 39%)

I think without YARN-2113 (which is really about making sure allocation also actively balances users via preemption), doing (1) above is not going to help: allocation will come in and imbalance users anyway. So, it's OK to implement either. If the patch has already gone ahead and implemented (2), I'd vote for keeping it as is and making a note on YARN-2113 to change it.
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14098938#comment-14098938 ] Hadoop QA commented on YARN-2069: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12662100/YARN-2069-trunk-10.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStoreZKClientConnections {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4636//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4636//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4636//console This message is automatically generated. 
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084279#comment-14084279 ] Wangda Tan commented on YARN-2069: --
Hi [~mayank_bansal],

Thanks for your patience. I've just read through your new patch. After #1/#2, if more resources need to be preempted, AM containers will be preempted. Is that correct? Please let me know if I misread your approach.

*I think we should discuss the scope of this JIRA first; I'm a little confused after thinking about it.*

According to the description of this JIRA, we need to make sure of the following (assume we have already calculated {{target-user-limit}}).

*REQ #1:* When considering preempting a container from user-x, if {{used-resource - marked-preempted-resource}} of user-x is already <= {{target-user-limit}}, we need to make sure that no other user in the queue has {{used-resource - marked-preempted-resource}} > {{target-user-limit}}.

*REQ #2:* When we have to preempt an AM container, we need to make sure REQ #1 holds too.

And as commented by [~vinodkv]: https://issues.apache.org/jira/browse/YARN-2069?focusedCommentId=14064047&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14064047

*REQ #3:* Users' resources after preemption should be as balanced as possible around {{target-user-limit}}.

Do you agree with these requirements? I think we should add them to the JIRA description once we have decided.

*My understanding is that your new patch consists of two phases:*
1. {{distributePreemptionforUsers}} will do preemption to enforce {{target-user-limit}} for each user.
2. If more resources need to be preempted, it will call {{distributePreemptionforUsers}} again to make sure {{resToObtain}} is distributed as {{resToObtain}} divided by {{#active-user}} in the queue.

I think phase 1 can enforce REQ #1, but phase 2 cannot enforce REQ #3. Also, REQ #2 is not satisfied by the patch.

Let me give you an example of why REQ #3 is not satisfied, similar to Vinod's example:
{code}
Queue has guaranteed resource = 30%, now it uses 60%, and we want to shrink it down to 40%.
Container sizes are equal: each is 3% of the cluster.
There are 5 apps in the queue, user-limit configured to 20%.
So expected resources are {8%, 8%, 8%, 8%, 8%}.
Before preemption: {15%, 9%, 12%, 12%, 12%}
Possible result after preemption with your current approach: {15%, 6%, 6%, 6%, 6%} (total is 39%)
{code}
Sometimes we cannot make every user's resource exactly equal to {{target-user-limit}} because {{target-user-limit}} may not be a multiple of the container size. But we can do better, as in the following example:
{code}
After preemption: {9%, 9%, 9%, 6%, 6%} (total is 39%)
{code}
The imbalance is caused by the accumulated bias I mentioned in my comment: https://issues.apache.org/jira/browse/YARN-2069?focusedCommentId=14074249&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14074249

Thanks, Wangda
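To make the difference between the two outcomes in this example concrete, here is a minimal sketch that scores each per-user distribution by its spread. Using the population standard deviation as the imbalance measure is an assumption, borrowed from the "minimize standard deviation" rule discussed elsewhere in this thread; the class and method names are hypothetical and are not part of the patch.

```java
final class UserBalanceSketch {

    /** Population standard deviation of per-user usage after preemption (illustrative metric). */
    static double spread(double[] usageAfterPreemption) {
        double sum = 0;
        for (double u : usageAfterPreemption) {
            sum += u;
        }
        double mean = sum / usageAfterPreemption.length;

        double squares = 0;
        for (double u : usageAfterPreemption) {
            squares += (u - mean) * (u - mean);
        }
        return Math.sqrt(squares / usageAfterPreemption.length);
    }

    public static void main(String[] args) {
        // Per-user usage in percent of the cluster, taken from the example above.
        double[] appOrder = {15, 6, 6, 6, 6};  // outcome that respects app order
        double[] userShare = {9, 9, 9, 6, 6};  // outcome that respects user share

        System.out.println("app order spread:  " + spread(appOrder));
        System.out.println("user share spread: " + spread(userShare));
    }
}
```

Both outcomes leave the queue at 39%, but the second has the smaller spread, which is the sense in which it satisfies REQ #3 better.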
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081985#comment-14081985 ] Hadoop QA commented on YARN-2069: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12659051/YARN-2069-trunk-9.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4502//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4502//console This message is automatically generated. 
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081845#comment-14081845 ] Wangda Tan commented on YARN-2069: -- Hi [~mayank_bansal], Thanks for uploading, reviewing it now. Wangda
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081331#comment-14081331 ] Hadoop QA commented on YARN-2069: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12658971/YARN-2069-trunk-8.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4499//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/4499//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4499//console This message is automatically generated. 
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081193#comment-14081193 ] Mayank Bansal commented on YARN-2069: - Hi [~wangda] , Thanks for your review comments. Updating the patch with the fix. Thanks, Mayank
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075220#comment-14075220 ] Wangda Tan commented on YARN-2069: --
Hi Mayank,

Thanks for your detailed explanation; I think I understand your approach. However, I think the current way of computing the target user limit is not correct. Let me explain.

As far as I can tell, the {{computeTargetedUserLimit}} you created is adapted from {{computeUserLimit}}. It calculates the limit as follows:
{code}
target_capacity = used_capacity - resToObtain
min(
  max(target_capacity / #active_user, target_capacity * user_limit_percent),
  target_capacity * user_limit_factor)
{code}
So when user_limit_percent is left at its default (100%), it is possible that target_user_limit * #active_user > queue_max_capacity. In that case, every user's usage can be below target_user_limit while the usage of the queue is still larger than its guaranteed resource.

Let me give you an example:
{code}
Assume queue capacity = 50, used_resource = 70, resToObtain = 20
So target_capacity = 50, and there are 5 users in the queue
user_limit_percent = 100%, user_limit_factor = 1 (both are default)
So target_user_capacity = min(max(50 / 5, 50 * 100%), 50) = 50
User1 used 20
User2 used 10
User3 used 10
User4 used 20
User5 used 10
So every user's used capacity is < target_user_capacity
{code}
In the existing logic of {{balanceUserLimitsinQueueForPreemption}}:
{code}
if (Resources.lessThan(rc, clusterResource, userLimitforQueue, userConsumedResource)) {
  // do preemption
} else
  continue;
{code}
If a user's used resource is < target_user_capacity, that user will not be preempted.

Mayank, is that correct? Or have I misunderstood your logic? Please let me know your comments.

Thanks, Wangda
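The arithmetic in this comment can be checked with a small sketch. It only transcribes the formula as quoted above into plain doubles; the class name, method name, and parameter names are hypothetical, and this is not the actual {{computeTargetedUserLimit}} code from the patch.

```java
final class TargetUserLimitSketch {

    /**
     * Transcription of the formula quoted in the comment (an assumption about the patch):
     * min(max(target / activeUsers, target * userLimitPercent), target * userLimitFactor),
     * where target = usedCapacity - resToObtain.
     */
    static double targetUserLimit(double usedCapacity, double resToObtain, int activeUsers,
                                  double userLimitPercent, double userLimitFactor) {
        double targetCapacity = usedCapacity - resToObtain;
        double evenShare = targetCapacity / activeUsers;
        double minimumShare = targetCapacity * userLimitPercent;
        return Math.min(Math.max(evenShare, minimumShare), targetCapacity * userLimitFactor);
    }

    public static void main(String[] args) {
        // Values from the example: used = 70, resToObtain = 20, 5 users, default limits.
        double limit = targetUserLimit(70, 20, 5, 1.0, 1.0);

        double[] usage = {20, 10, 10, 20, 10};
        int usersOverLimit = 0;
        for (double u : usage) {
            if (u > limit) {
                usersOverLimit++;
            }
        }
        // No user exceeds the limit, so a per-user check alone would preempt nothing.
        System.out.println("target user limit = " + limit + ", users over limit = " + usersOverLimit);
    }
}
```

With the defaults the limit comes out as 50, so none of the five users is over it even though the queue still needs to give back 20. With user_limit_percent at 20% the same formula yields 10, which would mark the two users at 20 for preemption.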
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074940#comment-14074940 ] Mayank Bansal commented on YARN-2069: -
Hi [~wangda],

Thanks for the review. Let me explain what this algorithm is doing.

Let's say you have queue A in your cluster with 30% capacity allocated to it, and queue A is currently using 50% of the resources. Queue A has 5 users with a 20% user limit, which means each user is using 10% of the capacity of the cluster.

Queue B has an allocated capacity of 70% and a used capacity of 50%. Now another application that needs 10% capacity is submitted to queue B, so 10% capacity has to be claimed back from queue A.

So resToObtain = 10%, and the targeted user limit will be 8%. (This is always calculated based on how much we need to claim back from users.)

Based on the current algorithm, it will take 2% of resources from every user and leave the balance for each user. This also holds when the users are not all using the same amount of resources: the algorithm takes more from the users who are using more, balancing them down to the targeted user limit.

The algorithm also preempts from the most recently submitted application first. If user1 has 2 applications, it will try to take as many containers as possible from the last application submitted, leaving the AM container behind; the user limit is still honoured across all of the user's applications in the queue combined.

The algorithm does not remove an AM container unless it is absolutely needed: it takes all the task containers first and only then considers AM containers for preemption.

Thanks, Mayank
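The numbers in this explanation can be reproduced with a short sketch. It assumes the targeted user limit is simply the queue's target usage split evenly across its users, which matches the 8% in the example but may not be how the patch computes it; the class and method names are hypothetical.

```java
import java.util.Arrays;

final class EvenTargetSketch {

    /**
     * For each user, the amount above an evenly split target:
     * target = (total usage - resToObtain) / number of users (an illustrative assumption).
     */
    static double[] excessOverTarget(double[] userUsage, double resToObtain) {
        double used = 0;
        for (double u : userUsage) {
            used += u;
        }
        double target = (used - resToObtain) / userUsage.length;

        double[] excess = new double[userUsage.length];
        for (int i = 0; i < userUsage.length; i++) {
            // Users at or below the target give up nothing.
            excess[i] = Math.max(0, userUsage[i] - target);
        }
        return excess;
    }

    public static void main(String[] args) {
        // Queue A: five users at 10% each, 10% to claim back, so the target is 8% per user.
        double[] even = {10, 10, 10, 10, 10};
        System.out.println(Arrays.toString(excessOverTarget(even, 10)));
    }
}
```

For the even case this prints `[2.0, 2.0, 2.0, 2.0, 2.0]`, the 2% per user described above. When usage is uneven and some users sit below the target, the excesses of the others can add up to more than resToObtain, so a real implementation would presumably stop marking containers once enough has been reclaimed.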
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074249#comment-14074249 ] Wangda Tan commented on YARN-2069: --
Hi [~mayank_bansal],

Thanks for working on this again. I've taken a brief look at your patch. I think the general approach in your patch is:
- Compute a target-user-limit for a given queue,
- Preempt containers according to a user's current consumption and target-user-limit,
- If more resources need to be preempted, consider preempting AM containers.

I think there are a couple of rules we need to respect (please let me know if you don't agree with any of them):
# Used resources of users in a queue after preemption should be as even as possible
# Before we start preempting AM containers, all task containers should be preempted (according to YARN-2022, preempting AM containers has the lowest priority)
# If we have to preempt AM containers, we should respect #1 too

For #1, if we want to quantify the result, it should be:
{code}
Let rp_i = used-resource-after-preemption of user_i, for i ∈ {user}
Minimize sqrt( Σ_i ( rp_i - (Σ_j rp_j) / #{user} )^2 )
{code}
In other words, we should minimize the standard deviation of used-resource-after-preemption.

Since not all containers are equal in size, it is possible that used-resource-after-preemption of a given user cannot precisely equal target-user-limit. In our current logic, we will make used-resource-after-preemption <= target-user-limit. Consider the following example:
{code}
qA: has user {V, W, X, Y, Z}; each user has one application
V: app5: {4, 4, 4, 4}, //means V has 4 containers, each one has memory=4G, minimum_allocation=1G
W: app4: {4, 4, 4, 4},
X: app3: {4, 4, 4, 4},
Y: app2: {4, 4, 4, 4, 4, 4},
Z: app1: {4}
target-user-limit=11, resource-to-obtain=23

After preemption:
V: {4, 4}
W: {4, 4}
X: {4, 4}
Y: {4, 4, 4, 4, 4, 4}
Z: {4}
{code}
This imbalance happens because each application we preempt from may overshoot the user limit (bias), and the more users we process, the more accumulated bias we might have. In other words, the imbalance grows linearly with number-of-users-in-a-queue multiplied by average-container-size.

We also cannot solve this problem by preempting from the user with the most usage first. Taking the same example:
{code}
qA: has user {V, W, X, Y, Z}; each user has one application
V: app5: {4, 4, 4, 4}, //means V has 4 containers, each one has memory=4G, minimum_allocation=1G
W: app4: {4, 4, 4, 4},
X: app3: {4, 4, 4, 4},
Y: app2: {4, 4, 4, 4, 4, 4},
Z: app1: {4}
target-user-limit=11, resource-to-obtain=23

After preemption (from the user with the most usage; the sequence is Y->V->W->X->Z):
V: {4, 4}
W: {4, 4, 4, 4}
X: {4, 4, 4, 4}
Y: {4, 4}
Z: {4}
{code}
This is still not very balanced. The ideal result should be:
{code}
V: {4, 4, 4}
W: {4, 4, 4}
X: {4, 4, 4}
Y: {4, 4, 4}
Z: {4}
{code}
In addition, this approach cannot satisfy rules #2/#3 either if target-user-limit is not computed appropriately.

So I propose doing it another way. We should recompute (used-resource - marked-preempted-resource) for a user every time we decide to preempt a container. Maybe we can use a priority queue to store (used-resource - marked-preempted-resource). And we don't need to compute a target user limit at all.

The pseudocode for preempting resources of a queue might look like:
{code}
compute resToObtain first;

// first preempt task containers
while (resToObtain > 0 && some user still has task containers) {
  pick a user-x which has most (used-resource - marked-preempted-resource)
  pick one container-y from user-x to be preempted
  resToObtain -= container-y.resource
}

if (resToObtain <= 0) {
  return;
}

// if more resource needs to be preempted, we should preempt AM containers
while (resToObtain > 0 &&
       total-am-resource - marked-preempted-am-resource > max-am-percentage) {
  // do the same thing again:
  pick a user-x which has most (used-resource - marked-preempted-resource)
  pick one container-y from user-x to be preempted
  resToObtain -= container-y.resource
}
{code}
With this, the imbalance grows linearly with average-container-size only, and rules #2/#3 that I mentioned above are satisfied as well.

Mayank, does this look like a reasonable suggestion? Any other thoughts? [~vinodkv], [~curino], [~sunilg].

Thanks, Wangda
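The task-container phase of this proposal can be sketched in plain Java as below. This is only an illustration under simplifying assumptions: every container is treated as a task container, resources are plain integers (GB), AM containers and the max-am-percentage check are left out, and the class, method, and field names are hypothetical rather than taken from the patch or from YARN.

```java
import java.util.ArrayDeque;
import java.util.Comparator;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.PriorityQueue;

final class GreedyPreemptionSketch {

    /** One user's remaining containers and remaining usage (used minus marked-for-preemption). */
    private static final class User {
        final String name;
        final Deque<Integer> containers = new ArrayDeque<>();
        int used;

        User(String name, int[] containerSizes) {
            this.name = name;
            for (int size : containerSizes) {
                containers.add(size);
                used += size;
            }
        }
    }

    /** Returns each user's remaining usage after marking at least resToObtain for preemption. */
    static Map<String, Integer> preempt(Map<String, int[]> containersByUser, int resToObtain) {
        // Max-heap on remaining usage: the most loaded user is always at the head.
        PriorityQueue<User> byUsage =
            new PriorityQueue<>(Comparator.comparingInt((User u) -> u.used).reversed());
        Map<String, User> users = new LinkedHashMap<>();
        for (Map.Entry<String, int[]> e : containersByUser.entrySet()) {
            User u = new User(e.getKey(), e.getValue());
            users.put(u.name, u);
            byUsage.add(u);
        }

        while (resToObtain > 0 && !byUsage.isEmpty()) {
            User top = byUsage.poll();
            if (top.containers.isEmpty()) {
                continue; // nothing left to take from this user; drop them from the heap
            }
            int size = top.containers.pollLast(); // mark the user's most recent container
            top.used -= size;
            resToObtain -= size;
            byUsage.add(top); // re-insert so the heap reflects the updated usage
        }

        Map<String, Integer> remaining = new LinkedHashMap<>();
        for (User u : users.values()) {
            remaining.put(u.name, u.used);
        }
        return remaining;
    }

    public static void main(String[] args) {
        Map<String, int[]> qA = new LinkedHashMap<>();
        qA.put("V", new int[] {4, 4, 4, 4});
        qA.put("W", new int[] {4, 4, 4, 4});
        qA.put("X", new int[] {4, 4, 4, 4});
        qA.put("Y", new int[] {4, 4, 4, 4, 4, 4});
        qA.put("Z", new int[] {4});
        System.out.println(preempt(qA, 23));
    }
}
```

On the qA example with resource-to-obtain=23 this prints `{V=12, W=12, X=12, Y=12, Z=4}`, which is the "ideal result" listed above. Re-inserting the user after every single container is what keeps the overshoot bounded by one container rather than accumulating across users.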
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073727#comment-14073727 ] Mayank Bansal commented on YARN-2069: -
Thanks [~vinodkv] for the review. I have changed the patch to be based on the targeted capacity for the queue. It balances out the users' resources. I also removed the two passes, and now there is only one pass. Please review it.

Thanks, Mayank
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064047#comment-14064047 ]
Vinod Kumar Vavilapalli commented on YARN-2069:

I had a long look at this over the last weekend. This is what the patch does, as I understand it:
- We are performing two passes: one pass to redistribute based on user-limits, and one more pass over apps in reverse order of submission.
- Both the passes are best case: we just keep marking containers for preemption till we get back enough to satisfy queue capacities.

Let's take an example:
- Let's assume there is a queue with a guaranteed capacity of 30% and a current capacity of 50%, and we want it to shrink down to 40% to satisfy some other queue's capacities.
- Assume 5 different users in this queue, with the minimum user limit percentage configured to be 20% of the queue (i.e. no more than 5 users will run concurrently in this queue).
- So the current state is that each user has resources of 10%, 10%, 10%, 10%, 10%. Depending on how users came into the system, the patch may bring it down to 10%, 10%, 8%, 6%, 6%, which would still be an unbalanced user share within the queue.

Further, the current user-limit calculation seems to be based on _current capacity_ and not _targeted capacity_. So, I am kind of surprised the patch preempts any containers at all.
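The arithmetic behind the example above can be checked with a small Java sketch. It assumes the five users currently split the queue's 50% evenly (10% each), and it uses a simplified user-limit formula, max(capacity / activeUsers, capacity * minUserLimitPercent / 100), which ignores the user-limit-factor and any rounding the real scheduler applies. The class and method names are hypothetical.

```java
final class TargetedUserLimit {

  /** Simplified per-user limit, in percent of the cluster (an assumption, see lead-in). */
  static double userLimit(double queueCapacityPct, int activeUsers, double minUserLimitPct) {
    return Math.max(queueCapacityPct / activeUsers,
        queueCapacityPct * minUserLimitPct / 100.0);
  }

  /** How much each user would have to give back to get down to the limit. */
  static double[] toReclaim(double[] usedPct, double limitPct) {
    double[] out = new double[usedPct.length];
    for (int i = 0; i < usedPct.length; i++) {
      out[i] = Math.max(0.0, usedPct[i] - limitPct);
    }
    return out;
  }
}
```

With the targeted capacity of 40%, userLimit(40.0, 5, 20.0) is 8.0, so every user at 10% gives back 2% and the queue lands on a balanced 8% per user. With the current capacity of 50%, userLimit(50.0, 5, 20.0) is 10.0, so nobody is over the limit and the user-limit pass would reclaim nothing, which matches the concern about basing the calculation on current rather than targeted capacity.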
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062911#comment-14062911 ]
Hadoop QA commented on YARN-2069:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12654821/YARN-2069-trunk-5.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4315//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4315//console

This message is automatically generated.
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062578#comment-14062578 ]
Hadoop QA commented on YARN-2069:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12654821/YARN-2069-trunk-5.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4311//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4311//console

This message is automatically generated.
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057175#comment-14057175 ]
Wangda Tan commented on YARN-2069:

Hi Mayank, can you re-kick Jenkins manually?

Thanks, Wangda
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057170#comment-14057170 ]
Mayank Bansal commented on YARN-2069:

I just verified: I rebased the patch, compiled it, and tested it. The patch doesn't seem to be the problem.

Thanks, Mayank
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057044#comment-14057044 ]
Wangda Tan commented on YARN-2069:

LGTM, +1 when Jenkins comes back. It seems lots of precommit builds have failed since yesterday. Can some committer check what happened?
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056391#comment-14056391 ]
Hadoop QA commented on YARN-2069:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12654821/YARN-2069-trunk-5.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:red}-1 javac{color}. The patch appears to cause the build to fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4236//console

This message is automatically generated.
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056005#comment-14056005 ]
Sunil G commented on YARN-2069:

Yes [~leftnoteasy]. You are correct, that call will drop reservations as needed. Thank you.
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055958#comment-14055958 ]
Wangda Tan commented on YARN-2069:

And
bq. 4. No more check will happen with preemptFrom logic as resToObtain is 0.
{code}
+ if (Resources.lessThan(rc, clusterResource, userLimitforQueue,
+     userConsumedResource)) {
+   // As we have used more resources than the user limit,
+   // we need to claim back the resources equivalent to
+   // consumed resources by user - user limit
+   Resource resourcesToClaimBackFromUser = Resources.subtract(
+       userConsumedResource, userLimitforQueue);
{code}
I think we need to add a check here: resourcesToClaimBackFromUser = min(resourcesToClaimBackFromUser, resToObtain). Is that right? [~mayank_bansal]
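The check proposed in the comment above amounts to clamping the per-user amount by what the queue still has to give back. A minimal sketch, with plain long values in MB standing in for YARN's Resource objects and with a hypothetical method name:

```java
final class ClaimBackClamp {

  /**
   * Amount to claim back from one user: the part of its usage above the
   * user limit, but never more than what the queue still needs to return.
   */
  static long claimBack(long userConsumedMb, long userLimitMb, long resToObtainMb) {
    long overLimit = Math.max(0L, userConsumedMb - userLimitMb);
    return Math.min(overLimit, Math.max(0L, resToObtainMb));
  }
}
```

For a user at 25 GB with a 15 GB limit, claimBack(25, 15, 4) yields 4 rather than the full 10, so the user-limit pass cannot mark more than resToObtain.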
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055956#comment-14055956 ]
Wangda Tan commented on YARN-2069:

Hi [~sunilg], I'm not sure what you meant. From what I found, if reserved containers exist in multiple apps within a queue, all of their reservations could be dropped if the queue is still beyond its limit.
{code}
+ Set containers = preemptFrom(fc, clusterResource,
+     resourcesToClaimBackFromUser, skippedAMContainerlist,
+     skippedAMSize, null);
{code}
resourcesToClaimBackFromUser will be different if the apps come from different users.
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055900#comment-14055900 ]
Sunil G commented on YARN-2069:

Thanks [~mayank_bansal] for the update. I still have one doubt left here.
1. *resToObtain* is identified per queue.
2. *balanceUserLimitsinQueueForPreemption* has found the *userLimitContainers* containers and updated *resToObtain*.
3. Assume *resToObtain* becomes 0 here.
4. No more checks will happen in the *preemptFrom* logic, as resToObtain is 0.
5. *userLimitContainers* is the only set added to the final list and sent for preemption.

So my doubt is this: assume there are some reserved containers in applications of that queue. These containers will then not be selected for preemption in this round. Is this intentional?
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055701#comment-14055701 ]
Wangda Tan commented on YARN-2069:

Hi [~mayank_bansal], I looked at your updated patch. I think the current logic looks good to me; two comments are left:

1)
{code}
+ Map consumedResource = new HashMap();
{code}
I think it's better to name it something like "userMarkedPreemptedResource". According to my understanding, it holds the preempted resource, not the consumed resource.

2) The newly added test is good, but I think it's not enough to cover the edge case we mentioned. The edge case should be
{code}
App1  App2  App3  App4
user1 user1 user2 user2
{code}
and the app<->user mapping created in your test is
{code}
App1  App2  App3  App4
user1 user2 user3 user4
{code}
The preempted resource in the former case should be 24 containers from app2, 24 containers from app4, and 2 containers from app3.

Thanks, Wangda
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055598#comment-14055598 ]
Mayank Bansal commented on YARN-2069:

Hi [~wangda], good point, I missed it. I have updated the patch accordingly; please review.

Hi [~sunilg], previously as well we did not wait for one more cycle before starting preemption. It drops reservations, counts them against resToObtain, and returns the rest of the containers to preempt; I am following the same pattern. So essentially I am dropping reservations, then trying to balance the queue with user limits, then getting the rest of the containers and sending them for preemption.

Thanks, Mayank
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055191#comment-14055191 ]
Sunil G commented on YARN-2069:

Hi [~mayank_bansal], I have a doubt here. preemptFrom drops reservations. balanceUserLimitsinQueueForPreemption returns a few containers that are crossing user limits; assume these alone are enough to satisfy resToObtain. But if there are apps in the same queue which also have a few reservations, the earlier design first dropped reservations and waited for a cycle. In the above scenario, preemption will happen first, and reservations will be dropped afterwards. Does this need to be handled in user preemption also?
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054346#comment-14054346 ]
Wangda Tan commented on YARN-2069:

Hi [~mayank_bansal], please let me know if I didn't understand correctly. For the following code snippet,
{code}
1  Resource userLimitforQueue = qT.leafQueue.computeUserLimit(fc,
2      clusterResource, Resources.none());
3  if (Resources.lessThan(rc, clusterResource, userLimitforQueue,
4      qT.leafQueue.getUser(fc.getUser()).getConsumedResources())) {
5
6    // As we have used more resources than the user limit,
7    // we need to claim back the resources equivalent to
8    // consumed resources by user - user limit
9    Resource resourcesToClaimBackFromUser = Resources.subtract(qT.leafQueue
10       .getUser(fc.getUser()).getConsumedResources(), userLimitforQueue);
{code}
lines 1-4 check whether we need to preempt resources from an application. Because preemption is a delayed behavior, "preemptFrom" will not change the return value of qT.leafQueue.getUser(fc.getUser()).getConsumedResources(). So for the 5 apps of user B, every one of its apps will be asked to give back the full amount computed in lines 9-10. Is that correct?
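The concern in the comment above can be illustrated with a small sketch. Since the consumed-resource figure does not move while containers are only being marked, each app of the same user sees the same excess unless the amount already marked for that user is tracked separately. The names below are hypothetical and sizes are plain GB values, not YARN Resource objects.

```java
import java.util.HashMap;
import java.util.Map;

final class PerUserMarkTracking {

  /**
   * Walks the apps in order and returns the total amount marked for preemption.
   * With trackMarked=false, every app is measured against the user's unchanged
   * consumed total; with trackMarked=true, earlier marks for the same user are
   * subtracted first.
   */
  static long totalMarked(String[] appUsers, long[] appUsedGb,
      Map<String, Long> consumedGb, long userLimitGb, boolean trackMarked) {
    Map<String, Long> markedByUser = new HashMap<>();
    long total = 0L;
    for (int i = 0; i < appUsers.length; i++) {
      String user = appUsers[i];
      long already = trackMarked ? markedByUser.getOrDefault(user, 0L) : 0L;
      long excess = Math.max(0L, consumedGb.get(user) - userLimitGb - already);
      long take = Math.min(excess, appUsedGb[i]); // cannot take more than the app holds
      markedByUser.merge(user, take, Long::sum);
      total += take;
    }
    return total;
  }
}
```

For one user with 5 apps of 5 GB each (25 GB consumed) and a 15 GB limit, the untracked variant marks all 25 GB, while the tracked variant stops at the 10 GB that is actually above the limit.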
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054339#comment-14054339 ]
Mayank Bansal commented on YARN-2069:

[~wangda]
bq. Assume a queue has 10 apps, each app has 5 containers (1G for each container, so queue has 50G mem used). There are two users, each user has 5 apps. User-limit is 15G, queue's absolute capacity is 30G. And the first 5 apps belong to user-A, the last 5 apps belong to user-B. In your current method, user-B will be preempted 20 containers and user-A will be preempted nothing. After preemption, only 5 containers are left for user-B, and 25 containers are left for user-A. User-limit is not respected here.

No, if user A has a limit of 15G then it will be preempted only 15 GB, and then B's tasks will be preempted.
[jira] [Commented] (YARN-2069) CS queue level preemption should respect user-limits
[ https://issues.apache.org/jira/browse/YARN-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054334#comment-14054334 ]
Wangda Tan commented on YARN-2069:

Hi [~mayank_bansal], thanks for your comments. I think the change of title/description is correct: this patch is targeted at making cross-queue preemption respect user-limits. Your other comments all make sense to me. Only the one below:
bq. We need to maintain the reverse order of application submission which only can be done iterating through applications as we want to preempt applications which are last submitted.
IMHO, this is reasonable but conflicts with this JIRA's scope. Let me give you an example.

Assume a queue has 10 apps, and each app has 5 containers (1G for each container, so the queue has 50G of memory used). There are two users, and each user has 5 apps. The user-limit is 15G and the queue's absolute capacity is 30G. The first 5 apps belong to user-A and the last 5 apps belong to user-B. In your current method, user-B will have 20 containers preempted and user-A will have nothing preempted. After preemption, only 5 containers are left for user-B and 25 containers are left for user-A. The user-limit is not respected here.

Does this make sense to you?

Thanks, Wangda
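The numbers in the example above (10 apps of 5 x 1G containers, first 5 apps owned by user-A, last 5 by user-B, 50G used against a 30G capacity) can be replayed with a short sketch of strict reverse-submission-order preemption. This is only a model of the ordering being discussed, with hypothetical names, and not the policy code itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ReverseOrderExample {

  /**
   * Preempts whole containers from the most recently submitted app backwards
   * until toReclaim containers are taken; returns containers left per user.
   */
  static Map<String, Integer> preemptNewestFirst(String[] appUsers,
      int containersPerApp, int toReclaim) {
    Map<String, Integer> left = new LinkedHashMap<>();
    for (String user : appUsers) {
      left.merge(user, containersPerApp, Integer::sum);
    }
    for (int i = appUsers.length - 1; i >= 0 && toReclaim > 0; i--) {
      int take = Math.min(containersPerApp, toReclaim);
      left.merge(appUsers[i], -take, Integer::sum);
      toReclaim -= take;
    }
    return left;
  }
}
```

Reclaiming the 20 containers needed to get from 50G down to 30G this way leaves user-A with 25 containers and user-B with 5, so user-A remains 10G above the 15G user-limit even though the queue as a whole is back at its capacity.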