[jira] [Comment Edited] (YARN-8379) Add an option to allow Capacity Scheduler preemption to balance satisfied queues

2018-06-27 Thread Zian Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525676#comment-16525676
 ] 

Zian Chen edited comment on YARN-8379 at 6/27/18 10:15 PM:
---

Quickly checked the failed UTs: 
TestAMRestart.testPreemptedAMRestartOnRMRestart[FAIR] passes in my local 
environment; I don't hit the timeout failure. 



> Add an option to allow Capacity Scheduler preemption to balance satisfied 
> queues
> 
>
> Key: YARN-8379
> URL: https://issues.apache.org/jira/browse/YARN-8379
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Wangda Tan
>Assignee: Zian Chen
>Priority: Major
> Attachments: YARN-8379.001.patch, YARN-8379.002.patch, 
> YARN-8379.003.patch, YARN-8379.004.patch, YARN-8379.005.patch, 
> YARN-8379.006.patch, ericpayne.confs.tgz
>
>
> The existing capacity scheduler only supports preemption to bring an 
> underutilized queue up to its guaranteed resource. In addition to that, 
> there's a requirement to better balance queues when all of them have reached 
> their guaranteed resource but hold different shares beyond it.
> An example: 3 queues with capacities queue_a = 30%, queue_b = 30%, 
> queue_c = 40%. At time T, queue_a is using 30% and queue_b is using 70%. 
> Existing scheduler preemption won't happen, but this is unfair to queue_a 
> since queue_a has the same guaranteed resources as queue_b.
> Before YARN-5864, the capacity scheduler did additional preemption to 
> balance queues. We changed that logic since it could preempt too many 
> containers between queues when all queues are satisfied.
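
To make the arithmetic of the example concrete, here is a self-contained sketch (illustrative only, not code from the patch or from YARN) of how balancing could redistribute queue_c's idle capacity between queue_a and queue_b in proportion to their guarantees:
{code:java}
// Hypothetical illustration of the example above; not YARN code.
public class BalanceExample {
  public static void main(String[] args) {
    double[] guaranteed = {30, 30, 40}; // queue_a, queue_b, queue_c
    double[] used       = {30, 70,  0}; // usage at time T

    // queue_c is idle, so 40% of the cluster is up for sharing
    double idle = guaranteed[2] - used[2];

    // Split the idle capacity between queue_a and queue_b in
    // proportion to their guarantees (30:30, i.e. evenly here)
    double activeGuarantees = guaranteed[0] + guaranteed[1];
    double targetA = guaranteed[0] + idle * guaranteed[0] / activeGuarantees;
    double targetB = guaranteed[1] + idle * guaranteed[1] / activeGuarantees;

    // Balanced targets: queue_a = 50%, queue_b = 50%, so a balancing
    // preemption would reclaim 70% - 50% = 20% worth of containers
    // from queue_b for queue_a to use.
    System.out.printf("queue_a -> %.0f%%, queue_b -> %.0f%%; preempt %.0f%% from queue_b%n",
        targetA, targetB, used[1] - targetB);
  }
}
{code}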





[jira] [Comment Edited] (YARN-8379) Add an option to allow Capacity Scheduler preemption to balance satisfied queues

2018-06-26 Thread Zian Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524608#comment-16524608
 ] 

Zian Chen edited comment on YARN-8379 at 6/27/18 5:52 AM:
--

Hi [~leftnoteasy], thanks for the comments.
 # This test case was intended to demonstrate that the selected candidates are 
actually killed once the custom timeout is reached. This part of the code 
shows the intent:
{code:java}
// First call: the policy selects 4 candidate containers to preempt
editPolicy.editSchedule();
Assert.assertEquals(4, editPolicy.getToPreemptContainers().size());

// Check live containers immediately: nothing happens yet because the
// custom preemption timeout has not elapsed
Assert.assertEquals(39, schedulerApp1.getLiveContainers().size());

// Wait past the custom timeout
Thread.sleep(20 * 1000);
// Call editSchedule again: the selected containers are now killed
editPolicy.editSchedule();
waitNumberOfLiveContainersFromApp(schedulerApp1, 35);{code}

 # Totally understand your suggestion here. I actually tried defining a private 
member variable inside the PreemptionCandidatesSelector interface so that we 
wouldn't need to pass curCandidates to every selector instance. However, 
TestCapacitySchedulerSurgicalPreemption#testPriorityPreemptionFromHighestPriorityQueueAndOldestContainer()
 then failed, which made me realize that curCandidates is a "per round" thing 
rather than a "per selector" thing: each call to editSchedule should start from 
a clean curCandidates HashMap. Otherwise, as in this UT, we call editSchedule 
multiple times while the selectors remain the same instances, so on the second 
call curCandidates still contains the containers selected in the previous 
round. We could certainly add a method to PreemptionCandidatesSelector and call 
it explicitly to reset curCandidates each round (see the first sketch after 
this list), but that makes the code even harder to read. Any better 
suggestions here?
 # Is your suggestion to not update toPreempt on the fly, but to update only 
curCandidates and then, after each round, merge it into toPreempt right after 
selectCandidates finishes? If that understanding is correct, then from a 
time-complexity point of view the two strategies should be the same, since 
either way we go through every tentatively selected candidate and add it to 
toPreempt with a duplicate check (see the second sketch below). But it does 
help to keep the state of the selected containers in one place (curCandidates) 
and avoids inconsistency. We can discuss this part further.
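
For reference, a minimal self-contained sketch of the "fresh curCandidates per round" idea from item 2; all names are illustrative stand-ins, not the actual YARN classes or signatures:
{code:java}
import java.util.*;

// Illustrative stand-ins only, not the actual YARN types.
class PerRoundResetSketch {
  // Stand-in for PreemptionCandidatesSelector: a selector only adds to
  // the map it is handed; it never clears the map itself.
  interface Selector {
    void selectCandidates(Map<String, Set<String>> curCandidates);
  }

  private final List<Selector> selectors = List.of(
      cur -> cur.computeIfAbsent("app_1", k -> new HashSet<>())
                .add("container_42"));

  // Stand-in for editSchedule(): a fresh map is created per round, so a
  // selector instance reused across rounds can never see the previous
  // round's selections.
  Map<String, Set<String>> editSchedule() {
    Map<String, Set<String>> curCandidates = new HashMap<>();
    for (Selector s : selectors) {
      s.selectCandidates(curCandidates);
    }
    return curCandidates;
  }
}
{code}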
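
And a matching sketch of the merge step from item 3: selectors fill curCandidates only, and after each round it is merged into toPreempt, with the Set semantics providing the duplicate check (again, illustrative names, not the patch's exact code):
{code:java}
import java.util.*;

// Illustrative names only, not the patch's exact code.
class MergeAfterRoundSketch {
  static void mergeIntoToPreempt(
      Map<String, Set<String>> curCandidates,
      Map<String, Set<String>> toPreempt) {
    for (Map.Entry<String, Set<String>> e : curCandidates.entrySet()) {
      // Set.addAll is the duplicate check: a container selected in an
      // earlier round is not added a second time.
      toPreempt.computeIfAbsent(e.getKey(), k -> new HashSet<>())
          .addAll(e.getValue());
    }
  }
}
{code}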

I'll address the asflicense issue with the next patch. Thanks!


