[ https://issues.apache.org/jira/browse/YARN-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15428904#comment-15428904 ]

Eric Payne commented on YARN-4945:
----------------------------------

[~sunilg], thank you so much for providing this design doc and POC. I have not 
yet looked at the patch, but I have a few comments on the design doc.

-----
{quote}
Additional Requirement specs
...
- Over subscribed queue ...
-- Selected containers will completely serve resource need from starving apps.
...
-- Selected containers only partially serves the need
...
By scanning through each partition and its associated queues 
(TempQueuePerPartition), we can understand how much resources are offered from 
each queue for preemption and also the selected container list. This can be 
used as a reference to avoid double calculations in intra-queue preemption 
round.
{quote}
I'm pretty sure that the containers already in the {{selectedCandidates}} list 
will _not_ be re-assigned to anything in the current queue. The containers are 
in that list because some other queue is asking for them. Even if containers 
that are already in the inter-queue preemption list would also help resolve an 
intra-queue preemption problem, those containers will go to the more 
underserved queue before coming back to the current queue. My assertion is that 
regardless of what containers are already in the {{selectedCandidates}} list, 
the intra-queue preemption policy would always need to select more.
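To illustrate the point, here is a rough Java sketch. It is not the actual preemption code: `Container`, `selectIntraQueue`, and the single memory dimension are hypothetical simplifications. Containers already in the inter-queue {{selectedCandidates}} set are skipped because they are leaving the queue, so the intra-queue round still has to find enough additional containers to cover the whole need.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class IntraQueueSelectionSketch {
    // Hypothetical container model: id, owning app, and memory in MB.
    record Container(String id, String appId, long memoryMb) {}

    // Picks donor containers until the intra-queue need is covered. Containers
    // already chosen by inter-queue preemption are skipped: they are going to
    // another queue, so they free nothing for the starving app in this queue.
    static List<Container> selectIntraQueue(List<Container> donorContainers,
                                            Set<String> interQueueSelectedIds,
                                            long neededMb) {
        List<Container> selected = new ArrayList<>();
        long obtainedMb = 0;
        for (Container c : donorContainers) {
            if (obtainedMb >= neededMb) {
                break; // need is covered
            }
            if (interQueueSelectedIds.contains(c.id())) {
                continue; // already leaving this queue; does not count toward the need
            }
            selected.add(c);
            obtainedMb += c.memoryMb();
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Container> donors = List.of(
                new Container("c1", "app1", 1024),
                new Container("c2", "app1", 1024),
                new Container("c3", "app1", 1024));
        // c1 is already marked by inter-queue preemption, so this round
        // still has to select c2 and c3 to free 2048 MB.
        System.out.println(selectIntraQueue(donors, Set.of("c1"), 2048));
    }
}
```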

-----
{quote}
Configurations and considerations
- Provide a configuration to turn on/off intra-queue preemption along with the 
type of policy it is going to handle (priority, fairness, user-limit etc)
{quote}
Additionally, we may want to consider intra-queue preemption configs for dead 
zone, natural completion, etc. These may even need to be configurable per queue.
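As a hedged illustration, the sketch below assumes intra-queue preemption would reuse the semantics of the existing inter-queue knobs. Those knobs are {{yarn.resourcemanager.monitor.capacity.preemption.max_ignored_over_capacity}}, which acts as a dead zone, and {{yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor}}, which controls the fraction of the excess preempted per round. The `IntraQueueConf` record and `toPreemptThisRound` method are hypothetical per-queue equivalents, not existing configuration or API.

```java
class IntraQueuePreemptionKnobsSketch {
    // Hypothetical per-queue settings; these are NOT existing CapacityScheduler keys.
    record IntraQueueConf(boolean enabled,
                          double deadZone,                    // 0.1 = tolerate up to 10% over ideal
                          double naturalTerminationFactor) {} // fraction of the excess taken per round

    // How much (MB) to preempt this round from a user or app that holds usedMb
    // but should hold idealMb.
    static long toPreemptThisRound(IntraQueueConf conf, long usedMb, long idealMb) {
        if (!conf.enabled()) {
            return 0;
        }
        // Dead zone: ignore small overuse so we do not preempt on noise.
        if (usedMb <= idealMb * (1.0 + conf.deadZone())) {
            return 0;
        }
        long excessMb = usedMb - idealMb;
        // Natural completion: take only part of the excess, expecting some
        // containers to finish on their own before the next round.
        return (long) (excessMb * conf.naturalTerminationFactor());
    }

    public static void main(String[] args) {
        IntraQueueConf conf = new IntraQueueConf(true, 0.1, 0.2);
        System.out.println(toPreemptThisRound(conf, 1050, 1000)); // within the dead zone
        System.out.println(toPreemptThisRound(conf, 2000, 1000)); // 20% of a 1000 MB excess
    }
}
```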


-----
{quote}
Select ideal candidates for intra-queue preemption per priority.
...
3. ‘pending’ resource per partition will be calculated for all the apps and 
together store in a consolidated map (resourceToObtain) of pending resource to 
be collected per partition in one queue.
{quote}
The use of the word "pending" in conjunction with the reference to 
{{resourceToObtain}} is confusing to me. It sounds like "pending" is talking 
about "preemptable resources," but "pending" means "resources requested but not 
yet allocated." (See 
{{LeafQueue#getTotalPendingResourcesConsideringUserLimit}}).

For instance, the {{resToObtainByPartition}} variable in 
{{FifoCandidatesSelector}} is used for holding the amount of extra (and 
therefore preemptable) resources being used by a queue. Is this step 
calculating the total of preemptable resources for apps in this queue, per 
partition?
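To make the distinction concrete, here is a small Java sketch that computes both quantities per partition. The `AppSnapshot` record and its `idealMb` field are hypothetical. How "ideal" would be defined for apps within a queue is exactly what the design doc has to pin down. The pending total follows the "requested but not yet allocated" meaning. The preemptable total is usage above ideal, which is analogous to what {{resToObtainByPartition}} holds at the queue level.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PendingVsPreemptableSketch {
    // Hypothetical per-app snapshot in one partition (memory in MB).
    // pendingMb = outstanding asks not yet allocated; idealMb = what the app
    // "should" hold, however the policy ends up defining it.
    record AppSnapshot(String appId, String partition,
                       long usedMb, long pendingMb, long idealMb) {}

    // Pending: requested but not yet allocated, summed per partition.
    static Map<String, Long> pendingByPartition(List<AppSnapshot> apps) {
        Map<String, Long> result = new HashMap<>();
        for (AppSnapshot a : apps) {
            result.merge(a.partition(), a.pendingMb(), Long::sum);
        }
        return result;
    }

    // Preemptable: usage above the ideal amount, summed per partition.
    static Map<String, Long> preemptableByPartition(List<AppSnapshot> apps) {
        Map<String, Long> result = new HashMap<>();
        for (AppSnapshot a : apps) {
            long excessMb = Math.max(0L, a.usedMb() - a.idealMb());
            result.merge(a.partition(), excessMb, Long::sum);
        }
        return result;
    }
}
```

In partition "x" in the test data, both totals happen to be 2048 MB, but they come from different apps: the pending amount belongs to the starving app and the preemptable amount belongs to the over-served one.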

-----
{quote}
4. While doing this, we will ensure that certain apps will be skipped if they are 
already equal to or more than their user-limit quota.  This map will be the entry 
point to select candidates from lower priority apps in next step.
{quote}
Is this saying that, when marking containers for preemption, if an app is under 
its user limit percent, its containers will not be marked? Or, is it saying 
that if an app is asking for more containers and it is already over its user 
limit percent, other apps' containers won't be preempted on its behalf?

Not only do we need to avoid preempting resources _for_ users that are over 
their user limit percent, we need to avoid preempting containers _from_ users 
that are under their user limit percent. Even today in the capacity scheduler, 
if I have a queue with a 50% user limit percent, and app1 from user1 is 
priority1 and app2 from user2 is priority2, and they are both asking for more 
resources, user2 will not get more containers until user1 has reached 50% of 
the queue. In other words, user limit percent trumps application priority.
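The two guards above can be sketched as a single check that runs before app priority is ever consulted. This is a hypothetical helper, not existing scheduler code. `userLimitMb` is assumed to be the already-computed per-user limit, not the raw percent, and usage values are in arbitrary resource units.

```java
import java.util.Map;

class UserLimitGuardSketch {
    // May a container of containerMb owned by donorUser be preempted on behalf
    // of requestingUser? usedByUser holds each user's usage in the queue;
    // userLimitMb is assumed to be the already-computed per-user limit.
    static boolean mayPreempt(Map<String, Long> usedByUser, long userLimitMb,
                              String requestingUser, String donorUser,
                              long containerMb) {
        long requesterUsed = usedByUser.getOrDefault(requestingUser, 0L);
        long donorUsed = usedByUser.getOrDefault(donorUser, 0L);
        // Do not preempt FOR a user who is already at or over the limit.
        if (requesterUsed >= userLimitMb) {
            return false;
        }
        // Do not preempt FROM a user who would drop below the limit.
        return donorUsed - containerMb >= userLimitMb;
    }
}
```

With a queue of 100 units and a limit of 50, user1 at 20 may take a 10-unit container from user2 at 80. The reverse is never allowed, whatever the two apps' priorities are.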

-----
I am concerned that priority-based intra-queue preemption has a different set 
of goals than user limit percent-based intra-queue preemption. For instance,
- Requirements for user limit percent-based preemption are calculated at 
the user level, while priority-based preemption requirements go down to the app 
level.
- User limit percent-based preemption only makes sense if multiple users are in 
a queue, and priority-based preemption only makes sense if a priority inversion 
can happen between apps of the same user in a queue.

Perhaps these should be totally separate policies. Anyway, for us, user limit 
percent-based preemption is much more important.
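To show how differently the two policies compute "need", here is a rough Java sketch with hypothetical names. It assumes, as in YARN, that a larger priority number means higher priority. The user-limit policy aggregates per user. The priority policy works per app and only counts lower-priority apps of the same user as donors.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SeparatePoliciesSketch {
    // Hypothetical app snapshot within one queue and partition (MB).
    record App(String appId, String user, int priority, long usedMb, long pendingMb) {}

    // User-limit policy: need is computed per USER, aggregating the user's apps.
    static Map<String, Long> userLimitNeedByUser(List<App> apps, long userLimitMb) {
        Map<String, Long> usedByUser = new HashMap<>();
        Map<String, Long> pendingByUser = new HashMap<>();
        for (App a : apps) {
            usedByUser.merge(a.user(), a.usedMb(), Long::sum);
            pendingByUser.merge(a.user(), a.pendingMb(), Long::sum);
        }
        Map<String, Long> need = new HashMap<>();
        for (Map.Entry<String, Long> e : usedByUser.entrySet()) {
            long headroom = Math.max(0L, userLimitMb - e.getValue());
            long pending = pendingByUser.get(e.getKey());
            long userNeed = Math.min(headroom, pending);
            if (userNeed > 0) {
                need.put(e.getKey(), userNeed);
            }
        }
        return need;
    }

    // Priority policy: need is computed per APP, and only lower-priority apps
    // of the SAME user count as donors. This is an upper bound: two
    // high-priority apps may both count the same donor's resources.
    static Map<String, Long> priorityNeedByApp(List<App> apps) {
        Map<String, Long> need = new HashMap<>();
        for (App high : apps) {
            if (high.pendingMb() == 0) {
                continue;
            }
            long reclaimable = 0;
            for (App low : apps) {
                if (low.user().equals(high.user()) && low.priority() < high.priority()) {
                    reclaimable += low.usedMb();
                }
            }
            long appNeed = Math.min(high.pendingMb(), reclaimable);
            if (appNeed > 0) {
                need.put(high.appId(), appNeed);
            }
        }
        return need;
    }
}
```

In a queue where user2's only app is starving, the priority policy finds nothing to do, while the user-limit policy does. This is one more reason to keep them as separate selectors.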

> [Umbrella] Capacity Scheduler Preemption Within a queue
> -------------------------------------------------------
>
>                 Key: YARN-4945
>                 URL: https://issues.apache.org/jira/browse/YARN-4945
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Wangda Tan
>         Attachments: IntraQueuepreemption-CapacityScheduler (Design).pdf, 
> YARN-2009-wip.patch
>
>
> This is umbrella ticket to track efforts of preemption within a queue to 
> support features like:
> YARN-2009. YARN-2113. YARN-4781.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
