[jira] [Commented] (YARN-2498) [YARN-796] Respect labels in preemption policy of capacity scheduler

Wangda Tan (JIRA) Mon, 22 Sep 2014 01:32:55 -0700

    [ https://issues.apache.org/jira/browse/YARN-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143019#comment-14143019 ]


Wangda Tan commented on YARN-2498:
----------------------------------

Hi [~sunilg],
Many thanks for reviewing this patch. My replies:

1)
bq. A scenario where node1 has more than 50% (say 60%) of the cluster resources, 
and queue A is given 50% in CS. In that case, is there any chance of 
underutilization?
Yes, queue-A can be underutilized. By the design of YARN-796, this is 
acceptable. We now calculate the real-time maximum resource that each queue can 
access, and users/admins can get a warning about queue underutilization from the 
web UI (scheduler page).
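A minimal sketch of the real-time maximum-resource check, under stated assumptions: the Node class, the helper names, and the accessibility rule (a queue may use a node only if it can access every label on that node; unlabeled nodes are open to every queue) are hypothetical and not taken from the YARN-796 patch. It replays the scenario from the question, assuming node1's 60% carries a label x that queue A cannot access:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    name: str
    resource: int
    labels: frozenset = frozenset()  # empty set = unlabeled node

def node_accessible(node: Node, queue_labels: frozenset) -> bool:
    # Hypothetical rule for this sketch: the queue must be able to access
    # every label on the node; unlabeled nodes are usable by every queue.
    return node.labels <= queue_labels

def max_accessible(nodes, queue_labels) -> int:
    """Upper bound on what a queue can ever be allocated right now."""
    return sum(n.resource for n in nodes if node_accessible(n, queue_labels))

def underutilization_warning(nodes, queue_labels, guaranteed_fraction):
    """Return (warn, guaranteed, reachable) for one queue."""
    total = sum(n.resource for n in nodes)
    guaranteed = guaranteed_fraction * total
    reachable = max_accessible(nodes, queue_labels)
    return reachable < guaranteed, guaranteed, reachable

# node1 holds 60% of the cluster under label x; queue A (50%) cannot access x.
nodes = [Node("node1", 60, frozenset({"x"})), Node("node2", 40)]
print(underutilization_warning(nodes, frozenset(), 0.5))  # (True, 50.0, 40)
```

Queue A is guaranteed 50 but can reach at most 40, which is exactly the kind of gap the scheduler page warning is meant to surface.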

2)
bq. Here I feel we may need to split up the resource of each label at the node 
level.
It's a very good question; I thought about it again for a while and found a 
negative example showing you're right:
{code}
node1: x,y
node2: x,y
node3: z

each node has resource = 10,
resource tree: 
         total = 30
        /    |    \
       20x   20y  10z

After a request for 20 resource with label=x:
resource tree: 
         total = 10
        /    |    \
       0x   20y  10z

The correct result is y = 0: node1 and node2 (the only nodes with label y) are 
already fully used by the x request, so no resource with label=y can be allocated.
{code}
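The two accounting schemes in the example above can be compared in a small sketch. The node layout and capacity come from the example; the function names are hypothetical. The first function keeps one counter per label (the resource tree), the second tracks free resource per node and derives per-label availability from it:

```python
CAPACITY = 10
NODE_LABELS = {"node1": {"x", "y"}, "node2": {"x", "y"}, "node3": {"z"}}

def aggregated_after_request(label, amount):
    """One counter per label, independent of which nodes back it."""
    avail = {}
    for labels in NODE_LABELS.values():
        for lab in labels:
            avail[lab] = avail.get(lab, 0) + CAPACITY
    avail[label] -= amount  # only the requested label's counter shrinks
    return avail

def node_level_after_request(label, amount):
    """Free resource per node; per-label availability derived from it."""
    free = {node: CAPACITY for node in NODE_LABELS}
    for node, labels in NODE_LABELS.items():
        if label in labels and amount > 0:
            take = min(free[node], amount)
            free[node] -= take
            amount -= take
    avail = {}
    for node, labels in NODE_LABELS.items():
        for lab in labels:
            avail[lab] = avail.get(lab, 0) + free[node]
    return avail

print(sorted(aggregated_after_request("x", 20).items()))
# [('x', 0), ('y', 20), ('z', 10)]  <- y wrongly still looks available
print(sorted(node_level_after_request("x", 20).items()))
# [('x', 0), ('y', 0), ('z', 10)]
```

The node-level version walks every node on each assignment; grouping nodes by their label set would reduce that to one pass over the unique label sets, which is the per-assignment cost discussed next.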
So it's best to split up the resource of each label at the node level, but the 
problem is that this has much higher time complexity. Each assign operation needs 
O(n), where n = #unique-sets-of-labels-on-nodes, which can be very large in a big 
cluster. Considering m = #iterations and p = #leaf-queues, we need O(n * m * p) to 
compute the ideal_assigned of each queue.
There may be a better way to calculate ideal_assigned; I will think about this. 
For now, ideal_assigned is only computed correctly when every node in the cluster 
has <= 1 label. That is the hard-partition use case (the cluster is partitioned 
into several smaller clusters by label).
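A minimal sketch of why the hard-partition case is safe, with hypothetical helper names: when every node carries at most one label, the label groups are disjoint, so per-label counters never count the same node twice and their sum cannot exceed the cluster total. The overlapping layout from the negative example double-counts node1 and node2:

```python
def is_hard_partition(node_labels):
    """True when every node carries at most one label."""
    return all(len(labels) <= 1 for labels in node_labels.values())

def per_label_capacity(node_labels, capacity):
    """Sum node capacity per label, as the per-label resource tree does."""
    totals = {}
    for labels in node_labels.values():
        for label in labels:
            totals[label] = totals.get(label, 0) + capacity
    return totals

partitioned = {"node1": {"x"}, "node2": {"y"}, "node3": {"z"}}
overlapping = {"node1": {"x", "y"}, "node2": {"x", "y"}, "node3": {"z"}}
cluster_total = 10 * 3

for name, layout in (("partitioned", partitioned), ("overlapping", overlapping)):
    label_sum = sum(per_label_capacity(layout, 10).values())
    # Disjoint label groups (<= 1 label per node) give label_sum <= cluster_total;
    # overlapping groups inflate it, which is what misleads the aggregate view.
    print(name, is_hard_partition(layout), label_sum, cluster_total)
# partitioned True 30 30
# overlapping False 50 30
```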

3)
bq. For preemption, we just calculate to match the totalResourceToPreempt from 
the over-utilized queues. But which node is this container from, which label is 
it under, and which queue does that label come under? Do we need to do this 
check for each container?
I think the answer is yes if we want every preempted container to be accessible 
by at least one under-satisfied queue (one with ideal_assigned > current).
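The per-container check can be sketched as follows. QueueState, the helper names, and the accessibility rule (a queue may use a node only if it can access every label on it) are hypothetical assumptions for illustration, not the patch's code. Containers from over-satisfied queues are selected only while some under-satisfied queue could actually use the freed resource:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QueueState:
    name: str
    accessible_labels: frozenset
    ideal_assigned: int
    current: int

    def under_satisfied(self) -> bool:
        return self.ideal_assigned > self.current

def can_use(queue, node_labels):
    # Hypothetical rule: the queue must be able to access every label on the node.
    return node_labels <= queue.accessible_labels

def worth_preempting(node_labels, queues):
    """True if at least one under-satisfied queue could use this container's node."""
    return any(q.under_satisfied() and can_use(q, node_labels) for q in queues)

def select_preemptable(containers, queues, total_to_preempt):
    """containers: (container_id, size, node_labels) from over-satisfied queues."""
    selected, remaining = [], total_to_preempt
    for cid, size, labels in containers:
        if remaining <= 0:
            break
        if worth_preempting(labels, queues):  # skip containers nobody can reuse
            selected.append(cid)
            remaining -= size
    return selected

queues = [
    QueueState("A", frozenset({"x"}), ideal_assigned=30, current=10),  # under
    QueueState("B", frozenset(), ideal_assigned=20, current=40),       # over
]
containers = [
    ("c1", 10, frozenset({"z"})),  # on a z node: no under-satisfied queue can use it
    ("c2", 10, frozenset({"x"})),
    ("c3", 10, frozenset()),       # unlabeled node
]
print(select_preemptable(containers, queues, 20))  # ['c2', 'c3']
```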

Please let me know if you have more comments.

Thanks,
Wangda

> [YARN-796] Respect labels in preemption policy of capacity scheduler
> --------------------------------------------------------------------
>
>                 Key: YARN-2498
>                 URL: https://issues.apache.org/jira/browse/YARN-2498
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: YARN-2498.patch, YARN-2498.patch, YARN-2498.patch, 
> yarn-2498-implementation-notes.pdf
>
>
> There are 3 stages in ProportionalCapacityPreemptionPolicy:
> # Recursively calculate {{ideal_assigned}} for each queue. This depends on 
> available resource, resource used/pending in each queue and guaranteed 
> capacity of each queue.
> # Mark to-be-preempted containers: for each over-satisfied queue, mark some 
> containers to be preempted.
> # Notify the scheduler about to-be-preempted containers.
> We need to respect labels in the cluster for both #1 and #2:
> For #1, when some resource is available in the cluster, we shouldn't 
> assign it to a queue (by increasing {{ideal_assigned}}) if the queue cannot 
> access its labels.
> For #2, when deciding whether to preempt a container, we need to make sure 
> the resource of this container is *possibly* usable by a queue which 
> is under-satisfied and has pending resource.
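Stage #1 with labels can be illustrated by a flattened, non-recursive sketch; it is not the ProportionalCapacityPreemptionPolicy code. It assumes the hard-partition case (each unit of free resource carries one label, "" meaning unlabeled and usable by all), and LeafQueue and the function names are hypothetical. Each label's free resource is split only among queues that can access it, in proportion to guarantee and capped by pending demand; labels are processed one at a time, which is a simplification:

```python
from dataclasses import dataclass

EPS = 1e-9

@dataclass(frozen=True)
class LeafQueue:
    guarantee: float                 # relative guaranteed capacity, must be > 0
    pending: float                   # resource the queue is still asking for
    labels: frozenset = frozenset()  # labels this queue may access

def can_access(queue, label):
    # Assumption: "" stands for unlabeled resource, usable by every queue.
    return label == "" or label in queue.labels

def label_aware_ideal_assigned(available, queues):
    """available: {label: free resource}; queues: {name: LeafQueue}."""
    ideal = {name: 0.0 for name in queues}
    for label in sorted(available):
        remaining = available[label]
        # Only queues that can access this label and still have pending demand.
        active = [n for n, q in queues.items()
                  if can_access(q, label) and q.pending - ideal[n] > EPS]
        while remaining > EPS and active:
            total_weight = sum(queues[n].guarantee for n in active)
            handed_out = 0.0
            still_hungry = []
            for n in active:
                share = remaining * queues[n].guarantee / total_weight
                need = queues[n].pending - ideal[n]
                give = min(share, need)  # never exceed pending demand
                ideal[n] += give
                handed_out += give
                if need - give > EPS:
                    still_hungry.append(n)
            remaining -= handed_out
            active = still_hungry  # capped queues drop out; re-split the rest
    return ideal

queues = {
    "A": LeafQueue(guarantee=1, pending=40, labels=frozenset({"x"})),
    "B": LeafQueue(guarantee=1, pending=30),  # cannot access label x
}
print(label_aware_ideal_assigned({"": 20, "x": 30}, queues))
# {'A': 40.0, 'B': 10.0}
```

Queue B still has 20 pending after the unlabeled resource is split, but none of the x resource flows to it, which is the behaviour stage #1 asks for.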



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
