[ https://issues.apache.org/jira/browse/YARN-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143019#comment-14143019 ]
Wangda Tan commented on YARN-2498:
----------------------------------

Hi [~sunilg],

Many thanks for reviewing this patch. My feedback:

1)
bq. A scenario where node1 has more than 50% (say 60) of cluster resources, and queue A is given 50% in CS. In that case, is there any chance of under utilization?

Yes, queue-A can be under-utilized. By the design of YARN-796, this is acceptable. We will now calculate, in real time, the maximum resource that each queue can access, and users/admins can see a warning about queue under-utilization on the scheduler page of the web UI.

2)
bq. Here I feel, we may need to split up the resource of label in each node level.

It's a very good question. I thought about it again for a while and found a counter-example that shows you're right:
{code}
node1: x,y
node2: x,y
node3: z
each node has resource 10

resource tree:
        total = 30
       /    |    \
    20x    20y    10z

After a first request of 20 resources with label = x,
resource tree:
        total = 10
       /    |    \
     0x    20y    10z

The correct result should be y = 0: node1 and node2 are the only nodes
with label y and both are now fully used, so we cannot request resource
with label=y.
{code}
So it is best to split the resource of each label down to the node level, but the problem is that this has a much larger time complexity. Each assign operation needs O(n), where n = #unique-set-of-labels-on-node, and n can be very large in a big cluster. With m = #iteration and p = #leaf-queue, we need O(n * m * p) to get the ideal_assigned of each queue. There may be a better way to calculate ideal_assigned; I will think about this. For now, ideal_assigned is only correct when every node in the cluster has <= 1 label. This is the hard-partition use case (the cluster is partitioned into several smaller clusters by label).

3)
bq. For preemption, we just calculate to match the totalResourceToPreempt from the over utilized queues. But whether this container is from which node, and also under which label, and whether this label is coming under which queue. Do we need to do this check for each container?
I think the answer is yes, if we want every preempted container to be accessible by at least one under-satisfied queue (a queue that has ideal_assigned > current).

Please let me know if you have more comments.

Thanks,
Wangda

> [YARN-796] Respect labels in preemption policy of capacity scheduler
> --------------------------------------------------------------------
>
>                 Key: YARN-2498
>                 URL: https://issues.apache.org/jira/browse/YARN-2498
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: YARN-2498.patch, YARN-2498.patch, YARN-2498.patch, yarn-2498-implementation-notes.pdf
>
>
> There are 3 stages in ProportionalCapacityPreemptionPolicy:
> # Recursively calculate {{ideal_assigned}} for each queue. This depends on the available resource, the resource used/pending in each queue, and the guaranteed capacity of each queue.
> # Mark to-be-preempted containers: for each over-satisfied queue, mark some containers to be preempted.
> # Notify the scheduler about the to-be-preempted containers.
> We need to respect labels in the cluster for both #1 and #2:
> For #1, when there is some resource available in the cluster, we shouldn't assign it to a queue (by increasing {{ideal_assigned}}) if the queue cannot access such labels.
> For #2, when we decide whether we need to preempt a container, we need to make sure the resource of this container is *possibly* usable by a queue which is under-satisfied and has pending resource.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
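
For reference, the counter-example in point 2 of the comment can be reproduced with a small Python sketch. It is only an illustration, not code from the patch: the function names, the `NODES` table, and the greedy node-by-node placement are assumptions made here. It compares keeping one counter per label with keeping one counter per node and deriving each label's availability from the nodes that carry it.

```python
from collections import defaultdict

# Hypothetical cluster from the example: node -> (labels on the node, capacity).
NODES = {
    "node1": ({"x", "y"}, 10),
    "node2": ({"x", "y"}, 10),
    "node3": ({"z"}, 10),
}


def label_level_available(nodes, requests):
    """One counter per label; a request only decrements its own label."""
    avail = defaultdict(int)
    for labels, capacity in nodes.values():
        for label in labels:
            avail[label] += capacity
    for label, amount in requests:
        avail[label] -= min(amount, avail[label])
    return dict(avail)


def node_level_available(nodes, requests):
    """One counter per node; label availability is derived from the nodes."""
    remaining = {name: capacity for name, (_, capacity) in nodes.items()}
    for label, amount in requests:
        # Assumed placement: greedily fill nodes that carry the label.
        for name, (labels, _) in nodes.items():
            if label in labels and amount > 0:
                taken = min(amount, remaining[name])
                remaining[name] -= taken
                amount -= taken
    avail = defaultdict(int)
    for name, (labels, _) in nodes.items():
        for label in labels:
            avail[label] += remaining[name]
    return dict(avail)


requests = [("x", 20)]  # first request: 20 resources with label = x
print(label_level_available(NODES, requests))  # y still reported as 20
print(node_level_available(NODES, requests))   # y correctly reported as 0
```

The per-node variant gives the correct answer for label y, at the cost of touching every node (or every unique label set) on each assignment, which is the extra time complexity discussed in the comment.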