[jira] [Commented] (YARN-10530) CapacityScheduler ResourceLimits doesn't handle node partition well

2020-12-11 Thread Wangda Tan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248087#comment-17248087
 ] 

Wangda Tan commented on YARN-10530:
---

I haven't written any UT yet, but I want to file the ticket to make sure we 
take a closer look, because the logic looks confusing. I will be delighted if 
this turns out to be a false alarm :) 

> CapacityScheduler ResourceLimits doesn't handle node partition well
> ---
>
> Key: YARN-10530
> URL: https://issues.apache.org/jira/browse/YARN-10530
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, capacityscheduler
> Reporter: Wangda Tan
> Priority: Blocker
>
> This is a serious bug that may impact all releases. I need to do further 
> checking, but I want to log the JIRA so we will not forget:  
> ResourceLimits objects serve two purposes: 
> 1) When the cluster resource changes, for example when a new node is added or 
> the scheduler config is reinitialized, we pass a ResourceLimits down to the 
> queues via updateClusterResource. 
> 2) When allocating a container, we pass the parent's available resource down 
> to the child to make sure the child's allocation won't violate the parent's 
> max resource. For example: 
> {code}
> queue      used  max
> --------------------
> root        10   20
> root.a       8   10
> root.a.a1    2   10
> root.a.a2    6   10
> {code}
> Even though a.a1 has 8 resources of headroom (a1.max - a1.used), we can 
> allocate at most 2 resources to a1, because root.a's limit will be hit first. 
> This information is passed down from parent queue to child queue during the 
> assignContainers call via ResourceLimits; see the sketch below. 
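> A minimal, self-contained sketch of that headroom calculation (hypothetical 
> class name and plain longs, not the actual CapacityScheduler code): 
> {code}
> // A child queue's allocatable amount is bounded by every ancestor's
> // remaining room, not just the child's own (max - used).
> public class HeadroomDemo {
>   static long headroom(long max, long used) {
>     return Math.max(0, max - used);
>   }
>
>   public static void main(String[] args) {
>     long a1Headroom = headroom(10, 2);      // a1: max=10, used=2 -> 8
>     long parentHeadroom = headroom(10, 8);  // root.a: max=10, used=8 -> 2
>     // What a1 can actually get is the minimum of the two.
>     long allocatable = Math.min(a1Headroom, parentHeadroom);
>     System.out.println("a1 can get at most " + allocatable); // prints 2
>   }
> }
> {code}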
> However, we only pass one ResourceLimits from the top. For queue 
> initialization, we pass in: 
> {code}
> root.updateClusterResource(clusterResource, new ResourceLimits(
> clusterResource));
> {code}
> And when we update the cluster resource, we only consider the default 
> partition:
> {code}
> // Update all children
> for (CSQueue childQueue : childQueues) {
>   // Get ResourceLimits of child queue before assign containers
>   ResourceLimits childLimits = getResourceLimitsOfChild(childQueue,
>       clusterResource, resourceLimits,
>       RMNodeLabelsManager.NO_LABEL, false);
>   childQueue.updateClusterResource(clusterResource, childLimits);
> }
> {code}
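> For contrast, a hypothetical partition-aware variant of this loop might look 
> like the following. The partition collection and the idea of calling 
> updateClusterResource once per partition are assumptions for illustration 
> only; how per-partition limits should actually be threaded through is exactly 
> the open question here: 
> {code}
> // Hypothetical: compute child limits for every node partition, not only
> // NO_LABEL. 'allPartitions' is an assumed collection of partition names.
> for (CSQueue childQueue : childQueues) {
>   for (String partition : allPartitions) {
>     ResourceLimits childLimits = getResourceLimitsOfChild(childQueue,
>         clusterResource, resourceLimits, partition, false);
>     // Whether updateClusterResource should take per-partition limits or
>     // be called once per partition is an open design choice.
>     childQueue.updateClusterResource(clusterResource, childLimits);
>   }
> }
> {code}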
> The same holds for the allocation logic, where we pass in the following 
> (I actually found a TODO item I added 5 years ago): 
> {code}
> // Try to use NON_EXCLUSIVE
> assignment = getRootQueue().assignContainers(getClusterResource(),
>     candidates,
>     // TODO, now we only consider limits for parent for non-labeled
>     // resources, should consider labeled resources as well.
>     new ResourceLimits(labelManager
>         .getResourceByLabel(RMNodeLabelsManager.NO_LABEL,
>             getClusterResource())),
>     SchedulingMode.IGNORE_PARTITION_EXCLUSIVITY);
> {code} 
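> A sketch of the direction that TODO points at (untested; it assumes 
> candidates.getPartition() is the right partition to seed the root limits 
> with): 
> {code}
> // Seed the root ResourceLimits from the partition being scheduled,
> // instead of always from the default (NO_LABEL) partition.
> assignment = getRootQueue().assignContainers(getClusterResource(),
>     candidates,
>     new ResourceLimits(labelManager
>         .getResourceByLabel(candidates.getPartition(),
>             getClusterResource())),
>     SchedulingMode.IGNORE_PARTITION_EXCLUSIVITY);
> {code}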
> The good thing is that in the assignContainers call, we calculate the child 
> limit based on the partition:
> {code} 
> ResourceLimits childLimits =
>     getResourceLimitsOfChild(childQueue, cluster, limits,
>         candidates.getPartition(), true);
> {code} 
> So I think the problem is: when a named partition has more resources than 
> the default partition, the effective min/max resource of each queue could be 
> wrong. A worked example with made-up numbers follows.
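> A self-contained illustration of that failure mode, under assumed numbers 
> (the default-partition total of 20, the named-partition total of 100, and 
> the 50% max-capacity are all hypothetical): 
> {code}
> public class PartitionLimitDemo {
>   public static void main(String[] args) {
>     long defaultPartitionTotal = 20;   // NO_LABEL total (assumed)
>     long labeledPartitionTotal = 100;  // named-partition total (assumed)
>     double queueMaxPct = 0.5;          // queue max-capacity on the partition
>
>     // Correct: derive the queue's max from the partition's own total.
>     long correctMax = (long) (labeledPartitionTotal * queueMaxPct); // 50
>     // Buggy: limits seeded only from the default partition's total.
>     long buggyMax = (long) (defaultPartitionTotal * queueMaxPct);   // 10
>
>     System.out.println("correct=" + correctMax + ", buggy=" + buggyMax);
>   }
> }
> {code}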



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10530) CapacityScheduler ResourceLimits doesn't handle node partition well

2020-12-11 Thread Wangda Tan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248084#comment-17248084
 ] 

Wangda Tan commented on YARN-10530:
---

cc: [~sunilg], [~epayne]
