[ https://issues.apache.org/jira/browse/YARN-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16623035#comment-16623035 ]

Tao Yang edited comment on YARN-8804 at 9/21/18 4:05 AM:
---------------------------------------------------------

Thanks [~jlowe], [~leftnoteasy] for your review and reply.
For the volatile keyword, it was a mistake introduced when I copied the 
headroom field from ResourceLimits; I should have removed it afterwards. The 
resourceLimits used in the scheduling process is thread-safe because it isn't 
shared by multiple scheduling threads: every scheduling thread creates its own 
ResourceLimits instance at the beginning of the scheduling process, in 
CapacityScheduler#allocateOrReserveNewContainers or 
CapacityScheduler#allocateContainerOnSingleNode, and then passes it on.
{quote}
I think it would be cleaner if a queue could return an assignment result that 
not only indicated the allocation was skipped due to queue limits but also how 
much needs to be reserved as a result of that skipped assignment.
{quote}
Now we can get the reserved resource from {{childLimits.getHeadroom()}} for the 
leaf queue, then add it into the blockedHeadroom of the leaf/parent queue, so 
that later queues can get correct net limits through {{limit - 
blockedHeadroom}}. I think that is enough to solve this problem. Thoughts?
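To make the idea concrete, here is a minimal, hypothetical Java sketch of the blockedHeadroom bookkeeping (the class and field names here are simplified stand-ins, not the actual ResourceLimits class, and a single long stands in for a Resource):

```java
// Hypothetical, simplified sketch of the blockedHeadroom idea: when a child
// queue is skipped due to QUEUE_LIMIT, its headroom is accumulated into
// blockedHeadroom, and later queues see the net limit = limit - blockedHeadroom.
public class ResourceLimitsSketch {
    private final long limit;        // stand-in for a Resource (memory in MB)
    private long blockedHeadroom;    // headroom of queues blocked by QUEUE_LIMIT

    public ResourceLimitsSketch(long limit) {
        this.limit = limit;
        this.blockedHeadroom = 0;
    }

    // Called when a child queue is skipped because of its queue limit.
    public void addBlockedHeadroom(long childHeadroom) {
        blockedHeadroom += childHeadroom;
    }

    // Net limit visible to later sibling queues.
    public long getNetLimit() {
        return limit - blockedHeadroom;
    }

    public static void main(String[] args) {
        // Mirrors the example in this issue: a 20GB limit (in MB) with a
        // blocked leaf queue holding 1GB of headroom leaves 19GB for later
        // queues.
        ResourceLimitsSketch limits = new ResourceLimitsSketch(20 * 1024);
        limits.addBlockedHeadroom(1024);
        System.out.println(limits.getNetLimit()); // prints 19456
    }
}
```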
{quote}
The result would be less overhead for the normal scheduler loop, as we would 
only be adjusting when necessary rather than every time.
{quote}
Thanks for pointing this out. I will improve the calculation to avoid doing it 
every time by adding ResourceLimits#getNetLimit; this method will do the 
calculation only when necessary rather than on every pass.
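A sketch of the proposed lazy calculation, under the same simplified assumptions as before (hypothetical names, a long in place of a Resource): the subtraction is skipped entirely on the common path where nothing has been blocked.

```java
// Hypothetical sketch of a lazily computed net limit: the common scheduling
// path, where no headroom has been blocked, returns the limit directly and
// pays no extra cost.
public class LazyNetLimitSketch {
    private final long limit;
    private long blockedHeadroom;

    public LazyNetLimitSketch(long limit) {
        this.limit = limit;
    }

    public void addBlockedHeadroom(long headroom) {
        blockedHeadroom += headroom;
    }

    // Only does the subtraction when some headroom was actually blocked.
    public long getNetLimit() {
        return blockedHeadroom == 0 ? limit : limit - blockedHeadroom;
    }

    public static void main(String[] args) {
        LazyNetLimitSketch limits = new LazyNetLimitSketch(20480);
        System.out.println(limits.getNetLimit()); // fast path: prints 20480
        limits.addBlockedHeadroom(1024);
        System.out.println(limits.getNetLimit()); // prints 19456
    }
}
```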
{quote}
From my analysis of YARN-8513, scheduler tries to allocate containers to queue 
when it will go beyond max capacity (used + allocating > max). But resource 
committer will reject such proposal.
{quote}
As [~jlowe] noted in earlier comments, YARN-8513 is not the same problem as 
this issue. It seems similar to YARN-8771, which may be caused by a wrong 
calculation of needUnreservedResource with an empty resource type in 
RegularContainerAllocator#assignContainer. But I am not sure they are the same 
problem.



> resourceLimits may be wrongly calculated when leaf-queue is blocked in 
> cluster with 3+ level queues
> ---------------------------------------------------------------------------------------------------
>
>                 Key: YARN-8804
>                 URL: https://issues.apache.org/jira/browse/YARN-8804
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 3.2.0
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Critical
>         Attachments: YARN-8804.001.patch
>
>
> This problem is due to YARN-4280: a parent queue will deduct a child queue's 
> headroom when the child queue has reached its resource limit and the skipped 
> type is QUEUE_LIMIT. The resource limits of the deepest parent queue will be 
> correctly calculated, but for a non-deepest parent queue, its headroom may be 
> much more than the sum of its reached-limit child queues' headroom, so that 
> the resource limit of a non-deepest parent may be much less than its true 
> value and block the allocation for later queues.
> To reproduce this problem with a UT:
>  (1) The cluster has two nodes whose resources are both <10GB, 10core> and 
> 3-level queues as below; among them the max-capacity of "c1" is 10 and the 
> others are all 100, so that the max-capacity of queue "c1" is <2GB, 2core>
> {noformat}
>                   Root
>                  /  |  \
>                 a   b    c
>                10   20   70
>                          |   \
>                         c1   c2
>                   10(max=10) 90
> {noformat}
> (2) Submit app1 to queue "c1" and launch am1 (resource=<1GB, 1core>) on nm1
>  (3) Submit app2 to queue "b" and launch am2 (resource=<1GB, 1core>) on nm1
>  (4) app1 and app2 each ask for one <2GB, 1core> container.
>  (5) nm1 does one heartbeat.
>  Now queue "c" has a lower used-capacity percentage than queue "b", so the 
> allocation sequence will be "a" -> "c" -> "b".
>  Queue "c1" has reached its queue limit, so requests of app1 should be 
> pending;
>  headroom of queue "c1" is <1GB, 1core> (= max-capacity - used);
>  headroom of queue "c" is <18GB, 18core> (= max-capacity - used);
>  after allocation for queue "c", the resource limit of queue "b" will be 
> wrongly calculated as <2GB, 2core>;
>  headroom of queue "b" will be <1GB, 1core> (= resource-limit - used);
>  so the scheduler won't allocate one container for app2 on nm1.
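The arithmetic in the reproduction steps above can be sketched as follows (a hedged, hypothetical demo with resources reduced to memory in GB; the class and variable names are illustrative only). The bug is that the full headroom of parent queue "c" is deducted from the limit seen by queue "b", instead of only the headroom of the actually blocked leaf "c1":

```java
// Hypothetical arithmetic demo of the wrong vs. expected limit for queue "b"
// in the reproduction scenario (memory in GB only).
public class WrongLimitDemo {
    public static void main(String[] args) {
        long clusterGb = 20;     // two nodes of 10GB each
        long cHeadroomGb = 18;   // headroom of parent queue "c"
        long c1HeadroomGb = 1;   // headroom of the blocked leaf queue "c1"

        // Buggy: the whole headroom of "c" is deducted -> 2GB, as reported.
        long wrongLimitForB = clusterGb - cHeadroomGb;
        // Expected: only the blocked leaf's headroom is deducted -> 19GB.
        long rightLimitForB = clusterGb - c1HeadroomGb;

        System.out.println(wrongLimitForB + " " + rightLimitForB); // prints "2 19"
    }
}
```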



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
