[ https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tao Yang updated YARN-8771:
---------------------------
    Attachment: YARN-8771.001.patch

> CapacityScheduler fails to unreserve when cluster resource contains empty 
> resource type
> ---------------------------------------------------------------------------------------
>
>                 Key: YARN-8771
>                 URL: https://issues.apache.org/jira/browse/YARN-8771
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 3.2.0
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Critical
>         Attachments: YARN-8771.001.patch
>
>
> We found this problem when the cluster was almost, but not fully, exhausted 
> (93% used): the scheduler kept allocating for an app but always failed to 
> commit. This can block requests from other apps and leave part of the 
> cluster resources unusable.
> To reproduce this problem:
> (1) use DominantResourceCalculator
> (2) the cluster resource has an empty resource type, for example: gpu=0
> (3) the scheduler allocates a container for app1, which has reserved 
> containers and whose queue limit or user limit has been reached 
> (used + required > limit).
> Reference code in RegularContainerAllocator#assignContainer:
> {code:java}
>     boolean needToUnreserve =
>         Resources.greaterThan(rc, clusterResource,
>             resourceNeedToUnReserve, Resources.none());
> {code}
> The value of resourceNeedToUnReserve can be <8GB, -6 cores, 0 gpu>, and the 
> result of {{Resources#greaterThan}} will be false when using 
> DominantResourceCalculator.
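
The comparison described above can be illustrated with a small standalone sketch. It does not use the Hadoop classes. It assumes, as a simplification not confirmed by this message, that the calculator falls back to a component-wise comparison when a cluster resource type is zero, and that a vector which is larger in one component and smaller in another counts as "not greater". The class name, method names, and array layout are hypothetical.

```java
// Hypothetical, simplified model of the comparison; NOT the Hadoop implementation.
// Resource vectors are modelled as {memoryGB, vcores, gpus}.
class UnreserveComparisonSketch {

    // Assumed fallback: compare component by component.
    // Returns 1 if lhs is greater in some component and smaller in none,
    // -1 for the opposite case, and 0 if equal or mixed.
    static int compareComponentWise(long[] lhs, long[] rhs) {
        boolean lhsGreater = false;
        boolean rhsGreater = false;
        for (int i = 0; i < lhs.length; i++) {
            if (lhs[i] > rhs[i]) {
                lhsGreater = true;
            } else if (lhs[i] < rhs[i]) {
                rhsGreater = true;
            }
        }
        if (lhsGreater && rhsGreater) {
            return 0; // mixed result: neither vector dominates the other
        }
        if (lhsGreater) {
            return 1;
        }
        if (rhsGreater) {
            return -1;
        }
        return 0;
    }

    // Stand-in for a greaterThan check built on the comparison above.
    static boolean greaterThan(long[] lhs, long[] rhs) {
        return compareComponentWise(lhs, rhs) > 0;
    }

    public static void main(String[] args) {
        long[] none = {0, 0, 0};
        long[] resourceNeedToUnReserve = {8, -6, 0}; // <8GB, -6 cores, 0 gpu>

        // Memory is positive but vcores is negative, so the result is mixed.
        System.out.println(greaterThan(resourceNeedToUnReserve, none)); // false

        // Without the negative component the check would succeed.
        System.out.println(greaterThan(new long[] {8, 0, 0}, none)); // true
    }
}
```

Under these assumptions, needToUnreserve stays false even though 8GB of memory still has to be unreserved, which matches the allocate-then-fail-to-commit loop described in the report.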



