Tao Yang created YARN-8771:
------------------------------

             Summary: CapacityScheduler fails to unreserve when cluster 
resource contains empty resource type
                 Key: YARN-8771
                 URL: https://issues.apache.org/jira/browse/YARN-8771
             Project: Hadoop YARN
          Issue Type: Bug
          Components: capacityscheduler
    Affects Versions: 3.2.0
            Reporter: Tao Yang
            Assignee: Tao Yang


We found this problem when cluster is almost but not exhausted (93% used), 
scheduler kept allocating for an app but always fail to commit, this can 
blocking requests from other apps and parts of cluster resource can't be used.

Reproduce this problem:
(1) use DominantResourceCalculator
(2) cluster resource has empty resource type, for example: gpu=0
(3) scheduler allocates container for app1 who has reserved containers and 
whose queue limit or user limit reached(used + required > limit). 

Reference codes in RegularContainerAllocator#assignContainer:
{code:java}
    boolean needToUnreserve =
        Resources.greaterThan(rc, clusterResource,
            resourceNeedToUnReserve, Resources.none());
{code}
value of resourceNeedToUnReserve can be <8GB, -6 cores, 0 gpu>, result of 
{{Resources#greaterThan}} will be false if using DominantResourceCalculator.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org

Reply via email to