[ https://issues.apache.org/jira/browse/YARN-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tao Yang updated YARN-8771:
---------------------------
    Attachment: YARN-8771.002.patch

> CapacityScheduler fails to unreserve when cluster resource contains empty
> resource type
> ---------------------------------------------------------------------------------------
>
>                 Key: YARN-8771
>                 URL: https://issues.apache.org/jira/browse/YARN-8771
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 3.2.0
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Critical
>         Attachments: YARN-8771.001.patch, YARN-8771.002.patch
>
>
> We found this problem when the cluster was almost, but not fully, exhausted
> (93% used): the scheduler kept allocating for an app but always failed to
> commit. This can block requests from other apps, and part of the cluster
> resources can't be used.
> To reproduce this problem:
> (1) Use DominantResourceCalculator.
> (2) The cluster resource has an empty resource type, for example: gpu=0.
> (3) The scheduler allocates a container for app1, which has reserved
> containers and whose queue limit or user limit is reached
> (used + required > limit).
> Reference code in RegularContainerAllocator#assignContainer:
> {code:java}
> boolean needToUnreserve =
>     Resources.greaterThan(rc, clusterResource,
>         resourceNeedToUnReserve, Resources.none());
> {code}
> The value of resourceNeedToUnReserve can be <8GB, -6 cores, 0 gpu>, and the
> result of {{Resources#greaterThan}} will be false when using
> DominantResourceCalculator, so the scheduler never unreserves.
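
The comparison described above can be illustrated with a small standalone sketch. It does not use Hadoop classes. It assumes that, when the cluster resource has a zero-valued type, the calculator compares the two resources component by component and treats a mixed result (greater in one type, smaller in another) as "not greater". The class `UnreserveCheckSketch` and the methods `compare`, `greaterThan`, and `anyAboveZero` are hypothetical names introduced only for illustration, and `anyAboveZero` is one possible alternative check, not necessarily what the attached patch does.

```java
// Simplified, hypothetical model of the comparison; NOT the Hadoop implementation.
// Resources are modelled as arrays: {memory in MB, vcores, gpus}.
final class UnreserveCheckSketch {

    // Component-wise comparison (assumed fallback behaviour):
    //  1 if lhs is greater in some type and smaller in none,
    // -1 if lhs is smaller in some type and greater in none,
    //  0 if the two are equal or the result is mixed.
    static int compare(long[] lhs, long[] rhs) {
        boolean lhsGreater = false;
        boolean rhsGreater = false;
        for (int i = 0; i < lhs.length; i++) {
            if (lhs[i] > rhs[i]) {
                lhsGreater = true;
            } else if (lhs[i] < rhs[i]) {
                rhsGreater = true;
            }
        }
        if (lhsGreater && rhsGreater) {
            return 0;
        }
        if (lhsGreater) {
            return 1;
        }
        return rhsGreater ? -1 : 0;
    }

    static boolean greaterThan(long[] lhs, long[] rhs) {
        return compare(lhs, rhs) > 0;
    }

    // Hypothetical alternative: true if any single resource type is above zero.
    static boolean anyAboveZero(long[] resource) {
        for (long value : resource) {
            if (value > 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long[] resourceNeedToUnReserve = {8192, -6, 0}; // <8GB, -6 cores, 0 gpu>
        long[] none = {0, 0, 0};

        // Memory says "greater", vcores say "smaller": the mixed result is not "greater".
        System.out.println("greaterThan:  " + greaterThan(resourceNeedToUnReserve, none));
        // A per-type check still sees the 8GB that needs to be unreserved.
        System.out.println("anyAboveZero: " + anyAboveZero(resourceNeedToUnReserve));
    }
}
```

Under these assumptions the first check yields false even though 8GB of memory still needs to be unreserved, which matches the behaviour reported in the issue, while the per-type check yields true.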