Tao Yang created YARN-8771:
------------------------------
Summary: CapacityScheduler fails to unreserve when cluster resource contains empty resource type
Key: YARN-8771
URL: https://issues.apache.org/jira/browse/YARN-8771
Project: Hadoop YARN
Issue Type: Bug
Components: capacityscheduler
Affects Versions: 3.2.0
Reporter: Tao Yang
Assignee: Tao Yang
We found this problem when the cluster was almost, but not fully, exhausted (93% used). The scheduler kept allocating containers for one app but always failed to commit them. This can block requests from other apps and leave part of the cluster resource unusable.

To reproduce this problem:
(1) use DominantResourceCalculator;
(2) the cluster resource has an empty resource type, for example gpu=0;
(3) the scheduler allocates a container for app1, which has reserved containers and whose queue limit or user limit has been reached (used + required > limit).

Relevant code in RegularContainerAllocator#assignContainer:
{code:java}
boolean needToUnreserve = Resources.greaterThan(rc, clusterResource,
    resourceNeedToUnReserve, Resources.none());
{code}
The value of resourceNeedToUnReserve can be <8GB, -6 cores, 0 gpu>. With DominantResourceCalculator, {{Resources#greaterThan}} then returns false, so needToUnreserve is false and the scheduler does not unreserve.
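Why {{Resources#greaterThan}} can return false here is illustrated by the simplified, standalone Java model below. It is not the Hadoop code. It assumes, based on our reading of the 3.x DominantResourceCalculator, that a cluster resource containing any zero-valued type is treated as an invalid divisor. In that case the calculator falls back from dominant-share comparison to a component-wise comparison, which returns 0 whenever one type is larger and another is smaller. The class DrcFallbackSketch and its methods are hypothetical names. Resources are modeled as long[] {memoryMB, vcores, gpu}, and the dominant-share path is simplified.

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified model (NOT the Hadoop classes) of a
 * DominantResourceCalculator-style compare. Assumption: a zero-valued type in
 * the cluster resource triggers a component-wise fallback comparison.
 * Resources are modeled as long[] {memoryMB, vcores, gpu}.
 */
class DrcFallbackSketch {

  // A zero-valued type makes share = value / cluster undefined for that type.
  static boolean isInvalidDivisor(long[] cluster) {
    for (long v : cluster) {
      if (v == 0L) {
        return true;
      }
    }
    return false;
  }

  // 1 if lhs is larger in some type and never smaller, -1 for the reverse,
  // and 0 when lhs is larger in one type but smaller in another.
  static int componentwiseCompare(long[] lhs, long[] rhs) {
    boolean lhsGreater = false;
    boolean rhsGreater = false;
    for (int i = 0; i < lhs.length; i++) {
      if (lhs[i] > rhs[i]) {
        lhsGreater = true;
      } else if (lhs[i] < rhs[i]) {
        rhsGreater = true;
      }
    }
    if (lhsGreater && rhsGreater) {
      return 0;
    }
    return lhsGreater ? 1 : (rhsGreater ? -1 : 0);
  }

  // Simplified dominant-share comparison: sort each side's shares
  // (value / cluster) in descending order and compare lexicographically.
  static int dominantShareCompare(long[] cluster, long[] lhs, long[] rhs) {
    double[] l = sharesDescending(cluster, lhs);
    double[] r = sharesDescending(cluster, rhs);
    for (int i = 0; i < l.length; i++) {
      int c = Double.compare(l[i], r[i]);
      if (c != 0) {
        return c;
      }
    }
    return 0;
  }

  private static double[] sharesDescending(long[] cluster, long[] res) {
    double[] s = new double[res.length];
    for (int i = 0; i < res.length; i++) {
      s[i] = (double) res[i] / cluster[i];
    }
    Arrays.sort(s); // ascending order
    for (int i = 0, j = s.length - 1; i < j; i++, j--) {
      double t = s[i];
      s[i] = s[j];
      s[j] = t; // now descending order
    }
    return s;
  }

  static int compare(long[] cluster, long[] lhs, long[] rhs) {
    if (Arrays.equals(lhs, rhs)) {
      return 0;
    }
    if (isInvalidDivisor(cluster)) {
      return componentwiseCompare(lhs, rhs); // assumed fallback path
    }
    return dominantShareCompare(cluster, lhs, rhs);
  }

  static boolean greaterThan(long[] cluster, long[] lhs, long[] rhs) {
    return compare(cluster, lhs, rhs) > 0;
  }

  public static void main(String[] args) {
    long[] none = {0, 0, 0};
    long[] needToUnreserve = {8 * 1024, -6, 0}; // <8GB, -6 cores, 0 gpu>
    long[] clusterWithEmptyGpu = {100 * 1024, 100, 0};
    long[] clusterWithGpu = {100 * 1024, 100, 8};
    System.out.println(greaterThan(clusterWithEmptyGpu, needToUnreserve, none)); // prints false
    System.out.println(greaterThan(clusterWithGpu, needToUnreserve, none)); // prints true
  }
}
```

With gpu=0 in the cluster, <8GB, -6 cores, 0 gpu> is larger than none in memory but smaller in vcores. Under the assumed fallback, the two are then reported as equal and needToUnreserve would be false. With a non-zero gpu capacity, the positive memory share dominates and the same check returns true.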