Brian Goerlitz created YARN-11428:
-------------------------------------

             Summary: FairScheduler: Expected preemption may not happen if node has enough free resources
                 Key: YARN-11428
                 URL: https://issues.apache.org/jira/browse/YARN-11428
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Brian Goerlitz
An application can be starved of its fair share under the following conditions:
* intra-queue preemption is needed in order for a new application to receive resources
* the first NodeManager checked for preemption already has idle resources greater than the required resources
* containers belonging to a different queue, which is using no more than its fair share, are running on that node

Illustration using a single-node cluster for simplicity:
{noformat}
yarn.nodemanager.resource.memory-mb = 9216
yarn.nodemanager.resource.cpu-vcores = 18
yarn.scheduler.fair.preemption = true
yarn.scheduler.fair.preemption.cluster-utilization-threshold = 0.5
{noformat}

FairScheduler config:
{code:java}
<allocations>
...
  <queue name="default">
    <weight>1.0</weight>
    <schedulingPolicy>drf</schedulingPolicy>
  </queue>
  <queue name="limited">
    <maxResources>memory-mb=33.0%, vcores=33.0%</maxResources>
    <weight>1.0</weight>
    <schedulingPolicy>drf</schedulingPolicy>
  </queue>
  <defaultFairSharePreemptionTimeout>5</defaultFairSharePreemptionTimeout>
  <defaultFairSharePreemptionThreshold>1.0</defaultFairSharePreemptionThreshold>
  <defaultQueueSchedulingPolicy>fair</defaultQueueSchedulingPolicy>
...
</allocations>
{code}

Procedure:
# Launch an application (app1) in root.limited which will consume the queue's max resources.
# Launch an application (app2) in root.default which will consume no more than the queue's fair share.
# Launch another application (app3) in root.limited with a container size smaller than the remaining cluster capacity.

Expected result: resources from app1 should be preempted and given to app3 until app3 has its fair share.

In actuality, this does not always happen. When {{FSPreemptionThread}} iterates over the containers on the node, if the first container belongs to app2, it will not be eligible for preemption (since preempting it would take app2 below its fair share).
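For concreteness, the configuration above works out roughly as follows (a back-of-the-envelope sketch with assumed rounding, not actual scheduler output): root.limited is capped at about a third of the node, so even with app1 at its cap and app2 within its fair share, the node still has idle headroom. That headroom is exactly what trips the faulty check, yet it is unusable by app3 because root.limited is already at its maxResources cap.
{code:java}
// Back-of-the-envelope arithmetic for the configuration above
// (assumed rounding; the scheduler's internal math may differ slightly).
public class FairShareArithmetic {
    static final int CLUSTER_MEM_MB = 9216;     // yarn.nodemanager.resource.memory-mb
    static final int CLUSTER_VCORES = 18;       // yarn.nodemanager.resource.cpu-vcores
    static final double LIMITED_MAX_PCT = 0.33; // root.limited maxResources

    // Hard cap on root.limited: roughly a third of the node.
    static int limitedMaxMemMb() {
        return (int) (CLUSTER_MEM_MB * LIMITED_MAX_PCT);
    }

    static int limitedMaxVcores() {
        return (int) (CLUSTER_VCORES * LIMITED_MAX_PCT);
    }

    // With equal weights and both queues active, each queue's
    // instantaneous fair share is half the cluster.
    static int fairShareMemMb() {
        return CLUSTER_MEM_MB / 2;
    }

    public static void main(String[] args) {
        System.out.println("root.limited cap: " + limitedMaxMemMb()
            + " MB / " + limitedMaxVcores() + " vcores");
        System.out.println("per-queue fair share: " + fairShareMemMb() + " MB");
    }
}
{code}
With app1 pinned at the ~3 GB cap and app2 using at most its fair share, several gigabytes on the node remain free, which is why the preemption scan concludes the node can already satisfy app3's request.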
Because the node already has enough free capacity for the new container, the next container in the list is never checked and an empty {{PreemptableContainers}} is returned. Since the returned list contains no AM containers, in a multi-node scenario no other nodes are checked either. No container is preempted, and until the usage pattern changes, app3 is unable to obtain its fair share of resources.
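The failure mode can be illustrated with a simplified, self-contained sketch (all names are hypothetical; this is not the actual {{FSPreemptionThread}} code, which scans each node's running containers and stops once free plus to-be-preempted resources cover the request):
{code:java}
import java.util.ArrayList;
import java.util.List;

// Simplified model of the per-node preemption scan described above.
// Hypothetical names; the real logic lives in FSPreemptionThread.
public class PreemptionScanSketch {

    static class Container {
        final int memoryMb;
        final boolean preemptable; // false if the owning app is at or below fair share

        Container(int memoryMb, boolean preemptable) {
            this.memoryMb = memoryMb;
            this.preemptable = preemptable;
        }
    }

    /**
     * Returns the containers to preempt on one node, or null if the
     * request cannot be satisfied there. Mirrors the early exit in the
     * report: the scan stops as soon as available resources cover the
     * request, even if the candidate list is still empty.
     */
    static List<Container> identifyContainersToPreempt(
            int requestedMb, int nodeFreeMb, List<Container> running) {
        List<Container> toPreempt = new ArrayList<>();
        int available = nodeFreeMb;
        for (Container c : running) {
            if (c.preemptable) {
                toPreempt.add(c);
                available += c.memoryMb;
            }
            if (available >= requestedMb) {
                // If the node's idle space alone covers the request, we get
                // here after examining only app2's (non-preemptable)
                // container and return an EMPTY list.
                return toPreempt;
            }
        }
        return available >= requestedMb ? toPreempt : null;
    }
}
{code}
The early-exit check does not distinguish "the request fits thanks to containers chosen for preemption" from "the node simply has idle space that the starved application cannot use", so an empty candidate list is treated as success for that node and the scan never reaches app1's preemptable containers.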