[ https://issues.apache.org/jira/browse/YARN-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Bacsko updated YARN-10283:
--------------------------------
    Summary: Capacity Scheduler: starvation occurs if a higher priority queue is full and node labels are used  (was: Capacity Scheduler: starvation occurs if a higher priority queue is full a and node labels are used)

> Capacity Scheduler: starvation occurs if a higher priority queue is full and
> node labels are used
> -------------------------------------------------------------------------------------------------
>
>                  Key: YARN-10283
>                  URL: https://issues.apache.org/jira/browse/YARN-10283
>              Project: Hadoop YARN
>           Issue Type: Bug
>           Components: capacity scheduler
>             Reporter: Peter Bacsko
>             Assignee: Peter Bacsko
>             Priority: Major
>
> Recently we've been investigating a scenario where applications submitted to
> a lower priority queue could not get scheduled because a higher priority
> queue in the same hierarchy could not satisfy the allocation request. Both
> queues belonged to the same partition.
> If we disabled node labels, the problem disappeared.
> The problem is that {{RegularContainerAllocator}} always allocated a
> container for the request, even if it should not have.
> *Example:*
> * Cluster total resources: 3 nodes, 15GB, 24 vcores
> * Partition "shared" was created with 2 nodes
> * "root.lowprio" (priority = 20) and "root.highprio" (priority = 40) were
> added to the partition
> * Both queues have a limit of <memory:5120, vCores:8>
> * Using DominantResourceCalculator
> Setup:
> Submit a distributed shell application to highprio with the switches
> "-num_containers 3 -container_vcores 4". The memory allocation is 512MB per
> container.
> Chain of events:
> 1. The queue is filled with containers until it reaches a usage of
> <memory:2560, vCores:5>
> 2. A node update event is pushed to CS from a node which is part of the
> partition
> 3. {{AbstractCSQueue.canAssignToQueue()}} returns true because the usage is
> smaller than the current limit resource <memory:5120, vCores:8>
> 4. Then {{LeafQueue.assignContainers()}} runs successfully and gets an
> allocated container for <memory:512, vcores:4>
> 5. But we can't commit the resource request because we would have 9 vcores in
> total, violating the limit.
> The problem is that we always try to assign a container to the same
> application in each heartbeat from "highprio". Applications in "lowprio"
> cannot make progress.
> *Problem:*
> {{RegularContainerAllocator.assignContainer()}} does not handle this case
> well. We only reject the allocation if this condition is satisfied:
> {noformat}
> if (rmContainer == null && reservationsContinueLooking
>       && node.getLabels().isEmpty()) {
> {noformat}
> But if we have node labels, we succeed with the allocation if there's room
> for a container.
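To illustrate the arithmetic behind step 5, here is a minimal, self-contained sketch (plain Java, not the actual Capacity Scheduler code; class and method names are made up for illustration) of the per-dimension queue-limit check that is effectively skipped on the node-label path:

{noformat}
// Minimal illustration (not Hadoop code): models the queue-limit check that
// should reject the <512 MB, 4 vcores> request when the queue is already at
// <2560 MB, 5 vcores> with a limit of <5120 MB, 8 vcores>.
public class QueueLimitCheckSketch {

  /** Simple resource vector: memory in MB and vcores. */
  static final class Res {
    final long memoryMb;
    final int vcores;
    Res(long memoryMb, int vcores) {
      this.memoryMb = memoryMb;
      this.vcores = vcores;
    }
  }

  /**
   * Returns true only if adding the request keeps BOTH dimensions within the
   * queue limit; this is the kind of check that never gets a chance to reject
   * the proposal when node.getLabels() is non-empty.
   */
  static boolean fitsWithinLimit(Res used, Res request, Res limit) {
    return used.memoryMb + request.memoryMb <= limit.memoryMb
        && used.vcores + request.vcores <= limit.vcores;
  }

  public static void main(String[] args) {
    Res limit   = new Res(5120, 8); // queue limit from the example
    Res used    = new Res(2560, 5); // current queue usage
    Res request = new Res(512, 4);  // one distributed-shell container

    // Memory fits (2560 + 512 <= 5120) but vcores do not (5 + 4 = 9 > 8), so
    // the allocation can never be committed. Because it is not rejected up
    // front, the allocator retries the same "highprio" application on every
    // node heartbeat and "lowprio" applications starve.
    System.out.println("fits = " + fitsWithinLimit(used, request, limit)); // false
  }
}
{noformat}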