[ https://issues.apache.org/jira/browse/YARN-10848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Bacsko reassigned YARN-10848:
-----------------------------------

    Assignee: Minni Mittal

> Vcore allocation problem with DefaultResourceCalculator
> -------------------------------------------------------
>
>                 Key: YARN-10848
>                 URL: https://issues.apache.org/jira/browse/YARN-10848
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, capacityscheduler
>            Reporter: Peter Bacsko
>            Assignee: Minni Mittal
>            Priority: Major
>         Attachments: TestTooManyContainers.java
>
>
> If we use DefaultResourceCalculator, then Capacity Scheduler keeps allocating
> containers even if we run out of vcores.
> CS checks the available resources in two places. The first check is in
> {{CapacityScheduler.allocateContainerOnSingleNode()}}:
> {noformat}
>     if (calculator.computeAvailableContainers(Resources
>             .add(node.getUnallocatedResource(),
>                 node.getTotalKillableResources()),
>         minimumAllocation) <= 0) {
>       LOG.debug("This node " + node.getNodeID() + " doesn't have sufficient "
>           + "available or preemptible resource for minimum allocation");
> {noformat}
> The second, which is more important, is located in
> {{RegularContainerAllocator.assignContainer()}}:
> {noformat}
>     if (!Resources.fitsIn(rc, capability, totalResource)) {
>       LOG.warn("Node : " + node.getNodeID()
>           + " does not have sufficient resource for ask : " + pendingAsk
>           + " node total capability : " + node.getTotalResource());
>       // Skip this locality request
>       ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
>           activitiesManager, node, application, schedulerKey,
>           ActivityDiagnosticConstant.
>               NODE_TOTAL_RESOURCE_INSUFFICIENT_FOR_REQUEST
>               + getResourceDiagnostics(capability, totalResource),
>           ActivityLevel.NODE);
>       return ContainerAllocation.LOCALITY_SKIPPED;
>     }
> {noformat}
> Here, {{rc}} is the resource calculator instance; the other two values are:
> {noformat}
>     Resource capability = pendingAsk.getPerAllocationResource();
>     Resource available = node.getUnallocatedResource();
> {noformat}
> There is a repro unit test attached to this case, which demonstrates the
> problem. The root cause is that we pass the resource calculator to
> {{Resources.fitsIn()}}. Instead, we should use the overloaded version that
> takes no calculator, just like in {{FSAppAttempt.assignContainer()}}:
> {noformat}
>     // Can we allocate a container on this node?
>     if (Resources.fitsIn(capability, available)) {
>       // Inform the application of the new container for this request
>       RMContainer allocatedContainer =
>           allocate(type, node, schedulerKey, pendingAsk,
>               reservedContainer);
> {noformat}
> In CS, if we switch to DominantResourceCalculator OR use
> {{Resources.fitsIn()}} without the calculator in
> {{RegularContainerAllocator.assignContainer()}}, that fixes the failing unit
> test (see {{testTooManyContainers()}} in {{TestTooManyContainers.java}}).
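
The difference between the two kinds of check described in the issue can be illustrated with a small, self-contained sketch. It does not use the Hadoop classes: `Res`, `fitsInMemoryOnly()` and `fitsInAllResources()` are hypothetical stand-ins. It assumes that the calculator-based check only compares memory, which is the behaviour the report attributes to DefaultResourceCalculator, and that the calculator-free check compares every resource dimension.

```java
// Simplified stand-ins for illustration only; these are NOT the Hadoop classes.
public class FitsInSketch {

    /** Hypothetical minimal resource vector: memory in MB plus vcores. */
    static final class Res {
        final long memoryMb;
        final int vcores;

        Res(long memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    /** Assumed model of a memory-only calculator: vcores are ignored. */
    static boolean fitsInMemoryOnly(Res smaller, Res bigger) {
        return smaller.memoryMb <= bigger.memoryMb;
    }

    /** Component-wise check: every dimension of the ask must fit. */
    static boolean fitsInAllResources(Res smaller, Res bigger) {
        return smaller.memoryMb <= bigger.memoryMb
            && smaller.vcores <= bigger.vcores;
    }

    public static void main(String[] args) {
        Res ask = new Res(1024, 1);        // container request: 1 GB, 1 vcore
        Res available = new Res(8192, 0);  // node: memory free, vcores exhausted

        // The memory-only check accepts the request although no vcores are left.
        System.out.println("memory-only:   " + fitsInMemoryOnly(ask, available));
        // The component-wise check rejects it.
        System.out.println("all resources: " + fitsInAllResources(ask, available));
    }
}
```

Under these assumptions the memory-only check keeps returning true on a node whose vcores are used up, which matches the over-allocation symptom described in the report, while the component-wise check returns false.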