[ 
https://issues.apache.org/jira/browse/YARN-10848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388046#comment-17388046
 ] 

Minni Mittal edited comment on YARN-10848 at 7/27/21, 1:10 PM:
---------------------------------------------------------------

[~pbacsko], as per my understanding, DefaultResourceCalculator considers memory 
as the limiting resource.
{code:java}
private static final Set<String> INSUFFICIENT_RESOURCE_NAME =
    ImmutableSet.of(ResourceInformation.MEMORY_URI);
{code}
As such, it will keep allocating containers as long as memory is available, 
irrespective of the availability of vcores.
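
To make the difference concrete, here is a minimal, self-contained sketch. It does not use the Hadoop classes: {{Res}}, {{fitsInMemoryOnly()}}, {{fitsInAllResources()}} and {{allocate()}} are hypothetical names, and the node and request sizes are made-up numbers, not the ones from the attached test. It only models a memory-only fit check next to a check that looks at every resource type.

```java
public class FitsInSketch {

    // Hypothetical stand-in for a YARN Resource: memory in MB plus vcores.
    static final class Res {
        final long memoryMb;
        final int vcores;

        Res(long memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    // Models a memory-only check, as described above for DefaultResourceCalculator.
    static boolean fitsInMemoryOnly(Res ask, Res available) {
        return ask.memoryMb <= available.memoryMb;
    }

    // Models a check that requires every resource type to fit.
    static boolean fitsInAllResources(Res ask, Res available) {
        return ask.memoryMb <= available.memoryMb
            && ask.vcores <= available.vcores;
    }

    // Counts how many containers of size `ask` are placed on `node`
    // before the chosen fit check rejects the next one.
    static int allocate(Res node, Res ask, int requested, boolean memoryOnly) {
        Res free = node;
        int allocated = 0;
        while (allocated < requested) {
            boolean fits = memoryOnly
                ? fitsInMemoryOnly(ask, free)
                : fitsInAllResources(ask, free);
            if (!fits) {
                break;
            }
            // Vcores may go negative in the memory-only case (overcommit).
            free = new Res(free.memoryMb - ask.memoryMb, free.vcores - ask.vcores);
            allocated++;
        }
        return allocated;
    }

    public static void main(String[] args) {
        Res node = new Res(8192, 4);  // assumed node size
        Res ask = new Res(1024, 1);   // assumed per-container request
        System.out.println(allocate(node, ask, 10, true));   // prints 8
        System.out.println(allocate(node, ask, 10, false));  // prints 4
    }
}
```

With these assumed sizes, the memory-only check places 8 containers on a node that has only 4 vcores, while the all-resources check stops at 4.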

In the test "TestTooManyContainers" you added, if we increase 
numRequestedContainers to 13, it allocates 11 containers and then logs:
{noformat}
This node 127.0.0.1:1234 doesn't have sufficient available or preemptible resource for minimum allocation
{noformat}
This looks like expected behavior to me.

Please help me understand the issue.



> Vcore allocation problem with DefaultResourceCalculator
> -------------------------------------------------------
>
>                 Key: YARN-10848
>                 URL: https://issues.apache.org/jira/browse/YARN-10848
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, capacityscheduler
>            Reporter: Peter Bacsko
>            Assignee: Minni Mittal
>            Priority: Major
>         Attachments: TestTooManyContainers.java
>
>
> If we use DefaultResourceCalculator, then Capacity Scheduler keeps allocating 
> containers even if we run out of vcores.
> CS checks the available resources in two places. The first check is in 
> {{CapacityScheduler.allocateContainerOnSingleNode()}}:
> {noformat}
>     if (calculator.computeAvailableContainers(Resources
>             .add(node.getUnallocatedResource(),
>                 node.getTotalKillableResources()),
>         minimumAllocation) <= 0) {
>       LOG.debug("This node " + node.getNodeID() + " doesn't have sufficient "
>           + "available or preemptible resource for minimum allocation");
> {noformat}
> The second, which is more important, is located in 
> {{RegularContainerAllocator.assignContainer()}}:
> {noformat}
>     if (!Resources.fitsIn(rc, capability, totalResource)) {
>       LOG.warn("Node : " + node.getNodeID()
>           + " does not have sufficient resource for ask : " + pendingAsk
>           + " node total capability : " + node.getTotalResource());
>       // Skip this locality request
>       ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
>           activitiesManager, node, application, schedulerKey,
>           ActivityDiagnosticConstant.
>               NODE_TOTAL_RESOURCE_INSUFFICIENT_FOR_REQUEST
>               + getResourceDiagnostics(capability, totalResource),
>           ActivityLevel.NODE);
>       return ContainerAllocation.LOCALITY_SKIPPED;
>     }
> {noformat}
> Here, {{rc}} is the resource calculator instance; the other two values are:
> {noformat}
>     Resource capability = pendingAsk.getPerAllocationResource();
>     Resource available = node.getUnallocatedResource();
> {noformat}
> There is a repro unit test attached to this case, which demonstrates the 
> problem. The root cause is that we pass the resource calculator to 
> {{Resources.fitsIn()}}. Instead, we should use the overloaded version that 
> takes no calculator, just like in {{FSAppAttempt.assignContainer()}}:
> {noformat}
>     // Can we allocate a container on this node?
>     if (Resources.fitsIn(capability, available)) {
>       // Inform the application of the new container for this request
>       RMContainer allocatedContainer =
>           allocate(type, node, schedulerKey, pendingAsk,
>               reservedContainer);
> {noformat}
> In CS, if we switch to DominantResourceCalculator OR use 
> {{Resources.fitsIn()}} without the calculator in 
> {{RegularContainerAllocator.assignContainer()}}, that fixes the failing unit 
> test (see {{testTooManyContainers()}} in {{TestTooManyContainers.java}}).


