[ 
https://issues.apache.org/jira/browse/YARN-11082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bo Li updated YARN-11082:
-------------------------
    Description: 
We use the cluster resource as the denominator to decide which resource is dominant in 
AbstractCSQueue#canAssignToThisQueue. However, nodes in our cluster are configured 
differently, so the cluster-wide totals do not reflect the resources actually available 
under the requested node label.
{quote}2021-12-09 10:24:37,069 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator:
 assignedContainer application attempt=appattempt_1637412555366_1588993_000001 
container=null queue=root.a.a1.a2 clusterResource=<memory:175117312, 
vCores:40222> type=RACK_LOCAL requestedPartition=xx
2021-12-09 10:24:37,069 DEBUG 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue:
 Used resource=<memory:3381248, vCores:687> exceeded maxResourceLimit of the 
queue =<memory:3420315, vCores:687>

2021-12-09 10:24:37,069 INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
 Failed to accept allocation proposal
{quote}
We can see that even though root.a.a1.a2 has already used 687/687 vCores, the 
following check in AbstractCSQueue#canAssignToThisQueue still returns false, so the 
queue is not treated as being over its limit:

{code:java}
Resources.greaterThanOrEqual(resourceCalculator, clusterResource,
    usedExceptKillable, currentLimitResource)
{code}
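For context, a simplified sketch of how that result is used (not the exact source, just the shape of the check):

{code:java}
// Simplified sketch of the limit check in AbstractCSQueue#canAssignToThisQueue.
// usedExceptKillable = queue usage for the partition minus killable containers.
if (Resources.greaterThanOrEqual(resourceCalculator, clusterResource,
    usedExceptKillable, currentLimitResource)) {
  // Dominant share of usage has reached the limit -> refuse to assign here.
  return false;
}
// Dominant share still below the limit -> the queue is treated as having
// headroom, even if a single resource type (vCores here) is already exhausted.
return true;
{code}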

clusterResource = <memory:175117312, vCores:40222>
usedExceptKillable = <memory:3381248, vCores:687> 
currentLimitResource = <memory:3420315, vCores:687>

Dominant shares with clusterResource as the denominator:

usedExceptKillable:
memory : 3381248 / 175117312 ≈ 0.01930847
vCores : 687 / 40222 ≈ 0.01708020

currentLimitResource:
memory : 3420315 / 175117312 ≈ 0.01953156
vCores : 687 / 40222 ≈ 0.01708020

Memory is the larger share on both sides, so DRF treats memory as the dominant 
resource. Since 0.01930847 < 0.01953156, greaterThanOrEqual returns false and the 
queue is still considered to have headroom, even though its vCores are already fully 
used (687/687). The proposal is then rejected later, which matches the "Failed to 
accept allocation proposal" message above.
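A minimal, self-contained illustration (plain Java, not Hadoop code) of how a DRF-style dominant-share comparison behaves with the numbers above when the whole cluster resource is the denominator:

{code:java}
public class DrfDenominatorExample {
  // Dominant share = max over resource types of (value / cluster total).
  static double dominantShare(long mem, long vcores, long clusterMem, long clusterVcores) {
    return Math.max((double) mem / clusterMem, (double) vcores / clusterVcores);
  }

  public static void main(String[] args) {
    long clusterMem = 175117312L, clusterVcores = 40222L;

    double used  = dominantShare(3381248L, 687L, clusterMem, clusterVcores); // memory share ~0.0193085
    double limit = dominantShare(3420315L, 687L, clusterMem, clusterVcores); // memory share ~0.0195316

    // A DRF-style greaterThanOrEqual compares dominant shares:
    // 0.0193085 < 0.0195316, so the check returns false and the queue still
    // looks like it has headroom, although vCores are already at 687/687.
    System.out.printf("used=%.7f limit=%.7f greaterThanOrEqual=%b%n",
        used, limit, used >= limit);
  }
}
{code}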



> Use node label resource as denominator to decide which resource is dominant
> -----------------------------------------------------------------------------
>
>                 Key: YARN-11082
>                 URL: https://issues.apache.org/jira/browse/YARN-11082
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>    Affects Versions: 3.1.1
>            Reporter: Bo Li
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
