[
https://issues.apache.org/jira/browse/YARN-11082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18038502#comment-18038502
]
ASF GitHub Bot commented on YARN-11082:
---------------------------------------
github-actions[bot] commented on PR #4043:
URL: https://github.com/apache/hadoop/pull/4043#issuecomment-3535145530
We're closing this stale PR because it has been open for 100 days with no
activity. This isn't a judgement on the merit of the PR in any way. It's just a
way of keeping the PR queue manageable.
If you feel like this was a mistake, or you would like to continue working
on it, please feel free to re-open it and ask for a committer to remove the
stale tag and review again.
Thanks all for your contribution.
> Use node label resource as denominator to decide which resource is dominated
> -----------------------------------------------------------------------------
>
> Key: YARN-11082
> URL: https://issues.apache.org/jira/browse/YARN-11082
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler
> Affects Versions: 3.1.1
> Reporter: Bo Li
> Priority: Major
> Labels: pull-request-available
> Attachments: YARN-11082.001.patch
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> We used the cluster resource as the denominator to decide which resource is
> dominant in AbstractCSQueue#canAssignToThisQueue. However, the nodes in our
> cluster are configured differently.
> {quote}2021-12-09 10:24:37,069 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.AbstractContainerAllocator:
> assignedContainer application
> attempt=appattempt_1637412555366_1588993_000001 container=null
> queue=root.a.a1.a2 clusterResource=<memory:175117312, vCores:40222>
> type=RACK_LOCAL requestedPartition=x
> 2021-12-09 10:24:37,069 DEBUG
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue:
> Used resource=<memory:3381248, vCores:687> exceeded maxResourceLimit of the
> queue =<memory:3420315, vCores:687>
> 2021-12-09 10:24:37,069 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
> Failed to accept allocation proposal
> {quote}
> Even though root.a.a1.a2 has used 687/687 vCores, the following check in
> AbstractCSQueue#canAssignToThisQueue still returns false:
> {quote}Resources.greaterThanOrEqual(resourceCalculator, clusterResource,
> usedExceptKillable, currentLimitResource)
> {quote}
> clusterResource = <memory:175117312, vCores:40222>
> usedExceptKillable = <memory:3381248, vCores:687>
> currentLimitResource = <memory:3420315, vCores:687>
> currentLimitResource:
> memory : 3381248/175117312 = 0.01930847362
> vCores : 687/40222 = 0.01708020486
> usedExceptKillable:
> memory : 3384320/175117312 = 0.01932601615
> vCores : 688/40222 = 0.01710506687
> DRF treats memory as the dominant resource and returns false in this scenario.
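To make the comparison concrete, here is a minimal Java sketch that models a dominant-share comparison using the clusterResource, usedExceptKillable and currentLimitResource values quoted above. It is a simplified model, not Hadoop's DominantResourceCalculator: the `DominantShareCheck` and `Res` names are hypothetical, only memory and vCores are considered, and ties on the dominant share are assumed to be broken by the other share.

```java
// Simplified model of a dominant-share (DRF-style) comparison.
// NOT Hadoop's DominantResourceCalculator; names here are hypothetical.
final class DominantShareCheck {

    /** Hypothetical two-dimensional resource vector (memory, vCores). */
    static final class Res {
        final long memory;
        final long vcores;

        Res(long memory, long vcores) {
            this.memory = memory;
            this.vcores = vcores;
        }
    }

    /** Largest per-resource share of r relative to the denominator. */
    static double dominantShare(Res denominator, Res r) {
        double memShare = (double) r.memory / denominator.memory;
        double cpuShare = (double) r.vcores / denominator.vcores;
        return Math.max(memShare, cpuShare);
    }

    /** Smallest per-resource share, used only as a tie-breaker. */
    static double secondaryShare(Res denominator, Res r) {
        double memShare = (double) r.memory / denominator.memory;
        double cpuShare = (double) r.vcores / denominator.vcores;
        return Math.min(memShare, cpuShare);
    }

    /** Compares lhs and rhs by dominant share first, then by the other share. */
    static int compare(Res denominator, Res lhs, Res rhs) {
        int c = Double.compare(dominantShare(denominator, lhs),
                               dominantShare(denominator, rhs));
        if (c != 0) {
            return c;
        }
        return Double.compare(secondaryShare(denominator, lhs),
                              secondaryShare(denominator, rhs));
    }

    /** Mirrors the shape of greaterThanOrEqual(calc, denominator, lhs, rhs). */
    static boolean greaterThanOrEqual(Res denominator, Res lhs, Res rhs) {
        return compare(denominator, lhs, rhs) >= 0;
    }

    public static void main(String[] args) {
        // Values copied from the issue description.
        Res clusterResource = new Res(175117312L, 40222L);
        Res usedExceptKillable = new Res(3381248L, 687L);
        Res currentLimitResource = new Res(3420315L, 687L);

        // vCores are already at the queue limit (687 >= 687) ...
        boolean vcoresAtLimit =
            usedExceptKillable.vcores >= currentLimitResource.vcores;
        // ... but memory's share of the cluster is larger, so memory is the
        // dominant resource and the comparison reports "not over the limit".
        boolean overLimit = greaterThanOrEqual(
            clusterResource, usedExceptKillable, currentLimitResource);

        System.out.println("vCores at limit: " + vcoresAtLimit);
        System.out.println("DRF says over limit: " + overLimit);
    }
}
```

Running it (for example with `java DominantShareCheck.java` on JDK 11+) prints `vCores at limit: true` and `DRF says over limit: false`. Under this model, the change proposed in this issue would amount to passing the total resource of partition x as `denominator` instead of `clusterResource`; whether that flips the result depends on that partition's memory-to-vCore ratio, which is not given above.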
--
This message was sent by Atlassian Jira
(v8.20.10#820010)