[ https://issues.apache.org/jira/browse/YARN-10293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124195#comment-17124195 ]
Wangda Tan commented on YARN-10293:
-----------------------------------

[~Tao Yang], the suggestion totally makes sense to me. When we built the initial global scheduling framework, the goal was to keep it compatible with the previous behavior. I agree that taking additional steps to overhaul the reservation logic in the context of global scheduling is a good idea; right now the code is very hard to read and understand. I think we can do this step by step: first, let's fix low-hanging fruit like this Jira. I hope to get your thoughts on the proposed change: https://issues.apache.org/jira/browse/YARN-10293?focusedCommentId=17121419&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17121419

And [~prabhujoseph], if you have time/bandwidth, can you take a look into the reservation-related logic + preemption + unreserve + global scheduling and see what we can optimize here?

> Reserved Containers not allocated from available space of other nodes in
> CandidateNodeSet in MultiNodePlacement (YARN-10259)
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-10293
>                 URL: https://issues.apache.org/jira/browse/YARN-10293
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.3.0
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Major
>         Attachments: YARN-10293-001.patch, YARN-10293-002.patch, YARN-10293-003-WIP.patch
>
> Reserved Containers are not allocated from the available space of other nodes in the
> CandidateNodeSet in MultiNodePlacement. YARN-10259 fixed two issues related to this:
> https://issues.apache.org/jira/browse/YARN-10259?focusedCommentId=17105987&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17105987
> We have found one more bug in CapacityScheduler.java which causes the same issue with a slight difference in the repro.
> *Repro:*
> *Nodes : Capacity : Used*
> Node1 - 8GB, 8 vcores - 8GB, 8 vcores
> Node2 - 8GB, 8 vcores - 8GB, 8 vcores
> Node3 - 8GB, 8 vcores - 8GB, 8 vcores
> Queues -> A and B, both 50% capacity, 100% max capacity
> MultiNode enabled + Preemption enabled
> 1. JobA is submitted to queue A and uses the full cluster (24GB, 24 vcores).
> 2. JobB is submitted to queue B with an AM size of 1GB.
> {code}
> 2020-05-21 12:12:27,313 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=systest IP=172.27.160.139 OPERATION=Submit Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1590046667304_0005 CALLERCONTEXT=CLI QUEUENAME=dummy
> {code}
> 3. Preemption happens and the used capacity drops below 1.0f.
> {code}
> 2020-05-21 12:12:48,222 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics: Non-AM container preempted, current appAttemptId=appattempt_1590046667304_0004_000001, containerId=container_e09_1590046667304_0004_01_000024, resource=<memory:1024, vCores:1>
> {code}
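To make the used-capacity arithmetic in this repro concrete, here is a standalone sketch (plain Java, not YARN code; the class name is made up and the numbers come from the 3-node, 24GB/24-vcore setup above). It assumes reserved resources count toward the root queue's used capacity, which step 5 below indicates.

{code}
// Standalone sketch of the root-queue used-capacity arithmetic in this repro (plain Java,
// not YARN code). Assumption: reserved resources count as used, as step 5 below indicates.
public class UsedCapacityArithmetic {
  public static void main(String[] args) {
    final float clusterMemGB = 24f;                      // 3 nodes x 8GB

    float beforePreemption = 24f / clusterMemGB;         // JobA holds the whole cluster -> 1.0
    float afterPreemption  = (24f - 1f) / clusterMemGB;  // one 1GB container preempted  -> ~0.958
    float afterReservation = (23f + 1f) / clusterMemGB;  // 1GB AM container reserved    -> 1.0 again

    System.out.printf("before preemption : %.3f (>= 1.0f, new allocation gated off)%n", beforePreemption);
    System.out.printf("after preemption  : %.3f (< 1.0f, JobB's AM gets reserved in step 4)%n", afterPreemption);
    System.out.printf("after reservation : %.3f (>= 1.0f again, gate closes, see step 5)%n", afterReservation);
  }
}
{code}

The flip back to 1.0f after the reservation in step 4 is what re-closes the check quoted at the end of the description, and that is the core of the hang.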
> 4. JobB gets a Reserved Container as part of CapacityScheduler#allocateOrReserveNewContainer.
> {code}
> 2020-05-21 12:12:48,226 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e09_1590046667304_0005_01_000001 Container Transitioned from NEW to RESERVED
> 2020-05-21 12:12:48,226 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: Reserved container=container_e09_1590046667304_0005_01_000001, on node=host: tajmera-fullnodes-3.tajmera-fullnodes.root.hwx.site:8041 #containers=8 available=<memory:0, vCores:0> used=<memory:8192, vCores:8> with resource=<memory:1024, vCores:1>
> {code}
> *Why did RegularContainerAllocator reserve the container when the used capacity is < 1.0f?*
> The reason is that even though the container is preempted, the NodeManager still has to stop the container and heartbeat back before the ResourceManager updates the node's available and unallocated resources. Until then the node reports no free space, so the allocator can only reserve.
> 5. Now, no new allocation happens and the reserved container stays reserved. After the reservation the used capacity becomes 1.0f again, the logic below keeps looping, and no new allocate or reserve happens. The reserved container cannot be allocated because the reserved node does not have space. Node2 has space for 1GB, 1 vcore, but CapacityScheduler#allocateOrReserveNewContainers is not called, causing the hang.
> *[INFINITE LOOP] CapacityScheduler#allocateContainersOnMultiNodes -> CapacityScheduler#allocateFromReservedContainer -> Re-reserve the container on the node*
> {code}
> 2020-05-21 12:13:33,242 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Trying to fulfill reservation for application application_1590046667304_0005 on node: tajmera-fullnodes-3.tajmera-fullnodes.root.hwx.site:8041
> 2020-05-21 12:13:33,242 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: assignContainers: partition= #applications=1
> 2020-05-21 12:13:33,242 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: Reserved container=container_e09_1590046667304_0005_01_000001, on node=host: tajmera-fullnodes-3.tajmera-fullnodes.root.hwx.site:8041 #containers=8 available=<memory:0, vCores:0> used=<memory:8192, vCores:8> with resource=<memory:1024, vCores:1>
> 2020-05-21 12:13:33,243 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Allocation proposal accepted
> {code}
> CapacityScheduler#allocateOrReserveNewContainers won't be called because the check below in allocateContainersOnMultiNodes prevents it:
> {code}
> if (getRootQueue().getQueueCapacities().getUsedCapacity(
>     candidates.getPartition()) >= 1.0f
>     && preemptionManager.getKillableResource(
> {code}
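To illustrate why that check keeps the scheduler in the loop above, here is a simplified standalone model (plain Java, not the actual CapacityScheduler code; the class, field, and method names are illustrative). It assumes the reserved 1GB keeps the root queue's used capacity at 1.0f and that preemption has already finished, so nothing is killable.

{code}
// Simplified standalone model of the check described above (plain Java, not the actual
// CapacityScheduler code). Assumption: the reserved 1GB keeps root used capacity at 1.0f
// and preemption has already finished, so there is no killable resource left.
public class MultiNodeGateSketch {

  static final float usedCapacity = 1.0f;  // 23GB used + 1GB reserved out of 24GB
  static final long killableMemMB = 0;     // nothing left to preempt

  /** Mirrors the quoted check: skip allocateOrReserveNewContainers when the partition
   *  looks full and there is nothing killable. */
  static boolean mayAllocateOrReserveNew() {
    return !(usedCapacity >= 1.0f && killableMemMB == 0);
  }

  public static void main(String[] args) {
    for (int heartbeat = 1; heartbeat <= 3; heartbeat++) {
      if (mayAllocateOrReserveNew()) {
        System.out.println("heartbeat " + heartbeat
            + ": allocateOrReserveNewContainers could place the 1GB AM on node2");
      } else {
        // Only the existing reservation is retried; the reserved node has no free space,
        // so the container is re-reserved and used capacity never drops below 1.0f.
        System.out.println("heartbeat " + heartbeat
            + ": only the reservation on the full node is retried -> re-reserve, hang");
      }
    }
  }
}
{code}

Because the reservation itself keeps used capacity at 1.0f, the condition never becomes false again, which matches the hang in step 5 and explains why the description expects the container to be placed from node2's free 1GB, 1 vcore instead.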