[ https://issues.apache.org/jira/browse/YARN-10293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126407#comment-17126407 ]
Tao Yang commented on YARN-10293:
---------------------------------

Thanks [~prabhujoseph] for this effort. I'm fine with it, please go ahead.
{quote}
Yes sure, YARN-9598 addresses many other issues. Will check how to contribute to the same and address any other optimization required.
{quote}
Good to hear that, thanks.

For the patch, overall it looks good. Some suggestions about the UTs:
* In TestCapacitySchedulerMultiNodes#testExcessReservationWillBeUnreserved, this patch changes the behavior of the second-to-last allocation and makes the last allocation unnecessary. Can you remove lines 261 to 267 to make it clearer?
{code}
 Assert.assertEquals(1, schedulerApp1.getLiveContainers().size());
 Assert.assertEquals(0, schedulerApp1.getReservedContainers().size());
-Assert.assertEquals(1, schedulerApp2.getLiveContainers().size());
-
-// Trigger scheduling to allocate a container on nm1 for app2.
-cs.handle(new NodeUpdateSchedulerEvent(rmNode1));
-Assert.assertNull(cs.getNode(nm1.getNodeId()).getReservedContainer());
-Assert.assertEquals(1, schedulerApp1.getLiveContainers().size());
-Assert.assertEquals(0, schedulerApp1.getReservedContainers().size());
 Assert.assertEquals(2, schedulerApp2.getLiveContainers().size());
 Assert.assertEquals(7 * GB,
     cs.getNode(nm1.getNodeId()).getAllocatedResource().getMemorySize());
 Assert.assertEquals(12 * GB,
     cs.getRootQueue().getQueueResourceUsage().getUsed().getMemorySize());
{code}
* Can we remove the TestCapacitySchedulerMultiNodesWithPreemption#getFiCaSchedulerApp method and get the scheduler app by calling CapacityScheduler#getApplicationAttempt instead?
* There are many while loops, Thread#sleep calls, and async-thread creations for checking states in TestCapacitySchedulerMultiNodesWithPreemption#testAllocationOfReservationFromOtherNode; could you please call GenericTestUtils#waitFor, MockRM#waitForState, etc. to simplify them?
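On the last point, the hand-rolled while/sleep loops and checker threads can usually collapse into a single polling call. Below is a minimal, self-contained sketch of the waitFor idiom; it mirrors the spirit of GenericTestUtils#waitFor (poll a condition, fail on timeout) but is a standalone illustration, not the Hadoop API itself:

```java
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public class WaitForSketch {
  // Poll 'check' every checkEveryMillis until it returns true, or throw
  // TimeoutException once waitForMillis has elapsed. Same shape as the
  // GenericTestUtils#waitFor idiom suggested above.
  public static void waitFor(Supplier<Boolean> check, long checkEveryMillis,
      long waitForMillis) throws TimeoutException, InterruptedException {
    long deadline = System.currentTimeMillis() + waitForMillis;
    while (!check.get()) {
      if (System.currentTimeMillis() >= deadline) {
        throw new TimeoutException("Timed out waiting for condition");
      }
      Thread.sleep(checkEveryMillis);
    }
  }

  public static void main(String[] args) throws Exception {
    long start = System.currentTimeMillis();
    // The condition flips to true after ~50ms; waitFor returns normally
    // instead of the test hand-rolling a loop plus an async checker thread.
    waitFor(() -> System.currentTimeMillis() - start >= 50, 10, 5000);
    System.out.println("condition met");
  }
}
```

In the test itself the condition would be a scheduler-state check (e.g. reserved-container count), with MockRM#waitForState covering the app/attempt state transitions.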
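For anyone following the hang described in the issue below: the root queue's used capacity counts reserved space, so once the 1GB AM is reserved on a full node the cluster reads as exactly 1.0f used, and the >= 1.0f guard in allocateContainersOnMultiNodes keeps allocateOrReserveNewContainers from ever running. A back-of-envelope sketch with the repro's numbers (usedCapacity and skipsNewAllocation are illustrative helpers, not scheduler code):

```java
public class UsedCapacityGuardSketch {
  // Illustrative helper (not scheduler code): used capacity as seen by the
  // root queue, where reserved resources count toward usage. The issue
  // notes the used capacity returns to 1.0f after the 1GB reservation.
  static float usedCapacity(long usedMb, long reservedMb, long totalMb) {
    return (float) (usedMb + reservedMb) / totalMb;
  }

  // Mirrors the shape of the check quoted from
  // CapacityScheduler#allocateContainersOnMultiNodes: when used capacity
  // is >= 1.0f and nothing is killable, allocateOrReserveNewContainers
  // is never reached, so node2's free 1GB is never considered.
  static boolean skipsNewAllocation(float usedCapacity, long killableMb) {
    return usedCapacity >= 1.0f && killableMb == 0;
  }

  public static void main(String[] args) {
    // Repro numbers: 3 x 8GB cluster, 23GB used after the 1GB preemption,
    // plus the 1GB reservation for JobB's AM.
    float cap = usedCapacity(23 * 1024, 1024, 24 * 1024);
    System.out.println(cap);                        // 1.0
    System.out.println(skipsNewAllocation(cap, 0)); // true -> scheduler loops
  }
}
```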
> Reserved Containers not allocated from available space of other nodes in
> CandidateNodeSet in MultiNodePlacement (YARN-10259)
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-10293
>                 URL: https://issues.apache.org/jira/browse/YARN-10293
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.3.0
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Major
>         Attachments: YARN-10293-001.patch, YARN-10293-002.patch, YARN-10293-003-WIP.patch
>
> Reserved containers are not allocated from the available space of other nodes in the
> CandidateNodeSet in MultiNodePlacement. YARN-10259 fixed two related issues:
> https://issues.apache.org/jira/browse/YARN-10259?focusedCommentId=17105987&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17105987
> I have found one more bug in the CapacityScheduler.java code which causes the
> same issue, with a slight difference in the repro.
> *Repro:*
> *Nodes : Capacity : Used*
> Node1 - 8GB, 8vcores - 8GB, 8vcores
> Node2 - 8GB, 8vcores - 8GB, 8vcores
> Node3 - 8GB, 8vcores - 8GB, 8vcores
> Queues -> A and B, both 50% capacity, 100% max capacity
> MultiNode enabled + Preemption enabled
> 1. JobA is submitted to queue A and uses the full cluster: 24GB and 24 vcores.
> 2. JobB is submitted to queue B with an AM size of 1GB.
> {code}
> 2020-05-21 12:12:27,313 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=systest IP=172.27.160.139 OPERATION=Submit Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1590046667304_0005 CALLERCONTEXT=CLI QUEUENAME=dummy
> {code}
> 3.
> Preemption happens and the used capacity becomes less than 1.0f.
> {code}
> 2020-05-21 12:12:48,222 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptMetrics: Non-AM container preempted, current appAttemptId=appattempt_1590046667304_0004_000001, containerId=container_e09_1590046667304_0004_01_000024, resource=<memory:1024, vCores:1>
> {code}
> 4. JobB gets a reserved container as part of CapacityScheduler#allocateOrReserveNewContainer.
> {code}
> 2020-05-21 12:12:48,226 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_e09_1590046667304_0005_01_000001 Container Transitioned from NEW to RESERVED
> 2020-05-21 12:12:48,226 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: Reserved container=container_e09_1590046667304_0005_01_000001, on node=host: tajmera-fullnodes-3.tajmera-fullnodes.root.hwx.site:8041 #containers=8 available=<memory:0, vCores:0> used=<memory:8192, vCores:8> with resource=<memory:1024, vCores:1>
> {code}
> *Why did RegularContainerAllocator reserve the container when the used capacity is <= 1.0f?*
> The reason is that even though the container is preempted, the NodeManager still has to stop the container, heartbeat, and update the available and unallocated resources to the ResourceManager.
> 5. Now no new allocation happens and the reserved container stays reserved. After the reservation the used capacity becomes 1.0f, the steps below run in a loop, and no new allocate or reserve happens. The reserved container cannot be allocated because the reserved node has no space. Node2 has space for 1GB, 1vcore, but CapacityScheduler#allocateOrReserveNewContainers is not called, causing the hang.
> *[INFINITE LOOP] CapacityScheduler#allocateContainersOnMultiNodes -> CapacityScheduler#allocateFromReservedContainer -> re-reserve the container on the node*
> {code}
> 2020-05-21 12:13:33,242 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Trying to fulfill reservation for application application_1590046667304_0005 on node: tajmera-fullnodes-3.tajmera-fullnodes.root.hwx.site:8041
> 2020-05-21 12:13:33,242 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: assignContainers: partition= #applications=1
> 2020-05-21 12:13:33,242 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp: Reserved container=container_e09_1590046667304_0005_01_000001, on node=host: tajmera-fullnodes-3.tajmera-fullnodes.root.hwx.site:8041 #containers=8 available=<memory:0, vCores:0> used=<memory:8192, vCores:8> with resource=<memory:1024, vCores:1>
> 2020-05-21 12:13:33,243 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Allocation proposal accepted
> {code}
> CapacityScheduler#allocateOrReserveNewContainers won't be called, because the check below in allocateContainersOnMultiNodes fails:
> {code}
> if (getRootQueue().getQueueCapacities().getUsedCapacity(
>     candidates.getPartition()) >= 1.0f
>     && preemptionManager.getKillableResource(
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org