[jira] [Comment Edited] (YARN-10259) Reserved Containers not allocated from available space of other nodes in CandidateNodeSet in MultiNodePlacement
[ https://issues.apache.org/jira/browse/YARN-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105987#comment-17105987 ] Prabhu Joseph edited comment on YARN-10259 at 5/13/20, 5:31 AM:

*ISSUE 1: No new Allocation/Reservation happens in Multi Node Placement when a node is Full and has a Reserved Container.*

Below is the flow where the issue happens:

CapacityScheduler#allocateContainersOnMultiNodes -> CapacityScheduler#allocateOrReserveNewContainers -> LeafQueue#assignContainers -> LeafQueue#allocateFromReservedContainer

When CS tries to allocate or reserve a new container on a node, LeafQueue first tries to allocate from an already Reserved Container, iterating over all the nodes in the multi-node candidate set. When a node is full and holds a reserved container, this path returns a RE-RESERVED allocation. This runs in a loop without ever moving on to the other nodes to allocate or reserve new containers.

*Example:* NodeA (fully utilized, with a reserved container of 5GB), NodeB (has space for 5GB)

A. CS tries to allocate or reserve a new container on NodeB ->
B. LeafQueue tries to allocate from reserved containers, iterating over all nodes in the multi-node candidate set ->
C. Finds that NodeA's reserved container cannot be ALLOCATED ->
D. RE-RESERVES and returns a RESERVED assignment ->
A to D run in a loop without ever trying to allocate on NodeB.

*SOLUTION:* LeafQueue#allocateFromReservedContainer should not run for Multi Node Placement when the call comes from CapacityScheduler#allocateOrReserveNewContainers. It should run only when called from CapacityScheduler#allocateFromReservedContainer, which passes a single-node candidate set for both Single Node and Multi Node placement. (See the first sketch below.)

*ISSUE 2: No new Allocation happens in Multi Node Placement when the first node of the multi-node iterator is Full.*

Below is the flow where the issue happens:

CapacityScheduler#allocateContainersOnMultiNodes -> CapacityScheduler#allocateOrReserveNewContainers -> LeafQueue#assignContainers -> FiCaSchedulerApp#assignContainers -> RegularContainerAllocator#assignContainers -> RegularContainerAllocator#allocate

When CS tries to allocate or reserve a new container on a node, RegularContainerAllocator#allocate iterates over the nodes given by the MultiNodeLookupPolicy, but if the first node does not have space to fit the SchedulerRequestKey, it returns a CSAssignment with a RESERVED allocation and skips checking the subsequent nodes which do have space. (This is not a problem for ResourceUsageMultiNodeLookupPolicy, which always puts the least-used node first, but it affects custom policies such as BinPacking.)

*Example:* NodeA (2GB available space), NodeB (3GB available space); MultiNodeIterator order => NodeA, NodeB

CS tries to allocate/reserve on NodeA (3GB pending request) -> RegularContainerAllocator takes the first node of the iterator (NodeA) -> returns a RESERVED allocation.
CS tries to allocate/reserve on NodeB (3GB pending request) -> RegularContainerAllocator takes the first node of the iterator (NodeA) -> returns RE-RESERVED.

No new allocation or reservation happens on the subsequent nodes of the Multi Node Iterator.

*SOLUTION:* RegularContainerAllocator#allocate has to try allocating on the subsequent nodes as well before returning RESERVED / RE-RESERVED. (See the second sketch below.)
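To make the guard proposed for ISSUE 1 concrete, here is a minimal, self-contained sketch. It does not use the real LeafQueue / CandidateNodeSet signatures from hadoop-yarn-server-resourcemanager; every class below is a simplified stand-in, and the isSingleNode() helper is hypothetical. The only point is the guard: take the allocate-from-reserved-container path when the candidate set holds a single node (the CapacityScheduler#allocateFromReservedContainer case) and skip it when scanning the full multi-node set.

{code}
// Simplified stand-ins -- NOT the real LeafQueue / CandidateNodeSet types.
import java.util.Arrays;
import java.util.List;

class CandidateNodeSetSketch<N> {
  private final List<N> nodes;

  CandidateNodeSetSketch(List<N> nodes) {
    this.nodes = nodes;
  }

  // Hypothetical helper: true when the caller passed exactly one node,
  // as CapacityScheduler#allocateFromReservedContainer is described to do.
  boolean isSingleNode() {
    return nodes.size() == 1;
  }
}

class LeafQueueSketch<N> {

  enum Assignment { NULL, ALLOCATED, RESERVED }

  Assignment assignContainers(CandidateNodeSetSketch<N> candidates) {
    // Proposed guard for ISSUE 1: only walk the reserved-container path when
    // the candidate set is a single node. When allocateOrReserveNewContainers
    // passes the full multi-node set, skipping this path avoids the
    // RE-RESERVE loop on a node that is already full.
    if (candidates.isSingleNode()) {
      Assignment fromReserved = allocateFromReservedContainer(candidates);
      if (fromReserved != Assignment.NULL) {
        return fromReserved;
      }
    }
    return allocateOrReserveNewContainer(candidates);
  }

  Assignment allocateFromReservedContainer(CandidateNodeSetSketch<N> candidates) {
    // ... try to satisfy an existing reservation on the single node ...
    return Assignment.NULL;
  }

  Assignment allocateOrReserveNewContainer(CandidateNodeSetSketch<N> candidates) {
    // ... normal path: allocate or reserve a new container ...
    return Assignment.NULL;
  }

  public static void main(String[] args) {
    LeafQueueSketch<String> queue = new LeafQueueSketch<>();
    // Multi-node set (NodeA full with a reservation, NodeB has space):
    // the reserved-container path is skipped, so NodeB can still be used.
    queue.assignContainers(new CandidateNodeSetSketch<>(Arrays.asList("NodeA", "NodeB")));
    // Single-node set, as allocateFromReservedContainer would pass:
    // the reserved-container path is taken.
    queue.assignContainers(new CandidateNodeSetSketch<>(Arrays.asList("NodeA")));
  }
}
{code}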
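Similarly, a rough sketch of the iteration order proposed for ISSUE 2, again with simplified stand-in types rather than the real RegularContainerAllocator#allocate signature: walk every node returned by the MultiNodeLookupPolicy iterator and fall back to RESERVED / RE-RESERVED only after no node could fit the request.

{code}
// Simplified stand-ins -- NOT the real RegularContainerAllocator / SchedulerNode types.
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class NodeSketch {
  final String name;
  final long availableMB;

  NodeSketch(String name, long availableMB) {
    this.name = name;
    this.availableMB = availableMB;
  }
}

class RegularContainerAllocatorSketch {

  enum Result { ALLOCATED, RESERVED, SKIPPED }

  // Proposed order for ISSUE 2: check every node from the multi-node iterator
  // before falling back to a reservation (or re-reservation).
  Result allocate(Iterator<NodeSketch> multiNodeIterator, long requestMB) {
    NodeSketch reserveCandidate = null;
    while (multiNodeIterator.hasNext()) {
      NodeSketch node = multiNodeIterator.next();
      if (node.availableMB >= requestMB) {
        // A later node (e.g. NodeB with 3GB free) fits the ask, so allocate
        // there instead of re-reserving on the first, full node.
        return Result.ALLOCATED;
      }
      if (reserveCandidate == null) {
        // Remember one node to reserve on only as a last resort.
        reserveCandidate = node;
      }
    }
    // No node in the candidate set fits the request: reserve / re-reserve.
    return reserveCandidate != null ? Result.RESERVED : Result.SKIPPED;
  }

  public static void main(String[] args) {
    List<NodeSketch> iteratorOrder =
        Arrays.asList(new NodeSketch("NodeA", 2048), new NodeSketch("NodeB", 3072));
    Result result =
        new RegularContainerAllocatorSketch().allocate(iteratorOrder.iterator(), 3072);
    // With the example above (NodeA 2GB, NodeB 3GB, 3GB ask) this now ends in
    // ALLOCATED on NodeB instead of RESERVED / RE-RESERVED on NodeA.
    System.out.println(result);
  }
}
{code}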
[jira] [Comment Edited] (YARN-10259) Reserved Containers not allocated from available space of other nodes in CandidateNodeSet in MultiNodePlacement
[ https://issues.apache.org/jira/browse/YARN-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17101380#comment-17101380 ] Bibin Chundatt edited comment on YARN-10259 at 5/7/20, 6:00 AM:

In addition to the above, I think the issue also exists in *LeafQueue#allocateFromReservedContainer*: we try the container allocation from the first node we get while iterating through the whole candidate set. Change back to the previous logic.

Issue: a container gets unreserved on node1, then we reserve on node1 again during allocation. The nodes at the end of the list with reserved containers might never get a chance to allocate / unreserve. This impacts the performance of the multi-node lookup too.

*AsyncSchedulerThread* should give all nodes a fair chance to unreserve/allocate their reserved containers: attempt the allocation with a single-node candidate set if a reserved container exists (see the sketch at the end of this message).

> Reserved Containers not allocated from available space of other nodes in
> CandidateNodeSet in MultiNodePlacement
> ---
>
>                 Key: YARN-10259
>                 URL: https://issues.apache.org/jira/browse/YARN-10259
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 3.2.0, 3.3.0
>            Reporter: Prabhu Joseph
>            Assignee: Prabhu Joseph
>            Priority: Major
>         Attachments: REPRO_TEST.patch
>
> Reserved Containers are not allocated from the available space of other nodes
> in the CandidateNodeSet in MultiNodePlacement.
> *Repro:*
> 1. MultiNode Placement Enabled.
> 2. Two nodes h1 and h2 with 8GB each.
> 3. Submit app1 AM (5GB) which gets placed on h1 and app2 AM (5GB) which gets
> placed on h2.
> 4. Submit app3 AM which gets reserved on h1.
> 5. Kill app2, which frees space on h2.
> 6. app3 AM never gets ALLOCATED.
> RM logs show the YARN-8127 fix rejecting the allocation proposal for app3 AM
> on h2, as it expects the assignment to be on the same node where the
> reservation happened.
> {code}
> 2020-05-05 18:49:37,264 DEBUG [AsyncDispatcher event handler]
> scheduler.SchedulerApplicationAttempt
> (SchedulerApplicationAttempt.java:commonReserve(573)) - Application attempt
> appattempt_1588684773609_0003_01 reserved container
> container_1588684773609_0003_01_01 on node host: h1:1234 #containers=1
> available= used=.
> This attempt currently has 1 reserved containers at priority 0;
> currentReservation
>
> 2020-05-05 18:49:37,264 INFO [AsyncDispatcher event handler]
> fica.FiCaSchedulerApp (FiCaSchedulerApp.java:apply(670)) - Reserved
> container=container_1588684773609_0003_01_01, on node=host: h1:1234
> #containers=1 available= used=
> with resource=
> RESERVED=[(Application=appattempt_1588684773609_0003_01;
> Node=h1:1234; Resource=)]
>
> 2020-05-05 18:49:38,283 DEBUG [Time-limited test]
> allocator.RegularContainerAllocator
> (RegularContainerAllocator.java:assignContainer(514)) - assignContainers:
> node=h2 application=application_1588684773609_0003 priority=0
> pendingAsk=,repeat=1> type=OFF_SWITCH
>
> 2020-05-05 18:49:38,285 DEBUG [Time-limited test] fica.FiCaSchedulerApp
> (FiCaSchedulerApp.java:commonCheckContainerAllocation(371)) - Try to allocate
> from reserved container container_1588684773609_0003_01_01, but node is
> not reserved
> ALLOCATED=[(Application=appattempt_1588684773609_0003_01;
> Node=h2:1234; Resource=)]
> {code}
> After reverting the fix of YARN-8127, it works. Attached is a testcase which
> reproduces the issue.
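Following up on the comment above about giving every node with a reserved container a fair chance: here is a simplified sketch of attempting the allocation with a single-node candidate set whenever a node holds a reservation. The types below are illustrative stand-ins, not the real AsyncSchedulerThread / CapacityScheduler code (which would build a CandidateNodeSet and go through LeafQueue#assignContainers).

{code}
// Illustrative stand-ins -- NOT the real AsyncSchedulerThread / CapacityScheduler types.
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SchedulerNodeSketch {
  final String host;
  final boolean hasReservedContainer;

  SchedulerNodeSketch(String host, boolean hasReservedContainer) {
    this.host = host;
    this.hasReservedContainer = hasReservedContainer;
  }
}

class AsyncScheduleSketch {

  // One scheduling pass: every node holding a reservation first gets its own
  // single-node candidate set, so it has a fair chance to allocate or
  // unreserve; new allocations then use the full multi-node candidate set.
  void schedule(List<SchedulerNodeSketch> nodes) {
    for (SchedulerNodeSketch node : nodes) {
      if (node.hasReservedContainer) {
        allocateContainers(Collections.singletonList(node));
      }
    }
    allocateContainers(nodes);
  }

  void allocateContainers(List<SchedulerNodeSketch> candidates) {
    // ... in the real scheduler this would build a CandidateNodeSet and call
    // down into LeafQueue#assignContainers ...
  }

  public static void main(String[] args) {
    new AsyncScheduleSketch().schedule(Arrays.asList(
        new SchedulerNodeSketch("h1", true),    // holds the reserved app3 AM container
        new SchedulerNodeSketch("h2", false))); // has free space after app2 is killed
  }
}
{code}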