[jira] [Comment Edited] (YARN-10259) Reserved Containers not allocated from available space of other nodes in CandidateNodeSet in MultiNodePlacement

2020-05-12 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105987#comment-17105987
 ] 

Prabhu Joseph edited comment on YARN-10259 at 5/13/20, 5:31 AM:


*ISSUE 1: No new Allocation/Reservation happens in Multi Node Placement when a 
node is Full and has a Reserved Container.*

Below is the flow where the issue happens:

CapacityScheduler#allocateContainersOnMultiNodes -> 
CapacityScheduler#allocateOrReserveNewContainers -> LeafQueue#assignContainers 
-> LeafQueue#allocateFromReservedContainer

When CS tries to allocate or reserve a new container on a node, LeafQueue first 
tries to allocate from an already Reserved Container, iterating over all the 
nodes in the multi-node candidatesSet. When a node is full and has a reserved 
container, it returns a ReReserved allocation. This runs in a loop without ever 
moving on to the next nodes to allocate or reserve new containers.

*Example:*

NodeA (fully utilized with reserved container of 5GB), NodeB (has space for 5GB)

A. CS tries to allocate or reserve a new container on NodeB -> B. LeafQueue 
tries to allocate from reserved containers, iterating over all nodes in the 
multi-node candidatesSet -> C. It finds that NodeA's reserved container cannot 
be ALLOCATED -> D. It RE-RESERVEs and returns a RESERVED assignment -> A to D 
run in a loop without ever trying to allocate on NodeB

*SOLUTION:*
LeafQueue#allocateFromReservedContainer should not run for Multi Node Placement 
when the call comes from CapacityScheduler#allocateOrReserveNewContainers. It 
should run only when called from 
CapacityScheduler#allocateFromReservedContainer, which passes a single-node 
candidate set for both Single Node and Multi Node placement.
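A minimal Java sketch of the proposed guard. This is a hypothetical, simplified model: the class, method signatures, and returned strings below are not the real CapacityScheduler/LeafQueue APIs, only an illustration of the decision.

```java
import java.util.*;

// Hypothetical sketch: assignContainers is reached both from
// allocateOrReserveNewContainers (multi-node candidate set) and from
// allocateFromReservedContainer (always a single-node candidate set).
class LeafQueueSketch {
    static String assignContainers(List<String> candidateNodes,
                                   Set<String> nodesWithReservation) {
        // Proposed fix: only try to satisfy an existing reservation when
        // the scheduler handed us a single-node candidate set, i.e. the
        // call came from CapacityScheduler#allocateFromReservedContainer.
        if (candidateNodes.size() == 1
                && nodesWithReservation.contains(candidateNodes.get(0))) {
            return "ALLOCATE_FROM_RESERVED:" + candidateNodes.get(0);
        }
        // Otherwise fall through to normal new-container allocation, which
        // is free to pick any node in the multi-node candidate set instead
        // of looping on the full node's reservation.
        for (String node : candidateNodes) {
            if (!nodesWithReservation.contains(node)) {
                return "NEW_ALLOCATION:" + node;
            }
        }
        return "RE_RESERVED";
    }
}
```

With NodeA full and holding the reservation, the multi-node call now allocates on NodeB instead of re-reserving on NodeA, while the single-node call still serves the reservation.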


*ISSUE 2: No new Allocation happens in Multi Node Placement when the first node 
of the multi node iterator is Full.*

Below is the flow where the issue happens:

CapacityScheduler#allocateContainersOnMultiNodes -> 
CapacityScheduler#allocateOrReserveNewContainers -> LeafQueue#assignContainers 
-> FiCaSchedulerApp#assignContainers -> 
RegularContainerAllocator#assignContainers -> RegularContainerAllocator#allocate

When CS tries to allocate or reserve a new container on a node, 
RegularContainerAllocator#allocate iterates over the nodes given by the 
MultiNodeLookupPolicy. If the first node does not have enough space to fit the 
SchedulerRequestKey, it returns a CSAssignment with a RESERVED allocation and 
skips checking the subsequent nodes that do have space. (This is not a problem 
for ResourceUsageMultiNodeLookupPolicy, which always places the least-used node 
first, but it affects custom policies like BinPacking.)
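Why the iterator order matters can be shown with a small sketch. This is a hypothetical simplified model, not the real MultiNodeLookupPolicy interface: a usage-sorted policy puts the node most likely to fit first, so a first-node-only allocator rarely reserves unnecessarily, while a bin-packing order exposes the bug.

```java
import java.util.*;

// Hypothetical illustration: two iterator orders over the same nodes,
// keyed by available GB (a stand-in for real resource accounting).
class PolicyOrderSketch {
    // Usage-sorted style: most-available node first, so the first node
    // the allocator checks is the one most likely to fit the request.
    static List<String> usageSorted(Map<String, Integer> availGb) {
        List<String> nodes = new ArrayList<>(availGb.keySet());
        nodes.sort(Comparator.comparingInt(
                (String n) -> availGb.get(n)).reversed());
        return nodes;
    }

    // Bin-packing style: least-available node first, so a first-node-only
    // allocator reserves even when later nodes have room.
    static List<String> binPacking(Map<String, Integer> availGb) {
        List<String> nodes = new ArrayList<>(availGb.keySet());
        nodes.sort(Comparator.comparingInt(availGb::get));
        return nodes;
    }
}
```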

*Example:*

NodeA (2GB available space), NodeB (3GB available space)

MultiNodeIterator order => NodeA, NodeB

CS tries to allocate/reserve on NodeA (3GB pending request) -> 
RegularContainerAllocator takes the first node from the iterator (NodeA) -> 
sends a RESERVED allocation

CS tries to allocate/reserve on NodeB (3GB pending request) -> 
RegularContainerAllocator takes the first node from the iterator (NodeA) -> 
sends a ReReserved allocation

No new allocation or reservation ever happens on the subsequent nodes of the 
Multi Node Iterator.

*SOLUTION:*

RegularContainerAllocator#allocate has to try to allocate on the subsequent 
nodes as well before falling back to RESERVED / RE-RESERVED.
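A hedged sketch of the proposed behavior. The class, signature, and result strings are hypothetical simplifications of RegularContainerAllocator#allocate; node capacities are modeled as GB integers, and the iterator order stands in for what the MultiNodeLookupPolicy supplies.

```java
import java.util.*;

// Hypothetical sketch of the proposed fix: walk the whole multi-node
// iterator before falling back to RESERVED.
class AllocatorSketch {
    static String allocate(LinkedHashMap<String, Integer> availableGbByNode,
                           int requestGb) {
        // Walk the iterator order supplied by the lookup policy; with the
        // fix we do not stop at the first node that cannot fit the request.
        for (Map.Entry<String, Integer> e : availableGbByNode.entrySet()) {
            if (e.getValue() >= requestGb) {
                return "ALLOCATED:" + e.getKey();
            }
        }
        // Only when no node in the candidate set has room do we reserve
        // (on the first node of the iterator, as before).
        return "RESERVED:" + availableGbByNode.keySet().iterator().next();
    }
}
```

In the example above (NodeA with 2GB, NodeB with 3GB, a 3GB request), the fixed walk allocates on NodeB instead of reserving on NodeA.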



[jira] [Comment Edited] (YARN-10259) Reserved Containers not allocated from available space of other nodes in CandidateNodeSet in MultiNodePlacement

2020-05-06 Thread Bibin Chundatt (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17101380#comment-17101380
 ] 

Bibin Chundatt edited comment on YARN-10259 at 5/7/20, 6:00 AM:


In addition to the above:

I think the issue also exists in *LeafQueue#allocateFromReservedContainer*: we 
try the container allocation from the first node we get while iterating through 
the whole candidate set. This should be changed back to the previous logic.

Issue: a container gets unreserved on node1, then during allocation we reserve 
on node1 again. The nodes at the end of the list with reserved containers might 
never get a chance to allocate or unreserve.

This impacts the performance of the multi-node lookup too. *AsyncSchedulerThread* 
should give every node a fair chance to unreserve/allocate its reserved 
container: attempt the allocation for an existing reserved container with a 
single-node candidate set.





> Reserved Containers not allocated from available space of other nodes in 
> CandidateNodeSet in MultiNodePlacement
> ---
>
> Key: YARN-10259
> URL: https://issues.apache.org/jira/browse/YARN-10259
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
> Attachments: REPRO_TEST.patch
>
>
> Reserved Containers are not allocated from the available space of other nodes 
> in CandidateNodeSet in MultiNodePlacement. 
> *Repro:*
> 1. MultiNode Placement Enabled.
> 2. Two nodes h1 and h2 with 8GB
> 3. Submit app1 AM (5GB) which gets placed in h1 and app2 AM (5GB) which gets 
> placed in h2.
> 4. Submit app3 AM which is reserved in h1
> 5. Kill app2 which frees space in h2.
> 6. app3 AM never gets ALLOCATED
> RM logs show the YARN-8127 fix rejecting the allocation proposal for app3 AM 
> on h2, as it expects the assignment to be on the same node where the 
> reservation happened.
> {code}
> 2020-05-05 18:49:37,264 DEBUG [AsyncDispatcher event handler] 
> scheduler.SchedulerApplicationAttempt 
> (SchedulerApplicationAttempt.java:commonReserve(573)) - Application attempt 
> appattempt_1588684773609_0003_01 reserved container 
> container_1588684773609_0003_01_01 on node host: h1:1234 #containers=1 
> available= used=. This attempt 
> currently has 1 reserved containers at priority 0; currentReservation 
> 
> 2020-05-05 18:49:37,264 INFO  [AsyncDispatcher event handler] 
> fica.FiCaSchedulerApp (FiCaSchedulerApp.java:apply(670)) - Reserved 
> container=container_1588684773609_0003_01_01, on node=host: h1:1234 
> #containers=1 available= used= 
> with resource=
>RESERVED=[(Application=appattempt_1588684773609_0003_01; 
> Node=h1:1234; Resource=)]
>
> 2020-05-05 18:49:38,283 DEBUG [Time-limited test] 
> allocator.RegularContainerAllocator 
> (RegularContainerAllocator.java:assignContainer(514)) - assignContainers: 
> node=h2 application=application_1588684773609_0003 priority=0 
> pendingAsk=,repeat=1> 
> type=OFF_SWITCH
> 2020-05-05 18:49:38,285 DEBUG [Time-limited test] fica.FiCaSchedulerApp 
> (FiCaSchedulerApp.java:commonCheckContainerAllocation(371)) - Try to allocate 
> from reserved container container_1588684773609_0003_01_01, but node is 
> not reserved
>ALLOCATED=[(Application=appattempt_1588684773609_0003_01; 
> Node=h2:1234; Resource=)]
> {code}
> After reverting the fix of YARN-8127, it works. Attached a testcase which 
> reproduces the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org