[ https://issues.apache.org/jira/browse/YARN-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17100526#comment-17100526 ]

Prabhu Joseph commented on YARN-10259:
--------------------------------------

For certain cases, RegularContainerAllocator does unreserve on a node in order to
place the reserved application on another node with available space, but this
does not happen for the above scenario, so the allocation proposal is rejected
by the YARN-8127 check.

An additional fix is needed to unreserve for the above scenario as well; a rough sketch of the direction follows.
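Sketch only (names are illustrative, not the actual patch): before proposing the reserved container on a different node from the CandidateNodeSet, drop the reservation so the YARN-8127 commit-time check no longer sees a node mismatch.

{code}
// Sketch only -- names are illustrative, not the actual YARN-10259 patch.
// 'node' is the candidate node with free space; 'reservedContainer' is the
// application's existing reservation on another node.
if (reservedContainer != null
    && !reservedContainer.getReservedNode().equals(node.getNodeID())) {
  // Release the reservation on the node that currently holds it first, so
  // the proposal for 'node' passes the commit-time check added by YARN-8127.
  FiCaSchedulerNode reservedNode =
      csContext.getNode(reservedContainer.getReservedNode()); // csContext: hypothetical accessor
  application.unreserve(schedulerKey, reservedNode, reservedContainer);
}
{code}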


cc [~sunil.gov...@gmail.com]

> Reserved Containers not allocated from available space of other nodes in 
> CandidateNodeSet in MultiNodePlacement
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-10259
>                 URL: https://issues.apache.org/jira/browse/YARN-10259
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 3.2.0, 3.3.0
>            Reporter: Prabhu Joseph
>            Priority: Major
>         Attachments: REPRO_TEST.patch
>
>
> Reserved Containers are not allocated from the available space of other nodes 
> in CandidateNodeSet in MultiNodePlacement. 
> *Repro:*
> 1. MultiNode Placement Enabled.
> 2. Two nodes h1 and h2 with 8GB each.
> 3. Submit app1 AM (5GB) which gets placed on h1 and app2 AM (5GB) which gets 
> placed on h2.
> 4. Submit app3 AM (5GB) which gets reserved on h1.
> 5. Kill app2, which frees space on h2.
> 6. app3 AM never gets ALLOCATED; the resource math is sketched below.
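> The resource math for these steps (all numbers from the setup above):
> {code}
> h1: 8GB total - 5GB (app1 AM) = 3GB free -> app3 AM (5GB) cannot fit, gets RESERVED on h1
> h2: 8GB total - 5GB (app2 AM) = 3GB free -> after app2 is killed, 8GB free >= 5GB,
>     so app3 AM should be ALLOCATED on h2, but never is
> {code}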
> RM logs show the YARN-8127 fix rejecting the allocation proposal for the app3 
> AM on h2, as it expects the assignment to be on the same node where the 
> reservation happened.
> {code}
> 2020-05-05 18:49:37,264 DEBUG [AsyncDispatcher event handler] 
> scheduler.SchedulerApplicationAttempt 
> (SchedulerApplicationAttempt.java:commonReserve(573)) - Application attempt 
> appattempt_1588684773609_0003_000001 reserved container 
> container_1588684773609_0003_01_000001 on node host: h1:1234 #containers=1 
> available=<memory:3072, vCores:7> used=<memory:5120, vCores:1>. This attempt 
> currently has 1 reserved containers at priority 0; currentReservation 
> <memory:5120, vCores:1>
> 2020-05-05 18:49:37,264 INFO  [AsyncDispatcher event handler] 
> fica.FiCaSchedulerApp (FiCaSchedulerApp.java:apply(670)) - Reserved 
> container=container_1588684773609_0003_01_000001, on node=host: h1:1234 
> #containers=1 available=<memory:3072, vCores:7> used=<memory:5120, vCores:1> 
> with resource=<memory:5120, vCores:1>
>        RESERVED=[(Application=appattempt_1588684773609_0003_000001; 
> Node=h1:1234; Resource=<memory:5120, vCores:1>)]
>        
> 2020-05-05 18:49:38,283 DEBUG [Time-limited test] 
> allocator.RegularContainerAllocator 
> (RegularContainerAllocator.java:assignContainer(514)) - assignContainers: 
> node=h2 application=application_1588684773609_0003 priority=0 
> pendingAsk=<per-allocation-resource=<memory:5120, vCores:1>,repeat=1> 
> type=OFF_SWITCH
> 2020-05-05 18:49:38,285 DEBUG [Time-limited test] fica.FiCaSchedulerApp 
> (FiCaSchedulerApp.java:commonCheckContainerAllocation(371)) - Try to allocate 
> from reserved container container_1588684773609_0003_01_000001, but node is 
> not reserved
>        ALLOCATED=[(Application=appattempt_1588684773609_0003_000001; 
> Node=h2:1234; Resource=<memory:5120, vCores:1>)]
> {code}
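> The rejection above comes down to a node-identity check in 
> FiCaSchedulerApp#commonCheckContainerAllocation; paraphrased sketch (not the 
> verbatim source):
> {code}
> // Paraphrased sketch of the YARN-8127 check, matching the log message above:
> // an allocation from a reserved container must land on the node that
> // actually holds the reservation.
> RMContainer reservedContainerOnNode = schedulerNode.getReservedContainer();
> if (reservedContainer != null && reservedContainer != reservedContainerOnNode) {
>   LOG.debug("Try to allocate from reserved container "
>       + reservedContainer.getContainerId() + ", but node is not reserved");
>   return false; // proposal rejected; app3 AM stays pending
> }
> {code}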
> After reverting the fix of YARN-8127, it works. A testcase which reproduces 
> the issue is attached (REPRO_TEST.patch).
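> A MockRM flow along these lines reproduces it (sketch only; the multi-node 
> placement config key is an assumption, see REPRO_TEST.patch for the exact setup):
> {code}
> // Sketch of the repro flow with MockRM; multi-node placement must be enabled
> // in the CapacityScheduler config (key assumed, see REPRO_TEST.patch).
> CapacitySchedulerConfiguration conf = new CapacitySchedulerConfiguration();
> conf.setBoolean("yarn.scheduler.capacity.multi-node-placement-enabled", true); // assumed key
> MockRM rm = new MockRM(conf);
> rm.start();
> MockNM nm1 = rm.registerNode("h1:1234", 8 * 1024); // 8GB
> MockNM nm2 = rm.registerNode("h2:1234", 8 * 1024); // 8GB
>
> RMApp app1 = rm.submitApp(5 * 1024);  // AM lands on h1
> MockRM.launchAndRegisterAM(app1, rm, nm1);
> RMApp app2 = rm.submitApp(5 * 1024);  // AM lands on h2
> MockRM.launchAndRegisterAM(app2, rm, nm2);
>
> RMApp app3 = rm.submitApp(5 * 1024);  // gets RESERVED on h1
> rm.killApp(app2.getApplicationId());  // frees 5GB on h2
> nm2.nodeHeartbeat(true);              // expected: app3 AM ALLOCATED on h2, never happens
> {code}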


