[ https://issues.apache.org/jira/browse/YARN-9598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16859740#comment-16859740 ]

Juanjuan Tian edited comment on YARN-9598 at 6/11/19 1:58 AM:
---------------------------------------------------------------

Hi Tao,
{noformat}
Disabling re-reservation only makes the scheduler skip reserving the same 
container repeatedly and try to allocate on other nodes; it won't affect normal 
scheduling for this app and later apps. Thoughts?{noformat}
For example, suppose the cluster has 10 nodes (h1, h2, ..., h10), each with 8G 
of memory, and two queues, A and B, each configured with 50% capacity.

First, 10 jobs (each requesting 6G) are submitted to queue A, and each of the 
10 nodes gets one container allocated, leaving 2G free per node.

Afterwards, another job, JobB, requesting 3G is submitted to queue B, and a 3G 
container is reserved on node h1. If we disable re-reservation in this case, 
the scheduler can look up other nodes, but since 
shouldAllocOrReserveNewContainer returns false, no other reservation is made, 
and JobB will still get stuck.
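The arithmetic and the reservation check in this example can be modeled in a few lines of Java. This is a hypothetical simulation, not Hadoop code: `ReservationStuckDemo` and `simulate` are invented names, and the check is a simplified stand-in for `shouldAllocOrReserveNewContainer`, assuming it only permits a new reservation while (starvation + outstanding asks) - reserved containers is positive, with "starvation" derived from the number of re-reservations. The real method's formula differs in detail.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the 10-node example above; not Hadoop code.
class ReservationStuckDemo {
    static final int NODE_MEM_GB = 8;     // each of h1..h10 has 8G
    static final int QUEUE_A_USED_GB = 6; // one 6G container from queue A per node
    static final int JOB_B_ASK_GB = 3;    // JobB asks for a single 3G container

    // Simplified stand-in (assumption) for shouldAllocOrReserveNewContainer:
    // starvation is derived only from the re-reservation count.
    static boolean shouldAllocOrReserveNewContainer(int outstanding, int reserved, int reReservations) {
        int starvation = reserved > 0 ? reReservations / reserved : 0;
        return (starvation + outstanding) - reserved > 0;
    }

    // Runs `passes` scheduling passes over h1..h10 and returns the nodes reserved for JobB.
    static List<String> simulate(int passes, boolean reReservationEnabled) {
        List<String> reserved = new ArrayList<>();
        int reReservations = 0;
        final int outstanding = 1; // JobB still needs one container
        for (int pass = 0; pass < passes; pass++) {
            for (int i = 1; i <= 10; i++) {
                String node = "h" + i;
                int freeGb = NODE_MEM_GB - QUEUE_A_USED_GB; // 2G free on every node
                if (freeGb >= JOB_B_ASK_GB) {
                    // Never happens here: 2G < 3G, so the ask can only be reserved.
                    throw new IllegalStateException("ask fits; nothing to reserve");
                }
                if (reserved.contains(node)) {
                    if (reReservationEnabled) {
                        reReservations++; // re-reserve the same container on the same node
                    }
                    continue; // disabled: skip the node without touching any counter
                }
                if (shouldAllocOrReserveNewContainer(outstanding, reserved.size(), reReservations)) {
                    reserved.add(node); // a new reservation on another node
                }
            }
        }
        return reserved;
    }

    public static void main(String[] args) {
        System.out.println("re-reservation disabled: " + simulate(5, false));
        System.out.println("re-reservation enabled:  " + simulate(5, true));
    }
}
```

With re-reservation disabled the counter never moves, so the check stays false after the first reservation and the result is `[h1]` no matter how many passes run. Under this simplified formula, enabling re-reservation yields `[h1, h2, h3]` after five passes. In neither case can JobB actually allocate, since 2G free per node is less than the 3G ask; the difference is only whether further nodes get reserved for it.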



> Make reservation work well when multi-node enabled
> --------------------------------------------------
>
>                 Key: YARN-9598
>                 URL: https://issues.apache.org/jira/browse/YARN-9598
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>            Reporter: Tao Yang
>            Assignee: Tao Yang
>            Priority: Major
>         Attachments: YARN-9598.001.patch, image-2019-06-10-11-37-43-283.png, 
> image-2019-06-10-11-37-44-975.png
>
>
> This issue aims to solve problems with reservation when multi-node scheduling is enabled:
>  # As discussed in YARN-9576, a re-reservation proposal may always be generated 
> on the same node and break the scheduling for this app and later apps. I 
> think re-reservation is unnecessary, and we can replace it with 
> LOCALITY_SKIPPED to give the scheduler a chance to look up the following 
> candidates for this app when multi-node is enabled.
>  # The scheduler iterates over all nodes and tries to allocate for the reserved container in 
> LeafQueue#allocateFromReservedContainer. There are two problems here:
>  ** Only the node of the reserved container should be taken as the candidate, instead of 
> all nodes, when calling FiCaSchedulerApp#assignContainers; otherwise the 
> scheduler may later generate a reservation-fulfilled proposal on another node, 
> which will always be rejected in FiCaScheduler#commonCheckContainerAllocation.
>  ** The assignment returned by FiCaSchedulerApp#assignContainers can never be 
> null, even if it was just skipped. This breaks the normal scheduling process 
> for this leaf queue because of the if clause in LeafQueue#assignContainers: 
> "if (null != assignment) { return assignment; }"
>  # Nodes that have already been reserved should be skipped when iterating over candidates 
> in RegularContainerAllocator#allocate; otherwise the scheduler may generate an 
> allocation or reservation proposal on these nodes, which will always be 
> rejected in FiCaScheduler#commonCheckContainerAllocation.
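The three proposed changes can be illustrated with a toy model in Java. Everything here is hypothetical: `MultiNodeReservationSketch`, `Assignment`, the one-reservation-per-app rule, and a single ask size shared by all apps are assumptions for illustration. The method names only echo `LeafQueue#assignContainers`, `LeafQueue#allocateFromReservedContainer` and `RegularContainerAllocator#allocate`; they do not reproduce the real CapacityScheduler code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the three YARN-9598 proposals; all names are hypothetical.
class MultiNodeReservationSketch {
    enum Kind { ALLOCATED, RESERVED, SKIPPED }

    static final class Assignment {
        final Kind kind;
        final String app;
        final String node;

        Assignment(Kind kind, String app, String node) {
            this.kind = kind;
            this.app = app;
            this.node = node;
        }

        boolean isSkipped() {
            return kind == Kind.SKIPPED;
        }

        @Override
        public String toString() {
            return node == null ? kind.toString() : kind + " " + app + "@" + node;
        }
    }

    final Map<String, Integer> freeGb = new HashMap<>();
    final Map<String, String> reservedBy = new HashMap<>(); // node -> app holding the reservation

    String reservedNodeOf(String app) {
        for (Map.Entry<String, String> e : reservedBy.entrySet()) {
            if (e.getValue().equals(app)) {
                return e.getKey();
            }
        }
        return null;
    }

    // Proposal 2a: try a reserved container only on its own node.
    Assignment allocateFromReservedContainer(String app, int askGb) {
        String node = reservedNodeOf(app);
        if (node == null || freeGb.get(node) < askGb) {
            return new Assignment(Kind.SKIPPED, app, null);
        }
        reservedBy.remove(node); // reservation fulfilled on the same node
        freeGb.merge(node, -askGb, Integer::sum);
        return new Assignment(Kind.ALLOCATED, app, node);
    }

    // Proposals 1 and 3: skip reserved nodes (including this app's own) instead of re-reserving.
    Assignment allocate(String app, int askGb, List<String> candidates) {
        for (String node : candidates) {
            if (reservedBy.containsKey(node)) {
                continue;
            }
            if (freeGb.get(node) >= askGb) {
                freeGb.merge(node, -askGb, Integer::sum);
                return new Assignment(Kind.ALLOCATED, app, node);
            }
            if (reservedNodeOf(app) == null) { // assumption: one reservation per app
                reservedBy.put(node, app);
                return new Assignment(Kind.RESERVED, app, node);
            }
        }
        return new Assignment(Kind.SKIPPED, app, null);
    }

    // Proposal 2b: a SKIPPED result must not end the queue's turn.
    Assignment assignContainers(List<String> apps, int askGb, List<String> candidates) {
        for (String app : apps) {
            Assignment a = reservedNodeOf(app) != null
                    ? allocateFromReservedContainer(app, askGb)
                    : allocate(app, askGb, candidates);
            if (!a.isSkipped()) { // the problematic form is: if (null != a) return a;
                return a;
            }
        }
        return new Assignment(Kind.SKIPPED, null, null);
    }

    public static void main(String[] args) {
        MultiNodeReservationSketch s = new MultiNodeReservationSketch();
        s.freeGb.put("h1", 2);
        s.freeGb.put("h2", 4);
        s.reservedBy.put("h1", "app1"); // app1 already holds a 3G reservation on h1
        List<String> nodes = Arrays.asList("h1", "h2");
        System.out.println(s.assignContainers(Arrays.asList("app1", "app2"), 3, nodes));
    }
}
```

With h1 (2G free) reserved by app1 and h2 (4G free) unreserved, `main` prints `ALLOCATED app2@h2`: app1's reserved container is tried only on h1 and skipped, and that skip does not end the queue's turn. Had `allocateFromReservedContainer` searched all nodes, it would have picked h2 for app1's reserved container, which is the kind of proposal the issue says is always rejected at commit time.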



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
