[ 
https://issues.apache.org/jira/browse/YARN-8178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16450422#comment-16450422
 ] 

Zian Chen commented on YARN-8178:
---------------------------------

Hi Weiwei, thank you so much for your comments. Let me answer the questions 
you raised above. 

1. High level

Can users decide whether their app can be demoted or not?

{color:#f79232}Users cannot directly decide whether their app can be demoted. 
In our design, the RM makes that decision. Initially, all containers are 
launched as G containers. The RM then selects the containers that can be 
preempted, exactly as the current preemption logic does: the four selectors 
that extend PreemptionCandidatesSelector pick the candidates, and nothing 
changes there.{color}

{color:#f79232}If users don't want their app to be demoted or preempted, 
they can submit it to a non-preemptable queue, or to a queue with a long 
preemption monitoring timeout to get longer to-be-demoted intervals.{color}

{color:#f79232}In summary, our change focuses on two aspects:{color}

{color:#f79232}1. Use queues with different preemption frequencies to give 
users different choices for their apps, rather than letting users decide 
whether their app can be demoted.{color}

{color:#f79232}2. In the current preemption logic, every time the preemption 
monitoring interval elapses, the RM runs the preemption check and selects 
container candidates based on the ideal allocation, then kills them after 
max-wait-kill-timeout. We only add one extra operation between selecting the 
candidates and killing the containers: demoting G containers to O 
containers.{color}
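
To make point 2 concrete, here is a minimal Java sketch of one monitoring cycle with the extra demotion step. It is only an illustration under stated assumptions: the class, enum, and field names are hypothetical and are not existing YARN classes, the candidate list stands in for the output of the unchanged selectors, and the kill timeout is assumed to be counted from the moment of demotion.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types are real YARN classes.
class DemotingPreemptionSketch {
    enum ExecType { GUARANTEED, OPPORTUNISTIC }

    static final class Container {
        final String id;
        ExecType type = ExecType.GUARANTEED; // every container starts as G
        long demotedAtMs = -1;               // -1 means "never demoted"
        boolean killed = false;

        Container(String id) { this.id = id; }
    }

    // Assumed to play the role of the max-wait-before-kill timeout.
    private final long maxWaitBeforeKillMs;

    DemotingPreemptionSketch(long maxWaitBeforeKillMs) {
        this.maxWaitBeforeKillMs = maxWaitBeforeKillMs;
    }

    /**
     * Runs one monitoring cycle over the candidates chosen by the
     * (unchanged) selectors and returns the containers killed in this cycle.
     */
    List<Container> runCycle(List<Container> candidates, long nowMs) {
        List<Container> killedNow = new ArrayList<>();
        for (Container c : candidates) {
            if (c.killed) {
                continue; // already gone, nothing to do
            }
            if (c.type == ExecType.GUARANTEED) {
                // New step: demote G to O instead of only waiting to kill.
                c.type = ExecType.OPPORTUNISTIC;
                c.demotedAtMs = nowMs;
            } else if (nowMs - c.demotedAtMs >= maxWaitBeforeKillMs) {
                // Existing step: kill once the wait timeout has elapsed.
                c.killed = true;
                killedNow.add(c);
            }
        }
        return killedNow;
    }
}
```

With a 15000 ms timeout, a candidate selected at time 0 is demoted in that cycle, survives a cycle at 3000 ms, and is killed by a cycle at 15000 ms if it is still a candidate.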

2. How to select which apps/containers to demote?

Could you please elaborate on the 2nd paragraph in {{Make more aggressive 
preemption interval}}, if possible with an example? I am curious how it works 
together with the current preemption logic. When it needs to demote containers, 
how does it select apps and containers? A policy is pretty important, because 
you don't want to end up with inefficient demotions, such as too many apps 
getting affected, apps getting starved, etc.

{color:#f79232}As explained in question 1, the selection of container 
candidates remains the same as in the current preemption logic.{color}

3. O container lifecycle

How is the life cycle of O containers managed? As you mentioned in the doc, an 
O container continues to run unless the RM kills it or it gets preempted. In a 
bad-case scenario, what if an O container keeps consuming the node's resources: 
will the NM be able to kill it when it exceeds some threshold?

{color:#f79232}An O container goes through the following process in its 
lifecycle:{color}

{color:#f79232}G container (all containers are allocated as G containers) -> 
demoted when resources are insufficient -> killed by the RM after a 
configurable timeout (see ){color}
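
The lifecycle above can be read as a small state machine. The following Java sketch is an assumption-laden illustration, not the proposed implementation: the state names and the class are hypothetical, and it models only the demotion path described here (normal completion and other exits are left out).

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch; state names are illustrative, not YARN's own.
class ContainerLifecycleSketch {
    enum State {
        GUARANTEED, OPPORTUNISTIC, KILLED;

        // Transitions allowed on the demotion path described above.
        Set<State> next() {
            switch (this) {
                case GUARANTEED:
                    return EnumSet.of(OPPORTUNISTIC); // demoted when resources run short
                case OPPORTUNISTIC:
                    return EnumSet.of(KILLED);        // killed by the RM after the timeout
                default:
                    return EnumSet.noneOf(State.class);
            }
        }
    }

    // Every container is allocated as a G container.
    private State state = State.GUARANTEED;

    State state() {
        return state;
    }

    void transitionTo(State target) {
        if (!state.next().contains(target)) {
            throw new IllegalStateException(state + " -> " + target + " is not allowed");
        }
        state = target;
    }
}
```

In this sketch a container cannot jump from GUARANTEED straight to KILLED; it has to be demoted first, which mirrors the extra step inserted between candidate selection and the kill.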

 

> [Umbrella] Resource Over-commitment Based on Opportunistic Container 
> Preemption
> -------------------------------------------------------------------------------
>
>                 Key: YARN-8178
>                 URL: https://issues.apache.org/jira/browse/YARN-8178
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: capacity scheduler
>            Reporter: Zian Chen
>            Priority: Major
>         Attachments: Resource Over-commitment Based on Opportunistic 
> Container Preemption.pdf
>
>
> We want to provide an opportunistic container-based solution to achieve more 
> aggressive preemption with a shorter preemption monitoring interval. 
> Instead of allowing applications to allocate resources with a mix of 
> guaranteed and opportunistic containers, we allow newly submitted 
> applications to only contain guaranteed containers. Meanwhile, we change the 
> preemption logic to, instead of killing containers, demote guaranteed 
> containers into opportunistic ones, so that when there are new applications 
> submitted, we can ensure that these containers can be launched by preempting 
> opportunistic containers.
> This approach is related to YARN-1011 but achieves over-commitment in a 
> different way. However, we rely on the opportunistic container part 
> implemented in YARN-1011 to make our design work well. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

