[ https://issues.apache.org/jira/browse/YARN-8178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16450422#comment-16450422 ]
Zian Chen commented on YARN-8178:
---------------------------------

Hi Weiwei, thank you so much for your comments. Let me answer the questions you raised above.

1. High level
Can users decide whether their app can be demoted or not?

{color:#f79232}Users cannot directly decide whether their app can be demoted; in our design, the RM makes that judgment. At first, all containers are launched as G (guaranteed) containers. The RM then selects the container candidates that can be preempted, just as the current preemption logic does: the four selectors that extend PreemptionCandidatesSelector are still used to select candidates, and nothing changes there.{color}

{color:#f79232}If users don't want their app to be demoted or preempted, they can put it into a non-preemptable queue, or into a queue with a long preemption monitoring timeout so that it gets a longer interval before being demoted.{color}

{color:#f79232}In summary, our change focuses on two aspects:{color}
{color:#f79232}1. Use queues with different preemption frequencies to give users different choices for their apps, rather than letting users decide whether their app can be demoted.{color}
{color:#f79232}2. In the current preemption logic, every time the preemption monitoring interval arrives, the RM runs the preemption check, selects container candidates based on the ideal allocation, and kills them after max-wait-kill-timeout. We only add one extra operation between selecting the candidates and killing the containers: demoting the G containers into O (opportunistic) containers.{color}

2. How to select which apps/containers to demote?
Could you please elaborate on the 2nd paragraph in {{Make more aggressive preemption interval}}, if possible with an example? I am curious how it works together with the current preemption logic. When it needs to demote containers, how does it select apps and containers? A policy is pretty important, because you don't want to end up with inefficient demotions, such as too many apps getting affected, apps getting starved, etc.
{color:#f79232}As explained in the answer to question 1, the selection of container candidates remains the same as in the current preemption logic.{color}

3. O container lifecycle
How do you manage the lifecycle of O containers? As you mentioned in the doc, an O container will continue to run unless the RM kills it or it gets preempted. In a bad-case scenario, what if an O container keeps consuming node resources: will the NM be able to kill it when it exceeds some threshold?

{color:#f79232}An O container goes through the following process in its lifecycle:{color}
{color:#f79232}G container (all containers are allocated as G containers) -> demoted when resources are insufficient -> killed by the RM after a configurable timeout (see ){color}

> [Umbrella] Resource Over-commitment Based on Opportunistic Container Preemption
> -------------------------------------------------------------------------------
>
>                 Key: YARN-8178
>                 URL: https://issues.apache.org/jira/browse/YARN-8178
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: capacity scheduler
>            Reporter: Zian Chen
>            Priority: Major
>         Attachments: Resource Over-commitment Based on Opportunistic Container Preemption.pdf
>
>
> We want to provide an opportunistic container-based solution to achieve more
> aggressive preemption with a shorter preemption monitoring interval.
> Instead of allowing applications to allocate resources with a mix of
> guaranteed and opportunistic containers, we allow newly submitted
> applications to contain only guaranteed containers. Meanwhile, we change the
> preemption logic to demote guaranteed containers into opportunistic ones
> instead of killing them, so that when new applications are submitted, their
> containers can be launched by preempting opportunistic containers.
> This approach is related to YARN-1011 but achieves over-commitment in a
> different way. However, we rely on the opportunistic container support
> implemented in YARN-1011 to make our design work well.
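The lifecycle described in the answers above (every container starts as G, a preemption round demotes the selected candidates to O instead of killing them, and a demoted container is killed only after a configurable timeout) can be sketched as a small state machine. This is a minimal, hypothetical model, not YARN code: the class and member names (DemotionSketch, ContainerState, demote, killExpired, killTimeoutMs) are illustrative assumptions, and candidate selection is simply passed in, standing in for the unchanged PreemptionCandidatesSelector logic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the proposed demote-then-kill lifecycle; not part of the YARN API.
public class DemotionSketch {
    enum ContainerState { GUARANTEED, OPPORTUNISTIC, KILLED }

    static final class Container {
        final String id;
        ContainerState state = ContainerState.GUARANTEED; // every container starts as G
        long demotedAtMs = -1;                             // demotion time, -1 if never demoted
        Container(String id) { this.id = id; }
    }

    // Hypothetical stand-in for the configurable timeout before a demoted container is killed.
    private final long killTimeoutMs;

    DemotionSketch(long killTimeoutMs) { this.killTimeoutMs = killTimeoutMs; }

    /** One preemption round: the already-selected candidates are demoted G -> O, not killed. */
    void demote(List<Container> candidates, long nowMs) {
        for (Container c : candidates) {
            if (c.state == ContainerState.GUARANTEED) {
                c.state = ContainerState.OPPORTUNISTIC;
                c.demotedAtMs = nowMs;
            }
        }
    }

    /** Kills O containers whose demotion is at least killTimeoutMs old; returns their ids. */
    List<String> killExpired(List<Container> all, long nowMs) {
        List<String> killed = new ArrayList<>();
        for (Container c : all) {
            if (c.state == ContainerState.OPPORTUNISTIC && nowMs - c.demotedAtMs >= killTimeoutMs) {
                c.state = ContainerState.KILLED;
                killed.add(c.id);
            }
        }
        return killed;
    }

    public static void main(String[] args) {
        DemotionSketch policy = new DemotionSketch(15_000);
        Container a = new Container("c1");
        Container b = new Container("c2");
        List<Container> all = List.of(a, b);

        policy.demote(List.of(a), 0);                         // c1 was selected as a candidate
        System.out.println(a.state + " " + b.state);           // OPPORTUNISTIC GUARANTEED
        System.out.println(policy.killExpired(all, 10_000));   // [] (timeout not yet reached)
        System.out.println(policy.killExpired(all, 15_000));   // [c1]
        System.out.println(a.state);                           // KILLED
    }
}
```

The point the sketch mirrors from the comment is that candidate selection is untouched; only the action taken between selection and kill changes, from immediate kill to demotion followed by a delayed kill.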
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)