[ 
https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15681223#comment-15681223
 ] 

Arun Suresh commented on YARN-5292:
-----------------------------------

Thanks for the patch, [~hrsharma].

Did a fly-by of the patch and the design doc. Some design comments:
# The original intent of the JIRA, I guess, is to provide an alternative to 
killing opportunistic containers to make room for guaranteed containers. This 
implies that we would need to wire this through the ContainerScheduler, which 
is now the entity that decides when to kill opportunistic containers and which 
ones to kill.
# I was thinking we could also expose an API on the 
ContainerManagementProtocol to allow AMs to pause a container directly, but I 
am guessing this should be allowed only for guaranteed containers: if we 
expose a pause API, we should also expose a resume API, and there is no 
guarantee that an opportunistic container can be resumed at the time the AM 
needs it. [~jianhe], [~vvasudev], it would be nice to hear your thoughts on 
this, since, if I understand correctly, YARN native services need to stop a 
container for a period of time without losing the allocation. I don't know 
whether that can be modeled as a container PAUSE with some support from the 
underlying ContainerExecutor/Runtime.
# We need some way to expose which resources are reclaimable by the NM when a 
container is paused. On deployments using some implementations of the 
ContainerExecutor/Runtime, not all resources of a paused container may be 
reclaimable by the NM for starting other opportunistic/guaranteed containers. 
For example, on some systems only the vcores may be throttled to 0 for the 
container, while on others the memory / state is also dumped into a secondary 
store, which means the memory might be reclaimable as well. We would need some 
way to plug this information into the ResourceUtilizationTracker and the 
ContainerScheduler.
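
To make the third point a bit more concrete, here is a minimal sketch of what such a plug-in point could look like. All names in it ({{Res}}, {{PauseReclaimPolicy}}, {{CpuThrottlePolicy}}, {{CheckpointPolicy}}, {{PauseAwareTracker}}) are hypothetical and made up for illustration; none of them exist in YARN or in the attached patch. The assumption is simply that each ContainerExecutor/Runtime implementation reports what it can give back on pause, and the tracker only credits that amount.

```java
/** Hypothetical resource vector for this sketch; not YARN's Resource class. */
final class Res {
    final long memoryMb;
    final int vcores;

    Res(long memoryMb, int vcores) {
        this.memoryMb = memoryMb;
        this.vcores = vcores;
    }

    Res plus(Res o) { return new Res(memoryMb + o.memoryMb, vcores + o.vcores); }
    Res minus(Res o) { return new Res(memoryMb - o.memoryMb, vcores - o.vcores); }
}

/** Hypothetical hook: what a given runtime can hand back when it pauses a container. */
interface PauseReclaimPolicy {
    Res reclaimableOnPause(Res allocated);
}

/** A runtime that only throttles CPU to 0: memory stays resident, so only vcores come back. */
final class CpuThrottlePolicy implements PauseReclaimPolicy {
    @Override
    public Res reclaimableOnPause(Res allocated) {
        return new Res(0, allocated.vcores);
    }
}

/** A runtime that also dumps memory / state to a secondary store: everything comes back. */
final class CheckpointPolicy implements PauseReclaimPolicy {
    @Override
    public Res reclaimableOnPause(Res allocated) {
        return allocated;
    }
}

/** Hypothetical stand-in for the bookkeeping a utilization tracker would do. */
final class PauseAwareTracker {
    private final Res capacity;
    private Res inUse = new Res(0, 0);

    PauseAwareTracker(Res capacity) { this.capacity = capacity; }

    void onStart(Res allocated) {
        inUse = inUse.plus(allocated);
    }

    /** Credit back only what the runtime says is reclaimable, not the full allocation. */
    void onPause(Res allocated, PauseReclaimPolicy policy) {
        inUse = inUse.minus(policy.reclaimableOnPause(allocated));
    }

    Res available() { return capacity.minus(inUse); }
}
```

With a CPU-throttling runtime, pausing a 4096 MB / 4 vcore container on an 8192 MB / 8 vcore node would leave 4096 MB and 8 vcores available to the scheduler; with a checkpointing runtime the full node would be available again.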

I am thinking we should maybe convert this to an umbrella JIRA, create the 
work items as sub-JIRAs under it, and do the work on a branch.

With regard to the patch itself, I understand the current one is meant to 
handle the changes needed in the state machines etc. Do take a look at the 
{{TestContainer}} class, and see if it is possible to add some tests to verify 
that container life-cycle events are handled correctly. I will take a deeper 
look at the patch after that.
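
As a rough illustration of the kind of life-cycle checks I have in mind, here is a self-contained sketch. It is not the actual container state machine, not the patch, and not the {{TestContainer}} API; the states, events, and the {{ContainerLifecycle}} class are hypothetical and reduced to the bare minimum needed to show pause/resume transitions being verified.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical, reduced set of states for this sketch. */
enum CState { RUNNING, PAUSED, DONE }

/** Hypothetical, reduced set of events for this sketch. */
enum CEvent { PAUSE, RESUME, KILL }

/** Toy table-driven state machine; invalid transitions are rejected. */
final class ContainerLifecycle {
    private static final Map<CState, Map<CEvent, CState>> TRANSITIONS =
        new EnumMap<>(CState.class);

    static {
        Map<CEvent, CState> running = new EnumMap<>(CEvent.class);
        running.put(CEvent.PAUSE, CState.PAUSED);
        running.put(CEvent.KILL, CState.DONE);
        TRANSITIONS.put(CState.RUNNING, running);

        Map<CEvent, CState> paused = new EnumMap<>(CEvent.class);
        paused.put(CEvent.RESUME, CState.RUNNING);
        paused.put(CEvent.KILL, CState.DONE);
        TRANSITIONS.put(CState.PAUSED, paused);

        // DONE is terminal: no outgoing transitions.
        TRANSITIONS.put(CState.DONE, new EnumMap<>(CEvent.class));
    }

    private CState state = CState.RUNNING;

    CState state() { return state; }

    /** Applies the event and returns the new state, or throws if the event is not valid here. */
    CState handle(CEvent event) {
        CState next = TRANSITIONS.get(state).get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event " + event + " in state " + state);
        }
        state = next;
        return state;
    }
}
```

A test would then drive a container through PAUSE, RESUME and KILL, assert the state after each event, and also assert that something like RESUME on a finished container is rejected.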

> Support for PAUSED container state
> ----------------------------------
>
>                 Key: YARN-5292
>                 URL: https://issues.apache.org/jira/browse/YARN-5292
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Hitesh Sharma
>            Assignee: Hitesh Sharma
>         Attachments: YARN-5292.001.patch, YARN-5292.002.patch, 
> YARN-5292.003.patch, yarn-5292.pdf
>
>
> YARN-2877 introduced OPPORTUNISTIC containers, and YARN-5216 proposes to add 
> capability to customize how OPPORTUNISTIC containers get preempted.
> In this JIRA we propose introducing a PAUSED container state.
> When a running container gets preempted, it enters the PAUSED state, where it 
> remains until resources get freed up on the node; the preempted container 
> can then resume to the running state.
>  
> One scenario where this capability is useful is work preservation. How 
> preemption is done, and whether the container supports it, is implementation 
> specific.
> For instance, if the container is a virtual machine, then preempt would pause 
> the VM and resume would restore it back to the running state.
> If the container doesn't support preemption, then preempt would default to 
> killing the container. 
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

