[ https://issues.apache.org/jira/browse/YARN-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858564#comment-15858564 ]

Hitesh Sharma commented on YARN-5501:
-------------------------------------

Hi [~jlowe],

First of all, a big thanks for taking the time to look at the document and for 
sharing your thoughts. I appreciate it a lot.

bq. I am confused on how this will be used in practice.  To me pre-initialized 
containers means containers that have already started up with application- or 
framework-specific resources localized, processes have been launched using 
those resources, and potentially connections already negotiated to external 
services.  I'm not sure how YARN is supposed to know what mix of local 
resources, users, and configs to use for preinitialized containers that will 
get a good "hit rate" on container requests.  Maybe I'm misunderstanding what 
is really meant by "preinitialized," and some concrete, sample use cases with 
detailed walkthroughs of how they work in practice would really help 
crystallize the goals here.

Your understanding of pre-initialized containers is correct. In the proposed 
design the YARN RM has the config to start pre-initialized containers, and this 
config is essentially a launch context: it contains the launch commands, the 
details of the resources to localize, and the resource constraints with which 
the container should be started. This configuration is currently static, but in 
the future we intend to make it pluggable, so we can extend it to be dynamic 
and adjust based on cluster load.
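
To make that concrete, here is a rough sketch of what the static pool config 
could carry, expressed with YARN's existing launch-context records; the paths, 
commands, and sizes are made up for illustration.

{code:java}
// imports: org.apache.hadoop.fs.Path, org.apache.hadoop.yarn.api.records.*,
//          org.apache.hadoop.yarn.util.ConverterUtils, java.util.*
long jarLen = 1024L, jarTs = 0L;  // in practice taken from the HDFS FileStatus

Map<String, LocalResource> localResources = new HashMap<>();
localResources.put("service.jar", LocalResource.newInstance(
    ConverterUtils.getYarnUrlFromPath(new Path("hdfs:///pools/myapp/service.jar")),
    LocalResourceType.FILE, LocalResourceVisibility.APPLICATION, jarLen, jarTs));

ContainerLaunchContext preInitCtx = ContainerLaunchContext.newInstance(
    localResources,
    Collections.singletonMap("POOL_NAME", "myapp"),            // environment
    Collections.singletonList("./start-long-init-service.sh"), // launch command
    null, null, null);                                         // serviceData, tokens, ACLs

// Resource constraints the pre-initialized container is started with.
Resource preInitCapability = Resource.newInstance(2048 /* MB */, 2 /* vcores */);
{code}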

The first use case is a scenario where each container needs to start processes 
that take a long time to initialize (localization and startup costs). The YARN 
NM receives the config to start the pre-initialized container (a dummy 
application is associated with the pre-init containers for a specific 
application type) and follows the regular code paths for a container, which 
include localizing resources and launching the container. As you know, in YARN 
a container goes to the RUNNING state once started, but a pre-initialized 
container instead goes to a PREINITIALIZED state (there are hooks that let us 
know the container has initialized properly). From this point on the container 
is no different from a regular container, as the container monitor is 
overseeing it. The "Pool Manager" within the YARN NM starts the pre-initialized 
container and watches for container events like stop, in which case it simply 
tries to start the container again. In other words, at the moment we simply use 
the YARN RM to pick the nodes where pre-initialized containers should be 
started and let the "Pool Manager" in the NM manage the lifecycle of the 
container.
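
As a rough sketch of that lifecycle handling (PoolManager and the event names 
below come from the design, not from existing YARN classes):

{code:java}
// Illustrative only: the NM-side pool keeps itself at its target size.
enum PoolEvent { LAUNCHED, STOPPED, FAILED }

class PoolManager {
  private final Runnable relaunch;  // re-runs the static pre-init launch context

  PoolManager(Runnable relaunch) { this.relaunch = relaunch; }

  void handle(String containerId, PoolEvent event) {
    switch (event) {
      case LAUNCHED:
        // The init hooks confirmed startup, so the container transitions to
        // PREINITIALIZED (instead of RUNNING) and can be handed out later.
        System.out.println(containerId + " -> PREINITIALIZED");
        break;
      case STOPPED:
      case FAILED:
        // Keep the pool at its target size: simply start the container again.
        relaunch.run();
        break;
    }
  }
}
{code}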

When the AM for which we pre-initialized the container comes and asks for this 
container, the "Container Manager" takes the pre-initialized container by 
issuing a "detach" container event and "attaches" it to the application. We 
added attachContainer and detachContainer events to ContainerExecutor, which 
allow us to define what they mean. As an example, in attachContainer we start a 
new process within the cgroup of the pre-initialized container. The PID-to-
container mapping within the ContainerExecutor is updated accordingly 
(pre-initialized containers have a different container ID and belong to a 
different application before they are taken up). As part of detachContainer, 
all the resources associated with the pre-initialized container become 
associated with the new container and get cleaned up accordingly.
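
A minimal sketch of what those executor hooks could look like; attachContainer 
and detachContainer are the proposed events, while the surrounding class and 
the startInContainerGroup helper are hypothetical:

{code:java}
// imports: java.io.IOException, java.util.*, java.util.concurrent.ConcurrentHashMap
abstract class PooledContainerExecutor {
  // containerId -> pidfile path, as ContainerExecutor tracks today
  protected final Map<String, String> pidFiles = new ConcurrentHashMap<>();

  void attachContainer(String newContainerId, String preInitContainerId,
                       List<String> extraCommands) throws IOException {
    // Start the application's remaining processes inside the cgroup / job
    // object of the pre-initialized container.
    startInContainerGroup(preInitContainerId, extraCommands);
    // Alias the new container ID to the already-running process tree.
    pidFiles.put(newContainerId, pidFiles.get(preInitContainerId));
  }

  void detachContainer(String newContainerId, String preInitContainerId) {
    // Drop the pool's entry; localized resources and the cgroup now belong
    // to the new container and are cleaned up with it.
    pidFiles.remove(preInitContainerId);
  }

  // Platform-specific (cgroups on Linux, job objects on Windows).
  abstract void startInContainerGroup(String containerId, List<String> commands)
      throws IOException;
}
{code}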

The other use case where we have prototyped container pooling is the scenario 
where a container actually needs to be a virtual machine. Creating VMs can take 
a long time, so container pooling allows us to keep empty VM shells ready to 
go.

bq. Reusing containers across different applications is going to create some 
interesting scenarios that don't exist today.  For example, what does a 
container ID for one of these look like?  How many things today assume that 
all container IDs for an application are essentially prefixed by the 
application ID?  This would violate that assumption, unless we introduce some 
sort of container ID aliasing where we create a "fake" container ID that maps 
to the "real" ID of the reused container.  It would be good to know how we're 
going to treat container IDs and what applications will see when they get one 
of these containers in response to their allocation request.

All pre-initialized containers belong to a specific application type. A dummy 
application is created to which the pre-initialized containers are mapped. As 
part of the containerAttach and containerDetach events we re-associate the 
container between applications. Specifically, ContainerExecutor has a mapping 
of container ID to PID file, and as part of container detach we update this 
mapping. For example, say the pre-init container had the ID container123; the 
mapping in the executor would then be container123=../container123.pidfile, but 
as part of container attach and detach we update this mapping so that it 
becomes newcontainer456=../container123.pidfile. The ContainerExecutors use 
this mapping to locate the cgroup or Windows job object, and thus all container 
events are now issued on the pre-init container (i.e. the already started 
process).
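
In code form, the remap from that example is just the following (a plain 
HashMap standing in for the executor's internal map):

{code:java}
Map<String, String> pidFiles = new HashMap<>();
pidFiles.put("container123", "/nm-local/container123.pidfile");

// attach: the new application's container ID aliases the running process tree
pidFiles.put("newcontainer456", pidFiles.get("container123"));
// detach: remove the pool's entry; later events resolve via newcontainer456
pidFiles.remove("container123");

// Signals, monitoring, and cleanup now find the same cgroup / job object.
assert "/nm-local/container123.pidfile".equals(pidFiles.get("newcontainer456"));
{code}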

bq. What happens for preinitialized container failures, both during application 
execution and when idle?  Do we let the application launch its own recovery 
container, etc.

Eventually we want the YARN RM to manage these things, but for now in our PoC 
we have the "Pool Manager", which listens for these events and simply keeps 
retrying. At the moment we only use the YARN RM to select nodes and pass the 
config down.

bq. How does resource accounting/scheduling work with these containers?  Are 
they running in a dedicated queue?  Can users go beyond their normal limits by 
getting these containers outside of the app's queue?  Will it look weird when 
the user's queue isn't full yet we don't allow them any more containers because 
they're already using the maximum number of preinitialized containers?

We need to figure this part out. One of the thoughts we have had is that 
pre-init containers can be considered opportunistic, which means they can get 
killed in favor of other containers, but if they do get used then they take on 
the identity and resource accounting of the new container.
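
If we went the opportunistic route, the ask might look like today's 
opportunistic container requests, though that is an assumption on my part 
while this part of the design is still open:

{code:java}
// Assumption: pooled containers requested as OPPORTUNISTIC so they can be
// preempted in favor of GUARANTEED containers. Uses the existing
// ExecutionTypeRequest API (imports from org.apache.hadoop.yarn.api.records).
ResourceRequest poolAsk = ResourceRequest.newInstance(
    Priority.newInstance(1),
    ResourceRequest.ANY,
    Resource.newInstance(2048, 2),
    1,                       // number of containers
    true,                    // relax locality
    null,                    // node label expression
    ExecutionTypeRequest.newInstance(ExecutionType.OPPORTUNISTIC, true));
{code}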

bq. Could you elaborate on how the resizing works?  How are the processes 
running within the container being resized made aware of the new size 
constraints?  Today containers don't communicate with the NM directly, so I'm 
not sure how the preinitialized containers are supposed to know they are 
suddenly half the size or can now leverage more memory than they could before.  
Without that communication channel it seems like we're either going to kill 
processes that overflowed their lowered memory constraint or we're going to 
waste cluster resources because the processes are still trying to fit within 
the old memory size.

Strictly speaking we haven't prototyped this part, but the idea is to reuse the 
container allocation increase mechanisms. For example, if the pre-init 
container was running with 2 cores and 2 GB, then after "attach" it could have 
its resources increased to 4 cores and 4 GB. The resizing happens simply at the 
job object or cgroup level, and we expect the application to have its own 
communication channel to talk with the processes that are started a priori.
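
As an illustration of that increase path, reusing the container update API; 
whether pooling would go through exactly this API is an assumption on my part:

{code:java}
// Assumption: attach triggers a normal container resource increase, which the
// NM then applies at the cgroup / job object level.
// imports: org.apache.hadoop.yarn.api.records.*, org.apache.hadoop.yarn.client.api.AMRMClient
UpdateContainerRequest grow = UpdateContainerRequest.newInstance(
    container.getVersion(),
    container.getId(),
    ContainerUpdateType.INCREASE_RESOURCE,
    Resource.newInstance(4096, 4),   // 2 GB / 2 cores -> 4 GB / 4 cores
    null);                           // execution type unchanged
amRMClient.requestContainerUpdate(container, grow);
{code}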

bq. What are the security considerations?  Are preinitialized containers tied 
to a particular user?  How are app-specific credentials conveyed to the 
preinitialized container, and can credentials leak between apps that use the 
same container?

We haven't prototyped this part much. Currently we start the pre-init 
containers by skipping some of the security checks done in the "Container 
Manager". I think we can instead configure the user under which pre-init 
containers should be started and then associate them with the actual 
application.

bq. Is this some separate, new protocol for advertising or is this just simply 
reporting the container is launched just like other container status today?  
The RM already knows it sent the NM a command to launch the container, so it 
seems this is just the NM reporting the state of the container is now launched 
as it does for any other container start request today, but I wasn't sure if 
that is what was meant here.

This is essentially reporting back to the RM that the pre-init container is 
ready. We have been thinking of using [YARN-3926] to advertise the pre-init 
containers as resources so they can be requested by AMs.
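
For example, with [YARN-3926] resource types, each node could advertise its 
pool roughly like this; the "yarn.io/preinit-myapp" resource name is made up 
for illustration:

{code:java}
// Assumption: one countable resource type per pool, registered in
// resource-types.xml, so pooled containers flow through normal scheduling.
Resource nodeCapability = Resource.newInstance(65536, 16);
nodeCapability.setResourceValue("yarn.io/preinit-myapp", 3);  // 3 pooled containers

// An AM then asks for one pooled container alongside cpu/memory:
Resource ask = Resource.newInstance(4096, 4);
ask.setResourceValue("yarn.io/preinit-myapp", 1);
{code}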

bq. I'm confused, I thought the preinitialized container is already launched, 
but this talks about launching it after attach.  Again a concrete use-case 
walkthrough would help clarify what's really being proposed.  If this is 
primarily about reducing localization time instead of process startup after 
localization then there are simpler approaches we can take.

You are correct that the pre-init containers are already started. But in our 
scenarios a container can have multiple processes running within the same 
cgroup or job object. With container pooling we start some of these processes a 
priori, and the remaining ones are started when the AM comes around asking for 
containers. The containers for our application look for the processes started a 
priori and communicate with them.
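
On Linux, the "start the remaining processes in the same cgroup" step boils 
down to something like the following (paths and the script name are 
illustrative; Process.pid() needs Java 9+):

{code:java}
// imports: java.nio.charset.StandardCharsets, java.nio.file.*
Process worker = new ProcessBuilder("./app-worker.sh").start();
long pid = worker.pid();  // Java 9+

// Move the new process into the pre-initialized container's cgroup so it is
// constrained and accounted like the processes started a priori.
Path cgroupProcs = Paths.get(
    "/sys/fs/cgroup/cpu/hadoop-yarn/container123/cgroup.procs");
Files.write(cgroupProcs, Long.toString(pid).getBytes(StandardCharsets.UTF_8));
{code}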

Again, thanks a ton for the excellent feedback; I look forward to discussing 
this more. We can refine some of the ideas here and make them more generic and 
useful to the community.


> Container Pooling in YARN
> -------------------------
>
>                 Key: YARN-5501
>                 URL: https://issues.apache.org/jira/browse/YARN-5501
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Arun Suresh
>            Assignee: Hitesh Sharma
>         Attachments: Container Pooling - one pager.pdf
>
>
> This JIRA proposes a method for reducing the container launch latency in 
> YARN. It introduces a notion of pooling *Unattached Pre-Initialized 
> Containers*.
> Proposal in brief:
> * Have a *Pre-Initialized Container Factory* service within the NM to create 
> these unattached containers.
> * The NM would then advertise these containers as special resource types 
> (this should be possible via YARN-3926).
> * When a start container request is received by the node manager for 
> launching a container requesting this specific type of resource, it will take 
> one of these unattached pre-initialized containers from the pool, and use it 
> to service the container request.
> * Once the request is complete, the pre-initialized container would be 
> released and ready to serve another request.
> This capability would help reduce container launch latencies and thereby 
> allow for development of more interactive applications on YARN.


