[ https://issues.apache.org/jira/browse/YARN-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15873247#comment-15873247 ]
Arun Suresh commented on YARN-5501:
-----------------------------------

Great discussion. I would suggest we make the following simplifying assumptions for an initial cut.

h4. 1. Concept of detach and attach.

From the doc, it looks like "detach" implies removing the pre-initialized container from the pool and "attach" refers to associating an app with a pooled container. It might be simpler if we treat the operation as atomic. In that sense, we can make do with just an "attach" or "lease", where a pre-initialized container is associated with an app.

h4. 2. Use once and throw away.

For the sake of simplicity, maybe we should assume that once an application is assigned a container from the pool and has "attached" to it, it is the application's container and the pooling framework relinquishes ownership of it. The container then completes normally and all resource accounting is billed against the app. The pool of containers can be re-populated externally by the pool manager component in the RM (beyond the scope of this JIRA for now).

h4. 3. Resource accounting.

This is one of the reasons why I feel generalized resources would be useful here. Assume initially we have a cluster with resources <10 vcores, 10 GB> spread equally across 2 NMs. Let's say we allocate 4 pre-initialized containers (via the pooling component in the RM) of type *foo*, each with <1 vcore, 1 GB>, and distribute them equally across the NMs. Once the pre-initialized containers have started, the total available cluster resources would be <6 vcores, 6 GB, 4 foo>. Each NM would have <3 vcores, 3 GB, 2 foo> available. Now if an app asks for <0 vcores, 0 GB, 1 foo>, it will be allocated 1 pooled container, and the resources associated with 1 foo, <1 vcore, 1 GB>, can be accounted against the app.
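
The arithmetic in the resource accounting example can be checked with a small sketch. This is only an illustration under stated assumptions: the {{Res}} class, {{FOO_FOOTPRINT}}, {{afterPreInit}} and {{billedToApp}} are hypothetical names made up for this example and are not YARN's Resource API; the only inputs taken from the comment are the <10 vcores, 10 GB> cluster, the 2 NMs, and the 4 pooled *foo* containers of <1 vcore, 1 GB> each.

```java
// Sketch only: models the accounting example with hypothetical types,
// not YARN's org.apache.hadoop.yarn.api.records.Resource.
final class PoolAccountingSketch {

    /** Hypothetical resource vector <vcores, GB, foo>. */
    static final class Res {
        final long vcores;
        final long gb;
        final long foo;

        Res(long vcores, long gb, long foo) {
            this.vcores = vcores;
            this.gb = gb;
            this.foo = foo;
        }

        Res plus(Res o) {
            return new Res(vcores + o.vcores, gb + o.gb, foo + o.foo);
        }

        Res minus(Res o) {
            return new Res(vcores - o.vcores, gb - o.gb, foo - o.foo);
        }

        Res times(long n) {
            return new Res(vcores * n, gb * n, foo * n);
        }

        @Override
        public String toString() {
            return "<" + vcores + " vcores, " + gb + " GB, " + foo + " foo>";
        }
    }

    /** Assumed footprint of one pooled container of type foo: <1 vcore, 1 GB>. */
    static final Res FOO_FOOTPRINT = new Res(1, 1, 0);

    /** Resources a node advertises after pre-initializing `count` foo containers. */
    static Res afterPreInit(Res nodeTotal, int count) {
        // The footprint leaves the vcore/GB pool and reappears as `count` units of foo.
        return nodeTotal.minus(FOO_FOOTPRINT.times(count)).plus(new Res(0, 0, count));
    }

    /** Resources billed to an app that leases `fooCount` pooled containers. */
    static Res billedToApp(int fooCount) {
        return FOO_FOOTPRINT.times(fooCount);
    }

    public static void main(String[] args) {
        Res node = new Res(5, 5, 0);           // <10 vcores, 10 GB> split across 2 NMs
        Res perNode = afterPreInit(node, 2);   // 4 pooled containers, 2 per NM
        Res cluster = perNode.plus(perNode);

        System.out.println("per NM:  " + perNode);
        System.out.println("cluster: " + cluster);
        System.out.println("billed for 1 foo: " + billedToApp(1));
    }
}
```

The point of the sketch is that a request for *foo* never touches the free vcore/GB pool directly: the footprint was already carved out when the pool was populated, and it is only re-attributed to the app at lease time.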

The app could also ask for <1 vcore, 1 GB, 1 foo>, in which case it will still be assigned one of the pooled containers, on the assumption that the container's size can expand by <1 vcore, 1 GB> if required. Cgroups/JobObjects would be used to enforce this.

h4. 4. AM Container communication.

As raised by [~jlowe], it is currently unclear what happens if the app framework requires an umbilical connection back to the AM: how does the pre-initialized container know where that AM is and when to connect? Currently, the *ContainerLaunchContext* should contain all the context required by the container to operate, including the location of the AM and how to talk to it. This is usually application specific (e.g. the *TaskUmbilical* protocol used by MR). If the container is pre-initialized, this implies that the container is in some stand-by state, waiting for this context to be passed to it. We should call this out in the design doc:
# The "attach" process will pass the application's ContainerLaunchContext to the pre-initialized container.
# The feature requires some smarts in the ContainerExecutor, which must know how to pass the LaunchContext specific to the "type" of pre-initialized container to the container, which itself should somehow know that it is pre-initialized and in some stand-by state. We can leverage some of the *Container Runtime* features for this.
# TODO later: Introduce an NM/Executor <-> container protocol to formalize the above, which may be useful for long running containers.

Thoughts?

> Container Pooling in YARN
> -------------------------
>
>                 Key: YARN-5501
>                 URL: https://issues.apache.org/jira/browse/YARN-5501
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Arun Suresh
>            Assignee: Hitesh Sharma
>         Attachments: Container Pooling in YARN.pdf, Container Pooling - one pager.pdf
>
>
> This JIRA proposes a method for reducing the container launch latency in YARN.
> It introduces a notion of pooling *Unattached Pre-Initialized Containers*.
> Proposal in brief:
> * Have a *Pre-Initialized Container Factory* service within the NM to create these unattached containers.
> * The NM would then advertise these containers as special resource types (this should be possible via YARN-3926).
> * When a start container request is received by the node manager for launching a container requesting this specific type of resource, it will take one of these unattached pre-initialized containers from the pool, and use it to service the container request.
> * Once the request is complete, the pre-initialized container would be released and ready to serve another request.
> This capability would help reduce container launch latencies and thereby allow for development of more interactive applications on YARN.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)