[ https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15208594#comment-15208594 ]
Varun Vasudev commented on YARN-1040:
-------------------------------------

Thanks for putting up the proposal [~asuresh]!

bq. "ContainerId" becomes "AllocationId"

Is AllocationId a new class that we will introduce or a rename of the existing ContainerId class? In either case we have some issues to sort out: the first one won't be backward compatible, and in the second case, will the NM generate container ids for the individual containers?

bq. An AM can receive only a single allocation on a Node, The Scheduler will "bundle" all Allocations on a Node for an app into a single Large Allocation.

Can you explain why we need this restriction?

bq. Each Container is tagged with a "ContainerId" which is known only to the AM.

Are you referring to the current ContainerId class? If yes, why is it known only to the AM?

I actually agree with both Vinod and Bikas. The current approach is a little disruptive and not very useful for existing apps. I think we should separate the allocations work out into its own classes on the RM and the NM, with new APIs added for the RM and the NM. I don't think we can get away with modifying the existing APIs, the one exception being the allocate call, where we can add an additional flag to indicate whether an allocation or a container is desired. Internally, we can change the implementation to have the container model use allocations, but I think allocations will have to have their own state machine with slightly different semantics than containers (on both the RM and NM).
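As a rough illustration of the separation suggested above, here is a minimal Java sketch of an allocation with its own state machine that outlives the individual processes run inside it. Everything in it is hypothetical: `AllocationSketch`, `Allocation`, `AllocationState`, and the id format are invented for this example and are not part of the YARN API or of the attached design document.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch only: none of these types exist in YARN.
public class AllocationSketch {

    // States of the allocation itself, kept separate from any container state.
    public enum AllocationState { IDLE, RUNNING, RELEASED }

    public static final class Allocation {
        private final String allocationId;
        private final Set<String> runningContainers = new HashSet<>();
        private AllocationState state = AllocationState.IDLE;
        private int launched = 0;

        public Allocation(String allocationId) {
            this.allocationId = allocationId;
        }

        // Start one more process inside the allocation and return its id.
        public String startContainer() {
            if (state == AllocationState.RELEASED) {
                throw new IllegalStateException("allocation already released");
            }
            launched++;
            String containerId = allocationId + "_c" + launched; // assumed id format
            runningContainers.add(containerId);
            state = AllocationState.RUNNING;
            return containerId;
        }

        // A process exit does not release the allocation; it only goes idle
        // once nothing is running in it any more.
        public void containerExited(String containerId) {
            runningContainers.remove(containerId);
            if (state != AllocationState.RELEASED && runningContainers.isEmpty()) {
                state = AllocationState.IDLE;
            }
        }

        // Only an explicit release ends the allocation.
        public void release() {
            runningContainers.clear();
            state = AllocationState.RELEASED;
        }

        public AllocationState state() {
            return state;
        }

        public int runningCount() {
            return runningContainers.size();
        }
    }

    public static void main(String[] args) {
        Allocation allocation = new Allocation("alloc_1");
        String first = allocation.startContainer();
        allocation.containerExited(first);
        // The allocation is still usable, so a restart lands in the same place.
        String second = allocation.startContainer();
        System.out.println(second + " " + allocation.state());
        allocation.release();
        System.out.println(allocation.state());
    }
}
```

The point of the sketch is only that releasing is an explicit step on the allocation, while container exits are ordinary events within it; the real RM and NM state machines would need many more states and transitions.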
> De-link container life cycle from an Allocation
> -----------------------------------------------
>
> Key: YARN-1040
> URL: https://issues.apache.org/jira/browse/YARN-1040
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Affects Versions: 3.0.0
> Reporter: Steve Loughran
> Attachments: YARN-1040-rough-design.pdf
>
>
> The AM should be able to exec >1 process in a container, rather than have the
> NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly,
> which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that
> something could be run in the container while a long-lived process was
> already running. This can be useful in monitoring and reconfiguring the
> long-lived process, as well as shutting it down.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)