[jira] [Commented] (YARN-1040) De-link container life cycle from an Allocation

Varun Vasudev (JIRA) Wed, 23 Mar 2016 08:29:05 -0700

    [ https://issues.apache.org/jira/browse/YARN-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15208594#comment-15208594 ]


Varun Vasudev commented on YARN-1040:
-------------------------------------

Thanks for putting up the proposal [~asuresh]! 

bq. "ContainerId" becomes "AllocationId"
Is AllocationId a new class that we will introduce or a rename of the existing 
ContainerId class? In either case we have some issues to sort out - the first 
one won't be backward compatible and in the second case, will the NM generate 
container ids for the individual containers?

bq. An AM can receive only a single allocation on a Node, The Scheduler will 
"bundle" all Allocations on a Node for an app into a single Large Allocation.
Can you explain why we need this restriction?

bq. Each Container is tagged with a "ContainerId" which is known only to the AM.
Are you referring to the current ContainerId class? If yes, why is it known 
only to the AM?

I actually agree with both Vinod and Bikas. The current approach is a little 
disruptive and not very useful for existing apps. I think we should separate 
the allocations work out into its own classes on the RM and the NM, with new 
APIs added on both. I don't think we can get away with modifying the 
existing APIs, the one exception being the allocate call, where we can add an 
additional flag to indicate whether an allocation or a container is desired. 
Internally, we can change the implementation to have the container model use 
allocations, but I think allocations will have to have their own state machine 
with slightly different semantics than containers (on both the RM and NM). 
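
To make the "flag on allocate plus a separate state machine" idea concrete, 
here is a rough sketch of how the semantics might differ. Everything in it is 
hypothetical: GrantType, State and Grant are illustrative names, not existing 
YARN classes, and for brevity it folds both models into one class rather than 
giving allocations their own classes on the RM and NM.

```java
public class AllocationSketch {

    /** Hypothetical flag an AM could set on the allocate call. */
    enum GrantType { CONTAINER, ALLOCATION }

    /** Hypothetical states; this is not YARN's actual state machine. */
    enum State { ACQUIRED, RUNNING, RELEASED }

    static final class Grant {
        final GrantType type;
        State state = State.ACQUIRED;

        Grant(GrantType type) {
            this.type = type;
        }

        /** Start a process inside the grant; only legal while it is idle. */
        void launch() {
            if (state != State.ACQUIRED) {
                throw new IllegalStateException("cannot launch from " + state);
            }
            state = State.RUNNING;
        }

        /** Called when the process running inside the grant exits. */
        void onProcessExit() {
            if (state != State.RUNNING) {
                throw new IllegalStateException("no running process in " + state);
            }
            // Container semantics: process exit releases the resources.
            // Allocation semantics: resources stay with the AM for reuse.
            state = (type == GrantType.CONTAINER) ? State.RELEASED : State.ACQUIRED;
        }

        /** Explicit release by the AM. */
        void release() {
            state = State.RELEASED;
        }
    }

    public static void main(String[] args) {
        Grant container = new Grant(GrantType.CONTAINER);
        container.launch();
        container.onProcessExit();
        System.out.println("container after exit: " + container.state);

        Grant allocation = new Grant(GrantType.ALLOCATION);
        allocation.launch();
        allocation.onProcessExit();
        allocation.launch(); // restart a process on the same resources
        System.out.println("allocation after restart: " + allocation.state);
    }
}
```

The only difference between the two models in this sketch is the transition 
taken on process exit, which is the kind of "slightly different semantics" I 
have in mind. Existing apps would keep the CONTAINER behaviour unchanged.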

> De-link container life cycle from an Allocation
> -----------------------------------------------
>
>                 Key: YARN-1040
>                 URL: https://issues.apache.org/jira/browse/YARN-1040
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 3.0.0
>            Reporter: Steve Loughran
>         Attachments: YARN-1040-rough-design.pdf
>
>
> The AM should be able to exec >1 process in a container, rather than have the 
> NM automatically release the container when the single process exits.
> This would let an AM restart a process on the same container repeatedly, 
> which for HBase would offer locality on a restarted region server.
> We may also want the ability to exec multiple processes in parallel, so that 
> something could be run in the container while a long-lived process was 
> already running. This can be useful in monitoring and reconfiguring the 
> long-lived process, as well as shutting it down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
