[ https://issues.apache.org/jira/browse/YARN-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15667638#comment-15667638 ]
Varun Vasudev commented on YARN-1593:
-------------------------------------

[~asuresh] -
{quote}
Thanks for driving this, Varun Vasudev. At first glance, this looks similar in spirit to YARN-5501, and may even supersede it. It would be advantageous to model pooled containers as system containers.

Further to the point raised by Hitesh Shah about formalizing how we affinitize an application's container to a node on which a dependent system container runs, we were also investigating a scenario where an application might need a countable number of system containers on a node. An initial thought was to expose the container as a generalized resource (YARN-3926).

For example, assume Spark executors can be started as pre-started containers on select nodes. Assume node A has 2 pre-started Spark executors and node B has 4. A Spark app might have 3 ContainerRequests, each requiring <4 VCores, 2 GB, 2 spark-executors>, in which case the ResourceManager will ensure that 1 such container is allocated on node A and 2 on node B.

Thoughts?
{quote}

I think there's quite a bit of overlap. A couple of questions about pooled containers:
1) If they fail to come up, should the NM continue to accept container requests, or should it stop accepting them?
2) Are they meant to run on a subset of nodes or on all nodes? Is this controlled by an admin?

As I mentioned to Hitesh above, affinity is something we think is the long-term solution, but we also realize that a solution which is essentially "launch this container on every node" will help bridge the gap for now. Hence the inclusion of both in the design doc.
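The Spark example in the quote treats pre-started executors as a countable per-node resource. The arithmetic can be checked with a minimal sketch. It assumes a simple greedy first-fit placement (not the ResourceManager's actual scheduling logic), hypothetical class names (`NodeCapacity`, `Ask`, `CountableResourceDemo`), and hypothetical node sizes of 16 vcores and 32 GB so that spark-executors is the binding limit. None of these are YARN APIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not a YARN API: a node's free capacity, including a
// countable "spark-executors" resource alongside vcores and memory.
final class NodeCapacity {
    final String name;
    int vcores;
    int memoryMb;
    int sparkExecutors; // pre-started executors still available on this node

    NodeCapacity(String name, int vcores, int memoryMb, int sparkExecutors) {
        this.name = name;
        this.vcores = vcores;
        this.memoryMb = memoryMb;
        this.sparkExecutors = sparkExecutors;
    }

    // True if every resource dimension can satisfy the ask.
    boolean fits(Ask ask) {
        return vcores >= ask.vcores
            && memoryMb >= ask.memoryMb
            && sparkExecutors >= ask.sparkExecutors;
    }

    // Deduct the ask from every dimension.
    void allocate(Ask ask) {
        vcores -= ask.vcores;
        memoryMb -= ask.memoryMb;
        sparkExecutors -= ask.sparkExecutors;
    }
}

// One container request: <vcores, memory, spark-executors>.
final class Ask {
    final int vcores;
    final int memoryMb;
    final int sparkExecutors;

    Ask(int vcores, int memoryMb, int sparkExecutors) {
        this.vcores = vcores;
        this.memoryMb = memoryMb;
        this.sparkExecutors = sparkExecutors;
    }
}

final class CountableResourceDemo {
    // Greedy first-fit: place each request on the first node with enough of
    // every resource; returns node name -> number of containers placed.
    static Map<String, Integer> place(List<NodeCapacity> nodes, Ask ask, int count) {
        Map<String, Integer> placements = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            NodeCapacity target = null;
            for (NodeCapacity node : nodes) {
                if (node.fits(ask)) {
                    target = node;
                    break;
                }
            }
            if (target == null) {
                break; // no node can host this request; the rest stay pending
            }
            target.allocate(ask);
            placements.merge(target.name, 1, Integer::sum);
        }
        return placements;
    }

    public static void main(String[] args) {
        List<NodeCapacity> nodes = new ArrayList<>();
        // Assumed vcores/memory are ample, so spark-executors is the binding limit.
        nodes.add(new NodeCapacity("A", 16, 32768, 2));
        nodes.add(new NodeCapacity("B", 16, 32768, 4));
        // 3 requests of <4 VCores, 2 GB, 2 spark-executors>
        System.out.println(place(nodes, new Ask(4, 2048, 2), 3)); // prints {A=1, B=2}
    }
}
```

Because each request consumes 2 executors, node A can host only one container while node B can host two, which matches the 1/2 split described in the quote. A real scheduler would also weigh locality, queues, and fairness, which this sketch ignores.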
> support out-of-proc AuxiliaryServices
> -------------------------------------
>
> Key: YARN-1593
> URL: https://issues.apache.org/jira/browse/YARN-1593
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: nodemanager, rolling upgrade
> Reporter: Ming Ma
> Assignee: Varun Vasudev
> Attachments: SystemContainersandSystemServices.pdf
>
>
> AuxiliaryServices such as ShuffleHandler currently run in the same process as the NM. There are some benefits to hosting them in dedicated processes.
> 1. NM rolling restart. If we want to upgrade YARN, an NM restart forces a ShuffleHandler restart. If ShuffleHandler runs as a separate process, it can continue to run during the NM restart, and the NM can reconnect to the running ShuffleHandler after the restart.
> 2. Resource management. Other types of AuxiliaryServices may be implemented. AuxiliaryServices are considered YARN-application specific and could consume lots of resources. Running AuxiliaryServices in separate processes allows easier resource management. The NM could potentially stop a specific AuxiliaryService process if it consumes resources well above its allocation.
> Here are some high-level ideas:
> 1. The NM provides a hosting process for each AuxiliaryService. The existing AuxiliaryService API doesn't change.
> 2. The hosting process provides an RPC server for the AuxiliaryService proxy object inside the NM to connect to.
> 3. When we rolling-restart the NM, the existing AuxiliaryService processes continue to run. The NM could reconnect to the running AuxiliaryService processes upon restart.
> 4. Policy and resource management of AuxiliaryServices. So far we don't have an immediate need for this. An AuxiliaryService could run inside a container, so its resource utilization could be taken into account by the RM, and the RM could determine whether a specific type of application over-utilizes cluster resources.
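The reconnect-on-restart and over-allocation ideas in the issue description can be illustrated with a minimal sketch. Everything here is hypothetical: `AuxProcessRecord`, `AuxStateStore`, `ProcessProbe`, and `AuxServiceProxy` are illustrative stand-ins, not NodeManager classes; the state store is in memory (a real one would have to survive the NM restart); process launching and the RPC connection are elided; and the service name, ports, and memory figures are example values.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the issue's ideas; none of these types are NM APIs.

// What the NM would persist about a hosting process so it can find it again.
final class AuxProcessRecord {
    final String serviceName;
    final int rpcPort;        // RPC server exposed by the hosting process
    final long memoryLimitMb; // allocation the NM enforces

    AuxProcessRecord(String serviceName, int rpcPort, long memoryLimitMb) {
        this.serviceName = serviceName;
        this.rpcPort = rpcPort;
        this.memoryLimitMb = memoryLimitMb;
    }
}

// In-memory stand-in for NM recovery state; a real store would have to
// survive the NM process restart (e.g. by living on local disk).
final class AuxStateStore {
    private final Map<String, AuxProcessRecord> records = new HashMap<>();

    void put(AuxProcessRecord record) { records.put(record.serviceName, record); }

    AuxProcessRecord get(String serviceName) { return records.get(serviceName); }
}

// Stand-in for probing hosting processes (liveness and memory usage).
interface ProcessProbe {
    boolean isAlive(int rpcPort);
    long memoryUsedMb(int rpcPort);
}

final class AuxServiceProxy {
    enum Action { RECONNECTED, LAUNCHED }

    // Run when the NM starts or restarts: reuse a recorded, still-running
    // hosting process; otherwise record (and, elided here, launch) a new one.
    static Action recover(String serviceName, AuxStateStore store, ProcessProbe probe,
                          int freshPort, long memoryLimitMb) {
        AuxProcessRecord existing = store.get(serviceName);
        if (existing != null && probe.isAlive(existing.rpcPort)) {
            return Action.RECONNECTED;
        }
        store.put(new AuxProcessRecord(serviceName, freshPort, memoryLimitMb));
        return Action.LAUNCHED;
    }

    // Policy check: flag a hosting process using more than tolerance x its allocation.
    static boolean exceedsAllocation(AuxProcessRecord record, ProcessProbe probe, double tolerance) {
        return probe.memoryUsedMb(record.rpcPort) > record.memoryLimitMb * tolerance;
    }

    public static void main(String[] args) {
        AuxStateStore store = new AuxStateStore();
        store.put(new AuxProcessRecord("shuffle", 13562, 1024)); // example values
        ProcessProbe probe = new ProcessProbe() {
            @Override public boolean isAlive(int rpcPort) { return rpcPort == 13562; }
            @Override public long memoryUsedMb(int rpcPort) { return 3000; }
        };
        System.out.println(recover("shuffle", store, probe, 20000, 1024)); // prints RECONNECTED
        System.out.println(recover("other", store, probe, 20001, 512));    // prints LAUNCHED
        System.out.println(exceedsAllocation(store.get("shuffle"), probe, 2.0)); // prints true
    }
}
```

Keeping the record keyed by service name means the AuxiliaryService API seen by applications does not need to change: only the NM-side proxy decides whether to reattach to an existing process or start a new one. The tolerance factor is one possible way to express "way above its allocation"; the issue does not specify a threshold.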
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)