[ https://issues.apache.org/jira/browse/YARN-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15668916#comment-15668916 ]
Konstantinos Karanasos commented on YARN-1593:
----------------------------------------------

Thanks for starting this!
As [~asuresh] and [~hrsharma] pointed out, this is closely related to the container pooling we have been thinking of, so it's great to see there is more work in this direction.

Here are some first thoughts:
- There seems to be a common need to have containers not belonging to an AM. I like your analysis of the pros and cons of the three approaches. Ideally, and if possible, it would be good to agree on an approach that is not hybrid, i.e., to not have some containers going through option (1) and others through option (3), but rather have a unified approach. In container pooling we have thought of having a component in the RM that manages how many "system" containers will be running at each node, but we are willing to adopt another approach if it is more suitable.
- Looking both at your document and the comments above, it seems that no approach can properly tackle the dependencies problem. Probably we should solve this in the scheduler: just like there will be support for (anti-)affinity constraints, we can add support for dependencies in the scheduler, e.g., to not schedule a container on a node before a shuffle container is running on that node.
- Although I like your proposal of using a new ExecutionType for the system containers, I am not sure it is always desirable to couple system containers with the highest-priority ExecutionType. For instance, there can be system containers that are not as important and can be preempted to make space if needed. Also, apart from the execution priority, I am not sure the ExecutionType should determine whether a container is automatically relaunched. If we end up having a component managing those containers, maybe it is that component's role to determine whether they get restarted upon failure (irrespective of their ExecutionType).
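
To make the dependency idea in the second point concrete, here is a minimal sketch of a placement filter that only considers nodes where the required services are already running. Everything in it is an assumption for illustration: the class `DependencyAwarePlacement`, the method `eligibleNodes`, and the node and service names are hypothetical and do not correspond to any existing YARN scheduler API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: these names are illustrative and are not YARN APIs.
final class DependencyAwarePlacement {

    // Returns the nodes on which every required service is already running.
    static List<String> eligibleNodes(Map<String, Set<String>> runningServicesByNode,
                                      Set<String> requiredServices) {
        List<String> eligible = new ArrayList<>();
        for (Map.Entry<String, Set<String>> node : runningServicesByNode.entrySet()) {
            if (node.getValue().containsAll(requiredServices)) {
                eligible.add(node.getKey());
            }
        }
        return eligible;
    }

    public static void main(String[] args) {
        // TreeMap keeps node names sorted so the output order is deterministic.
        Map<String, Set<String>> running = new TreeMap<>();
        running.put("node1", new HashSet<>(Arrays.asList("shuffle")));
        running.put("node2", Collections.<String>emptySet());
        running.put("node3", new HashSet<>(Arrays.asList("shuffle", "metrics")));

        // The container to place depends on a shuffle service being up.
        Set<String> required = Collections.singleton("shuffle");
        System.out.println(eligibleNodes(running, required)); // prints [node1, node3]
    }
}
```

A real scheduler would presumably treat this as one more constraint next to (anti-)affinity rather than as a separate pre-filter; the sketch only shows the check itself.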

> support out-of-proc AuxiliaryServices
> -------------------------------------
>
>                 Key: YARN-1593
>                 URL: https://issues.apache.org/jira/browse/YARN-1593
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager, rolling upgrade
>            Reporter: Ming Ma
>            Assignee: Varun Vasudev
>         Attachments: SystemContainersandSystemServices.pdf
>
>
> AuxiliaryServices such as ShuffleHandler currently run in the same process as
> NM. There are some benefits to hosting them in dedicated processes.
> 1. NM rolling restart. If we want to upgrade YARN, NM restart will force the
> ShuffleHandler restart. If ShuffleHandler runs as a separate process,
> ShuffleHandler can continue to run during NM restart. NM can reconnect to
> the running ShuffleHandler after restart.
> 2. Resource management. It is possible that other types of AuxiliaryServices
> will be implemented. AuxiliaryServices are considered YARN application
> specific and could consume lots of resources. Running AuxiliaryServices in
> separate processes allows easier resource management. NM could potentially
> stop a specific AuxiliaryServices process from running if it consumes
> resources way above its allocation.
> Here are some high level ideas:
> 1. NM provides a hosting process for each AuxiliaryService. Existing
> AuxiliaryService API doesn't change.
> 2. The hosting process provides an RPC server for the AuxiliaryService proxy
> object inside NM to connect to.
> 3. When we rolling restart NM, the existing AuxiliaryService processes will
> continue to run. NM could reconnect to the running AuxiliaryService processes
> upon restart.
> 4. Policy and resource management of AuxiliaryServices. So far we don't have
> an immediate need for this. AuxiliaryService could run inside a container,
> its resource utilization could be taken into account by RM, and RM could
> determine that a specific type of application overutilizes cluster resources.
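
The reconnect step in idea 3 of the quoted description could look roughly like the sketch below: after a restart, NM walks the service endpoints it recorded earlier, reconnects to hosting processes that still respond, and launches the rest. This is only an illustration under stated assumptions. `AuxServiceReconnect`, `planOnRestart`, the `Action` enum, the liveness predicate, and the endpoint strings are all hypothetical; the description does not say how NM would record endpoints or probe liveness.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: none of these names exist in the NodeManager code base.
final class AuxServiceReconnect {

    enum Action { RECONNECT, LAUNCH }

    // For each recorded service, reconnect if its hosting process still
    // answers at the recorded endpoint; otherwise plan a fresh launch.
    static Map<String, Action> planOnRestart(Map<String, String> recordedEndpoints,
                                             Predicate<String> isAlive) {
        Map<String, Action> plan = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : recordedEndpoints.entrySet()) {
            boolean alive = isAlive.test(entry.getValue());
            plan.put(entry.getKey(), alive ? Action.RECONNECT : Action.LAUNCH);
        }
        return plan;
    }

    public static void main(String[] args) {
        // Endpoints NM is assumed to have persisted before it went down (made-up values).
        Map<String, String> recorded = new LinkedHashMap<>();
        recorded.put("shuffle", "127.0.0.1:5001");
        recorded.put("other_service", "127.0.0.1:5002");

        // Stand-in for a real liveness probe: only the shuffle host survived.
        Predicate<String> isAlive = endpoint -> endpoint.equals("127.0.0.1:5001");

        System.out.println(planOnRestart(recorded, isAlive));
        // prints {shuffle=RECONNECT, other_service=LAUNCH}
    }
}
```

Separating the plan from its execution keeps the decision testable without starting real processes; the actual RPC reconnect and process launch would sit behind the two actions.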

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org