[ 
https://issues.apache.org/jira/browse/YARN-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15668916#comment-15668916
 ] 

Konstantinos Karanasos commented on YARN-1593:
----------------------------------------------

Thanks for starting this! As [~asuresh] and [~hrsharma] pointed out, this is 
closely related to the container pooling we have been thinking of, so it's 
great to see there is more work in this direction.

Here are some first thoughts:
- There seems to be a common need to have containers not belonging to an AM. I 
like your analysis about the pros and cons of the three approaches. Ideally, 
and if possible, it would be good to agree on an approach that is not hybrid, 
i.e., to not have some containers going through option (1) and some others 
through option (3), but rather have a unified approach. In container pooling we 
have thought of having a component in the RM that manages how many "system" 
containers will be running on each node, but we are willing to adopt another 
approach if it is more suitable.
- Looking both at your document and the comments above, it seems that no 
approach can properly tackle the dependencies problem. Probably we should solve 
this in the scheduler: just like there will be support for (anti-)affinity 
constraints, we can add support for dependencies in the scheduler, e.g., to not 
schedule a dependent container on a node before a shuffle container is running 
on that node.
- Although I like your proposal of using a new ExecutionType for the system 
containers, I am not sure it is always desirable to couple system containers 
with the highest priority ExecutionType. For instance, there can be system 
containers that are not as important and can be preempted to make space if 
needed. Also, apart from the execution priority, I am not sure if the 
ExecutionType should determine whether a container should be automatically 
relaunched. If we end up having a component managing those containers, maybe it 
is its role to determine if they get restarted upon failure (irrespective of 
their ExecutionType).
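
To make the dependency idea in the second point a bit more concrete, here is a 
minimal sketch of the kind of check the scheduler could apply before placing a 
container. It is only an illustration under assumptions: the class 
DependencyAwarePlacement, the methods canPlace and pickNode, and the use of 
plain strings for node and service names are all hypothetical and are not 
existing YARN APIs.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: none of these names are existing YARN classes or methods.
final class DependencyAwarePlacement {

    // A node is eligible only if every service the container depends on
    // (e.g. "shuffle") is already running on that node.
    static boolean canPlace(Set<String> requiredServices, Set<String> runningOnNode) {
        return runningOnNode.containsAll(requiredServices);
    }

    // Returns the first node, in the map's iteration order, that satisfies
    // all dependencies, or an empty Optional if no node qualifies yet.
    static Optional<String> pickNode(Set<String> requiredServices,
                                     Map<String, Set<String>> runningByNode) {
        for (Map.Entry<String, Set<String>> entry : runningByNode.entrySet()) {
            if (canPlace(requiredServices, entry.getValue())) {
                return Optional.of(entry.getKey());
            }
        }
        return Optional.empty();
    }
}
```

With a check like this, a request that depends on "shuffle" would simply stay 
pending until some node reports a running shuffle container, much like an 
unsatisfied (anti-)affinity constraint would.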

> support out-of-proc AuxiliaryServices
> -------------------------------------
>
>                 Key: YARN-1593
>                 URL: https://issues.apache.org/jira/browse/YARN-1593
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager, rolling upgrade
>            Reporter: Ming Ma
>            Assignee: Varun Vasudev
>         Attachments: SystemContainersandSystemServices.pdf
>
>
> AuxiliaryServices such as ShuffleHandler currently run in the same process as 
> NM. There are some benefits to hosting them in dedicated processes.
> 1. NM rolling restart. If we want to upgrade YARN, NM restart will force the 
> ShuffleHandler restart. If ShuffleHandler runs as a separate process, 
> ShuffleHandler can continue to run during NM restart. NM can reconnect to 
> the running ShuffleHandler after restart.
> 2. Resource management. It is possible that other types of AuxiliaryServices 
> will be implemented. AuxiliaryServices are considered YARN application 
> specific and could consume lots of resources. Running AuxiliaryServices in 
> separate processes allows easier resource management. NM could potentially 
> stop a specific AuxiliaryServices process from running if it consumes 
> resources well above its allocation.
> Here are some high level ideas:
> 1. NM provides a hosting process for each AuxiliaryService. Existing 
> AuxiliaryService API doesn't change.
> 2. The hosting process provides RPC server for AuxiliaryService proxy object 
> inside NM to connect to.
> 3. When we rolling restart NM, the existing AuxiliaryService processes will 
> continue to run. NM could reconnect to the running AuxiliaryService processes 
> upon restart.
> 4. Policy and resource management of AuxiliaryServices. So far we don't have 
> an immediate need for this. AuxiliaryService could run inside a container, 
> its resource utilization could be taken into account by RM, and RM could 
> determine that a specific type of application overutilizes cluster resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
