[ https://issues.apache.org/jira/browse/YARN-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15663162#comment-15663162 ]

Varun Vasudev commented on YARN-1593:
-------------------------------------

Good question, [~hitesh]. This is one area where we're looking for feedback on
the approach.

bq. The doc does not seem to cover how user applications can define 
dependencies on these system services. For example, how to ensure that an 
MR/Tez/xyz container that requires the shuffle service does not get launched on 
a node where the system service is not running. This has two aspects: firstly,
how to ensure that container allocations happen on the correct nodes, where these
services are running, and secondly, the service might be down when the container
actually gets launched, so how will the behavior change as a result (does the
container eventually fail, does the NM itself stop the launch of the container
and send an error back, etc.)?

There are two modes for system containers and services, and I suspect we need a
hybrid of the two. The first mode is to launch them as YARN services (e.g. the
Tez shuffle service). Tez would then add an affinity requirement between the
containers it launches and the Tez shuffle service containers. This would require
changes at both the application and the YARN level. The second mode is to launch
the Tez shuffle service on all nodes (like we do with auxiliary services today)
as "system" containers, which are managed by YARN. The NMs will not accept
container requests until the system containers are up and running. In this mode,
Tez requires no changes at all, since the Tez shuffle service is running on every
node.
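A minimal sketch of the two modes, assuming a simple in-memory model: for mode 1, candidate nodes are filtered by whether they currently run the required service (standing in for the affinity requirement); for mode 2, the NM gates container launches until all of the node's system containers are running. Every name here (`SystemServiceModes`, `eligibleNodes`, `NodeLaunchGate`, `checkLaunch`, `tez-shuffle`) is hypothetical and not a YARN API. Returning the rejection to the caller as an error message is also an assumption, just one possible answer to what the NM does when the service is down at launch time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the two modes; these are not YARN classes or APIs.
class SystemServiceModes {

  // Mode 1 (YARN service + affinity): only nodes that currently run the
  // required service are eligible to host the application's containers.
  static List<String> eligibleNodes(Map<String, Set<String>> servicesByNode,
                                    String requiredService) {
    List<String> eligible = new ArrayList<>();
    for (Map.Entry<String, Set<String>> entry : servicesByNode.entrySet()) {
      if (entry.getValue().contains(requiredService)) {
        eligible.add(entry.getKey());
      }
    }
    Collections.sort(eligible); // deterministic order for the example
    return eligible;
  }

  // Mode 2 (system containers on every node): the NM refuses container
  // launches until all of its configured system containers are running.
  static final class NodeLaunchGate {
    private final Set<String> required;
    private final Set<String> running = new HashSet<>();

    NodeLaunchGate(Set<String> requiredSystemContainers) {
      this.required = new HashSet<>(requiredSystemContainers);
    }

    void markRunning(String name) { running.add(name); }

    void markStopped(String name) { running.remove(name); }

    // Empty if the launch may proceed; otherwise the (assumed) error the NM
    // would send back to the caller instead of launching the container.
    Optional<String> checkLaunch(String containerId) {
      Set<String> missing = new TreeSet<>(required);
      missing.removeAll(running);
      if (missing.isEmpty()) {
        return Optional.empty();
      }
      return Optional.of("Rejecting " + containerId
          + ": system containers not running: " + missing);
    }
  }

  public static void main(String[] args) {
    Map<String, Set<String>> services = new HashMap<>();
    services.put("node-a", Set.of("tez-shuffle"));
    services.put("node-b", Set.of());
    System.out.println(eligibleNodes(services, "tez-shuffle")); // [node-a]

    NodeLaunchGate gate = new NodeLaunchGate(Set.of("tez-shuffle"));
    System.out.println(gate.checkLaunch("container_01").isPresent()); // true
    gate.markRunning("tez-shuffle");
    System.out.println(gate.checkLaunch("container_01").isPresent()); // false
  }
}
```

The gate only covers the launch-time check. A service that goes down after a container has already started, the other half of the question, would still need separate handling in either mode.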

Does that answer your question?


> support out-of-proc AuxiliaryServices
> -------------------------------------
>
>                 Key: YARN-1593
>                 URL: https://issues.apache.org/jira/browse/YARN-1593
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager, rolling upgrade
>            Reporter: Ming Ma
>            Assignee: Varun Vasudev
>         Attachments: SystemContainersandSystemServices.pdf
>
>
> AuxiliaryServices such as ShuffleHandler currently run in the same process as
> the NM. There are some benefits to hosting them in dedicated processes.
> 1. NM rolling restart. If we want to upgrade YARN, an NM restart will force a
> ShuffleHandler restart. If ShuffleHandler runs as a separate process, it can
> continue to run during the NM restart, and the NM can reconnect to the
> running ShuffleHandler after the restart.
> 2. Resource management. Other types of AuxiliaryServices may be implemented
> in the future. AuxiliaryServices are considered YARN-application-specific
> and could consume lots of resources. Running AuxiliaryServices in separate
> processes allows easier resource management: the NM could potentially stop a
> specific AuxiliaryService process if it consumes resources far above its
> allocation.
> Here are some high-level ideas:
> 1. The NM provides a hosting process for each AuxiliaryService. The existing
> AuxiliaryService API doesn't change.
> 2. The hosting process provides an RPC server for an AuxiliaryService proxy
> object inside the NM to connect to.
> 3. During a rolling restart of the NM, the existing AuxiliaryService processes
> continue to run, and the NM can reconnect to them upon restart.
> 4. Policy and resource management of AuxiliaryServices. So far we don't have
> an immediate need for this. An AuxiliaryService could run inside a container,
> and its resource utilization could be taken into account by the RM, so the RM
> could determine when a specific type of application is overutilizing cluster
> resources.
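Ideas 1 to 3 can be sketched roughly as follows, under heavy assumptions: a small recovery store that survives NM restarts remembers the RPC port of each hosting process, and on (re)start the NM-side proxy reconnects if that process is still alive, otherwise it launches a new one. `OutOfProcAuxServices`, `RecoveryStore`, `AuxServiceProxy`, the launcher and the liveness check are all hypothetical stand-ins, not NodeManager classes; real process launching and RPC are replaced by in-memory fakes, and the port number is only an example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical sketch: one hosting process per AuxiliaryService, with only a
// proxy inside the NM. None of these types exist in the NodeManager.
class OutOfProcAuxServices {

  // State that survives an NM restart (a real NM would persist it to disk).
  static final class RecoveryStore {
    private final Map<String, Integer> portByService = new HashMap<>();

    Integer lookup(String service) { return portByService.get(service); }

    void record(String service, int port) { portByService.put(service, port); }
  }

  // NM-side proxy for one out-of-process AuxiliaryService.
  static final class AuxServiceProxy {
    private final String service;
    private final RecoveryStore store;
    private final Supplier<Integer> launchHostProcess; // returns its RPC port
    private final Predicate<Integer> isAlive;          // stand-in for an RPC ping
    private int port = -1;

    AuxServiceProxy(String service, RecoveryStore store,
                    Supplier<Integer> launchHostProcess,
                    Predicate<Integer> isAlive) {
      this.service = service;
      this.store = store;
      this.launchHostProcess = launchHostProcess;
      this.isAlive = isAlive;
    }

    // Called on NM (re)start. Returns true if it reconnected to a hosting
    // process that kept running, false if it had to launch a new one.
    boolean start() {
      Integer known = store.lookup(service);
      if (known != null && isAlive.test(known)) {
        port = known;
        return true;
      }
      port = launchHostProcess.get(); // first start, or the old process died
      store.record(service, port);
      return false;
    }

    int port() { return port; }
  }

  public static void main(String[] args) {
    RecoveryStore store = new RecoveryStore();
    Set<Integer> livePorts = new HashSet<>();
    int[] nextPort = {13562};
    Supplier<Integer> launcher = () -> {
      livePorts.add(nextPort[0]);
      return nextPort[0]++;
    };

    AuxServiceProxy beforeRestart = new AuxServiceProxy(
        "mapreduce_shuffle", store, launcher, livePorts::contains);
    System.out.println(beforeRestart.start()); // false: launched a process

    // Simulated NM restart: a fresh proxy, but the same recovery store.
    AuxServiceProxy afterRestart = new AuxServiceProxy(
        "mapreduce_shuffle", store, launcher, livePorts::contains);
    System.out.println(afterRestart.start()); // true: reconnected
  }
}
```

The liveness check is what separates the rolling-restart path (reconnect to the process that kept running) from the case where the hosting process died while the NM was down (launch a replacement and record its new port).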



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)