[ 
https://issues.apache.org/jira/browse/YARN-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15665094#comment-15665094
 ] 

Hitesh Shah commented on YARN-1593:
-----------------------------------

Thanks [~vvasudev]. It does so partially. 

My concern is about the feedback loop, i.e. how apps handle failures when the 
system container dies at any of the following points: 
  - before an allocated container is launched on that node
  - while a container is running
  - after a container has completed

Would applications that define affinity to these system services now receive 
notifications when system service containers go down or come back up? 
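
To make the three cases concrete, here is a rough sketch of what such a 
notification contract might look like from the app's side. Everything in it 
is hypothetical: SystemServiceListener, AppContainerPhase, Reaction and 
reactionFor() are names made up for illustration and are not existing YARN 
APIs, and the reaction chosen for each phase is only one plausible policy.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch only; none of these types exist in YARN. */
final class SystemServiceFeedbackSketch {

    /** Where an app container was in its lifecycle when the system container died. */
    enum AppContainerPhase { ALLOCATED_NOT_LAUNCHED, RUNNING, COMPLETED }

    /** Illustrative reactions an application might choose (assumptions, not YARN behavior). */
    enum Reaction { RELEASE_AND_REREQUEST, WAIT_FOR_SERVICE_RESTART, RERUN_TO_REGENERATE_OUTPUT }

    /** Hypothetical callback for an app that declared affinity to a system service. */
    interface SystemServiceListener {
        void onServiceDown(String nodeId, String serviceName);
        void onServiceUp(String nodeId, String serviceName);
    }

    /** Maps each of the three failure points to one possible reaction. */
    static Reaction reactionFor(AppContainerPhase phase) {
        switch (phase) {
            case ALLOCATED_NOT_LAUNCHED:
                return Reaction.RELEASE_AND_REREQUEST;
            case RUNNING:
                return Reaction.WAIT_FOR_SERVICE_RESTART;
            case COMPLETED:
                return Reaction.RERUN_TO_REGENERATE_OUTPUT;
            default:
                throw new IllegalArgumentException("unknown phase: " + phase);
        }
    }

    /** Listener that simply records the notifications it receives. */
    static final class RecordingListener implements SystemServiceListener {
        final List<String> events = new ArrayList<>();

        @Override
        public void onServiceDown(String nodeId, String serviceName) {
            events.add("DOWN " + serviceName + "@" + nodeId);
        }

        @Override
        public void onServiceUp(String nodeId, String serviceName) {
            events.add("UP " + serviceName + "@" + nodeId);
        }
    }

    public static void main(String[] args) {
        RecordingListener listener = new RecordingListener();
        listener.onServiceDown("node-1", "shuffle");
        listener.onServiceUp("node-1", "shuffle");
        System.out.println(listener.events);
        for (AppContainerPhase phase : AppContainerPhase.values()) {
            System.out.println(phase + " -> " + reactionFor(phase));
        }
    }
}
```

Without some callback of this kind, the app has no way to tell which of the 
three cases it is in.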
 

In addition to the feedback loop, is there any behavior change as a result of 
this? That is, if the system container is not alive, will the app container 
still get launched even though the service it depends on is down? For shuffle, 
this might be OK if the system container eventually comes up, but other 
services, such as a caching layer, might provide more synchronous 
functionality. 
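
The launch question could be framed as a per-service policy along these 
lines. This is only a sketch under assumptions: DependencyMode, Decision and 
decide() are hypothetical names, not part of the NM or of the proposal, and 
"defer" is just one possible alternative to launching.

```java
/** Hypothetical sketch only; not an existing NodeManager API. */
final class LaunchGateSketch {

    /** How tightly an app container depends on the system service. */
    enum DependencyMode {
        EVENTUAL,     // e.g. shuffle: fine if the service comes up later
        SYNCHRONOUS   // e.g. a caching layer that is needed from the start
    }

    /** What to do with an app container that is ready to launch. */
    enum Decision { LAUNCH, DEFER }

    /** Launch unless a synchronously required service is currently down. */
    static Decision decide(boolean serviceAlive, DependencyMode mode) {
        if (serviceAlive) {
            return Decision.LAUNCH;
        }
        return mode == DependencyMode.EVENTUAL ? Decision.LAUNCH : Decision.DEFER;
    }

    public static void main(String[] args) {
        for (DependencyMode mode : DependencyMode.values()) {
            System.out.println(mode + ", service down -> " + decide(false, mode));
            System.out.println(mode + ", service up   -> " + decide(true, mode));
        }
    }
}
```

Whether the dependency mode is declared by the app or by the service itself 
is left open here.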

> support out-of-proc AuxiliaryServices
> -------------------------------------
>
>                 Key: YARN-1593
>                 URL: https://issues.apache.org/jira/browse/YARN-1593
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager, rolling upgrade
>            Reporter: Ming Ma
>            Assignee: Varun Vasudev
>         Attachments: SystemContainersandSystemServices.pdf
>
>
> AuxiliaryServices such as ShuffleHandler currently run in the same process as 
> NM. There are some benefits to hosting them in dedicated processes.
> 1. NM rolling restart. If we want to upgrade YARN, an NM restart will force a 
> ShuffleHandler restart. If ShuffleHandler runs as a separate process, 
> ShuffleHandler can continue to run during the NM restart. NM can reconnect to 
> the running ShuffleHandler after the restart.
> 2. Resource management. It is possible that other types of AuxiliaryServices 
> will be implemented. AuxiliaryServices are considered YARN application 
> specific and could consume lots of resources. Running AuxiliaryServices in 
> separate processes allows easier resource management. NM could potentially 
> stop a specific AuxiliaryServices process if it consumes resources well 
> above its allocation.
> Here are some high-level ideas:
> 1. NM provides a hosting process for each AuxiliaryService. The existing 
> AuxiliaryService API doesn't change.
> 2. The hosting process provides an RPC server for the AuxiliaryService proxy 
> object inside NM to connect to.
> 3. When we do a rolling restart of NM, the existing AuxiliaryService 
> processes will continue to run. NM could reconnect to the running 
> AuxiliaryService processes upon restart.
> 4. Policy and resource management of AuxiliaryServices. So far we don't have 
> an immediate need for this. An AuxiliaryService could run inside a container 
> so that its resource utilization is taken into account by RM, and RM could 
> consider whether a specific type of application overutilizes cluster 
> resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
