[jira] [Comment Edited] (YARN-1593) support out-of-proc AuxiliaryServices

Haibo Chen (JIRA) Wed, 30 Nov 2016 15:26:16 -0800

    [ 
https://issues.apache.org/jira/browse/YARN-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15710178#comment-15710178
 ]


Haibo Chen edited comment on YARN-1593 at 11/30/16 11:25 PM:
-------------------------------------------------------------

Thanks for starting the work on this, [~vvasudev]!

I’d like to understand the proposal better. A few comments/questions on the 
proposal. Please correct me as necessary. 

It seems like the term system containers is overloaded in the design doc.  From 
an NM’s perspective, my understanding is that system containers are a special 
container runtime (relative to the container types we have today in NM) 
provided by NM to be used by system services to run their 
components/instances. In other cases, system containers represent 
components/instances of system services on the worker nodes.  In the former 
case, we may only need to be concerned with issues such as classpath and 
container executors. For ShuffleHandler, for instance, it is an alternative to 
the in-process runtime it gets from NM today. The latter is where we discuss 
whether RM or NM does the heavy lifting of managing system containers.

As you mention, no one option suits all use cases. Option 1 suits some, while 
option 3 suits others. I wonder if this is because we are conflating two 
different types of containers in the proposal: (1) framework-specific services 
like MR shuffle, and (2) application-specific services. Framework services are 
to be run on all nodes that support the framework (e.g. MR). Since these run on 
every node, node-level configs (option 3) would work best. Application services 
(e.g. ATS AM-companion-collector), on the other hand, are application-specific 
and need to run on a subset of cluster nodes; option 1 readily applies to 
these.  Is this categorization accurate? And do you see merit in 
differentiating between these two?
bq. Allow shuffle to run on the NodeManagers without requiring it to be setup 
as an AuxiliaryService
Not sure if I understand this correctly. IMHO, we could let users continue 
with their current configuration for AuxiliaryService, but just run the 
services in containers with an AuxiliaryService proxy, like Junping said in the 
JIRA description.
bq. Handling container status for system-containers - we will need to add logic 
to not act upon the container status of a system-container.
Can you please elaborate more on this? Shouldn’t NM try to relaunch system 
containers? Does this mean that RM will take the responsibility of handling 
system container failures?
bq. I think discovery is going to be one major piece that needs to be addressed 
from the beginning
Agree with Sangjin that the discovery problem needs to be addressed right at 
the beginning. For option 3, I think we can add a queryable registry in 
AuxiliaryServices when NM launches a proxied AuxiliaryService, assuming that NM 
will launch the AuxiliaryServices in the right order and each AuxiliaryService 
knows which services it depends on.
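
To make the registry idea concrete, here is a minimal sketch in Python. It is 
illustrative only: the class AuxServiceRegistry and its methods are 
hypothetical names, not part of the NM or AuxiliaryServices API, and the host 
and port in the usage lines are example values. It assumes, as stated above, 
that services are launched in dependency order, and it rejects a registration 
that violates that assumption.

```python
class AuxServiceRegistry:
    """Hypothetical per-NM registry: service name -> (host, port) endpoint."""

    def __init__(self):
        self._endpoints = {}

    def register(self, name, host, port, depends_on=()):
        # Assumes NM launches services in dependency order, so every
        # dependency must already be registered when a service comes up.
        missing = [d for d in depends_on if d not in self._endpoints]
        if missing:
            raise RuntimeError(f"{name} launched before its dependencies: {missing}")
        self._endpoints[name] = (host, port)

    def lookup(self, name):
        # Returns None if the service has not been launched or was removed.
        return self._endpoints.get(name)

    def unregister(self, name):
        self._endpoints.pop(name, None)


# Example usage with illustrative service names and endpoints.
registry = AuxServiceRegistry()
registry.register("mapreduce_shuffle", "localhost", 13562)
registry.register("example_collector", "localhost", 18000,
                  depends_on=["mapreduce_shuffle"])
endpoint = registry.lookup("mapreduce_shuffle")
```

A real registry would also have to survive NM restarts and handle a proxied 
service dying; neither is covered in this sketch.
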
bq. the NodeManager will block container requests until all the 
system-containers are running
With global scheduling and resource affinity, NM does not necessarily need to 
block container launching. NM can launch system containers asynchronously and 
report to the resource manager upon launch success, and RM can then schedule 
containers on a node only if the services that the containers depend on have 
been launched on that node.  But that’s way in the future, I guess.
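
A rough Python sketch of that scheduling check, under stated assumptions: 
NodeServiceTracker and its methods are hypothetical and stand in for whatever 
state RM would keep from NM reports; node and service names are made up. A 
node is eligible for a container only once every service the container 
depends on has been reported as launched there.

```python
class NodeServiceTracker:
    """Hypothetical RM-side view of the system services each node has reported."""

    def __init__(self):
        self._running = {}  # node id -> set of service names reported as launched

    def register_node(self, node):
        self._running.setdefault(node, set())

    def report_launched(self, node, service):
        # Called when an NM reports that a system container started successfully.
        self._running.setdefault(node, set()).add(service)

    def report_stopped(self, node, service):
        self._running.get(node, set()).discard(service)

    def eligible_nodes(self, required_services):
        # A node qualifies only if all required services are already running on it.
        required = set(required_services)
        return sorted(n for n, running in self._running.items() if required <= running)


# Example usage: nm2 becomes schedulable only after it reports the service.
tracker = NodeServiceTracker()
tracker.register_node("nm1")
tracker.register_node("nm2")
tracker.report_launched("nm1", "shuffle")
before = tracker.eligible_nodes(["shuffle"])
tracker.report_launched("nm2", "shuffle")
after = tracker.eligible_nodes(["shuffle"])
```

Because the NMs never block, containers with no service dependencies can be 
placed on any registered node right away.
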
bq.  We can’t solve the dependency management and affinity/anti-affinity 
requirements. (One of cons in option 3)
Not quite sure how option 1 solves the affinity requirement. Can you elaborate 
a little more on this?  To solve the dependency management issue, one thing 
that occurred to me, though I have not thought about it in much detail, is that 
we could have RM manage all system services together and construct a DAG of the 
system services that need to be launched on each NM. Alternatively, RM can just 
decide which services need to be launched on which nodes, with their 
dependencies clearly defined, and then each NM can construct the DAG itself and 
launch the services in topological order. This, however, does put some burden 
on RM.
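
As an illustration of the second alternative, the following Python sketch 
computes a launch order on the NM side from a dependency map, using Kahn's 
algorithm for topological sorting. The function launch_order and the service 
names are hypothetical; the input format (service name mapped to the set of 
services it depends on) is an assumption about what RM would hand to each NM.

```python
from collections import deque


def launch_order(deps):
    """Return services in an order where each one follows its dependencies.

    deps: dict mapping a service name to the set of services it depends on.
    """
    # Include services that appear only as someone else's dependency.
    services = set(deps)
    for needed in deps.values():
        services.update(needed)

    remaining = {s: set(deps.get(s, ())) for s in services}
    dependents = {s: [] for s in services}
    for s, needed in remaining.items():
        for d in needed:
            dependents[d].append(s)

    # Start with services that have no dependencies; sort for a stable order.
    ready = deque(sorted(s for s, needed in remaining.items() if not needed))
    order = []
    while ready:
        s = ready.popleft()
        order.append(s)
        for t in sorted(dependents[s]):
            remaining[t].discard(s)
            if not remaining[t]:
                ready.append(t)

    if len(order) != len(services):
        raise ValueError("dependency cycle among system services")
    return order


# Example usage with made-up service names.
order = launch_order({
    "collector": {"registry"},
    "shuffle": set(),
    "registry": set(),
})
```

In this example, registry is always launched before collector, while shuffle 
has no ordering constraint. A cycle in the declared dependencies is reported 
as an error rather than silently dropping services.
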


> support out-of-proc AuxiliaryServices
> -------------------------------------
>
>                 Key: YARN-1593
>                 URL: https://issues.apache.org/jira/browse/YARN-1593
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager, rolling upgrade
>            Reporter: Ming Ma
>            Assignee: Varun Vasudev
>         Attachments: SystemContainersandSystemServices.pdf
>
>
> AuxiliaryServices such as ShuffleHandler currently run in the same process as 
> NM. There are some benefits to hosting them in dedicated processes.
> 1. NM rolling restart. If we want to upgrade YARN, an NM restart will force 
> the ShuffleHandler to restart. If ShuffleHandler runs as a separate process, 
> ShuffleHandler can continue to run during NM restart. NM can reconnect to 
> the running ShuffleHandler after restart.
> 2. Resource management. It is possible that other types of AuxiliaryServices 
> will be implemented. AuxiliaryServices are considered YARN application 
> specific and could consume lots of resources. Running AuxiliaryServices in 
> separate processes allows easier resource management. NM could potentially 
> stop a specific AuxiliaryServices process from running if it consumes 
> resources well above its allocation.
> Here are some high-level ideas:
> 1. NM provides a hosting process for each AuxiliaryService. The existing 
> AuxiliaryService API doesn't change.
> 2. The hosting process provides an RPC server for the AuxiliaryService proxy 
> object inside NM to connect to.
> 3. When we do a rolling restart of NM, the existing AuxiliaryService 
> processes will continue to run. NM could reconnect to the running 
> AuxiliaryService processes upon restart.
> 4. Policy and resource management of AuxiliaryServices. So far we don't have 
> an immediate need for this. An AuxiliaryService could run inside a container, 
> its resource utilization could be taken into account by RM, and RM could 
> then determine that a specific type of application overutilizes cluster 
> resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

