[ https://issues.apache.org/jira/browse/YARN-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13875625#comment-13875625 ]
Michael Lv commented on YARN-1609:
----------------------------------

Hi [~hitesh], thanks for the detailed comments.

The essence of the proposal is to let a server process be managed by the NodeManager while still allowing the AM/RM to control and monitor resource consumption through the existing AM-NM and AM-RM protocols, with minimal, backwards-compatible extensions. In many frameworks (e.g. OMPI/ORTE) the server process running on each node manages its child processes internally, and that is hard to change.

From a performance/scalability perspective, since it's scenario dependent IMO, I'll comment on your questions for the Hamster case:

{quote}Is the NM flow too slow for launching a new container or killing one?{quote}

Today there is no difference between containers: they all go through the same life cycle and are independent of each other. It's more like running many tasks directed by the AM (the MRv2 framework comes to mind). If the task containers could be managed by server containers (the slave daemon in Hamster, or the TaskTracker in MR1?), the server daemon could launch processes (in batch or on demand) directly and faster. So yes, in the Hamster case it can be faster if we can bypass some steps (e.g. container localization), but not by much, since the scale per node is small. We'll have more data points as this patch is implemented further in our work.

{quote}Is it difficult to monitor the status of a container?{quote}

Yes, especially when the server container wants to control the lifecycle of the child processes and sync on child process state. We are using shared storage for that purpose for now (a workaround; for a larger number of processes it can be a problem).

{quote}Can the NM not handle launching/managing a large no. of containers on a single machine?{quote}

Not really

> Add Service Container type to NodeManager in YARN
> -------------------------------------------------
>
>                 Key: YARN-1609
>                 URL: https://issues.apache.org/jira/browse/YARN-1609
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 2.2.0
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: Add Service Container type to NodeManager in YARN-V1.pdf
>
>
> From our work to support running OpenMPI on YARN (MAPREDUCE-2911), we found
> that it’s important to have a framework-specific daemon process manage the
> tasks on each node directly. The daemon process, most likely similar in other
> frameworks as well, provides critical services to tasks running on that
> node (for example “wireup”, spawning user processes in large numbers at once, etc.).
> In YARN, it’s hard, if not impossible, to have those processes
> managed by YARN.
> We propose to extend the container model on the NodeManager side to support a
> “Service Container” to run and manage such framework daemon/service processes. We
> believe this is very useful to other application framework developers as well.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)