[ 
https://issues.apache.org/jira/browse/YARN-445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795753#comment-13795753
 ] 

Andrey Klochkov commented on YARN-445:
--------------------------------------

Vinod,
Accepting a mapping of arbitrary commands is indeed the most powerful approach. 
However, it would require a lot of changes in Yarn, as well as additional 
complexity for app writers. At the same time, are we sure that this flexibility 
is needed, and that it won't be over-engineering and possibly an abstraction 
leak in the Yarn framework? By the latter I mean that we would give app writers 
the ability to run arbitrary commands on any node at any point in time, but is 
that one of Yarn's responsibilities? I'm not a Yarn expert, so I'm just 
asking.

Anyway, the scope of what I have proposed with the patch is much smaller and 
solves the task stated in the initial description of this Jira: troubleshooting 
timed-out containers by dumping jstack. This would be useful for many Yarn 
users, so I thought it might make sense to implement it this way now and extend 
it in the future if there is demand. I agree that the way it is exposed in the 
API could be changed to a signal value in the stopContainers request instead of 
a separate call, which is indeed a bit confusing.
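
To make the last point concrete, here is a rough sketch of what a signal value 
on the stop request could look like. Every name in it (StopRequestSketch, 
Signal, StopContainerRequest.withSignal, plan) is made up for illustration; it 
is not the existing Yarn API and not what the attached patch does. Modelling 
the stop itself as a single SIGTERM is also just an assumption to keep the 
example small.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Illustrative only: none of these types exist in Yarn or in the attached patch.
public class StopRequestSketch {

    /** Hypothetical set of signals an AM might ask for. */
    public enum Signal { SIGQUIT, SIGUSR1, SIGTERM }

    /** Hypothetical stop request with an optional signal to send before stopping. */
    public static final class StopContainerRequest {
        private final String containerId;
        private final Optional<Signal> preStopSignal;

        private StopContainerRequest(String containerId, Optional<Signal> preStopSignal) {
            this.containerId = containerId;
            this.preStopSignal = preStopSignal;
        }

        /** A stop request that behaves as today: no extra signal. */
        public static StopContainerRequest plain(String containerId) {
            return new StopContainerRequest(containerId, Optional.empty());
        }

        /** A stop request that asks for a signal (e.g. SIGQUIT) before the stop. */
        public static StopContainerRequest withSignal(String containerId, Signal signal) {
            return new StopContainerRequest(containerId, Optional.of(signal));
        }

        public String containerId() { return containerId; }
        public Optional<Signal> preStopSignal() { return preStopSignal; }
    }

    /**
     * Signals the node would deliver, in order, for one request.
     * Assumption: the stop itself is modelled as a single SIGTERM.
     */
    public static List<Signal> plan(StopContainerRequest request) {
        List<Signal> steps = new ArrayList<>();
        request.preStopSignal().ifPresent(steps::add);
        steps.add(Signal.SIGTERM);
        return steps;
    }

    public static void main(String[] args) {
        // A timed-out container: dump thread stacks via SIGQUIT, then stop it.
        StopContainerRequest timedOut =
            StopContainerRequest.withSignal("container_01", Signal.SIGQUIT);
        System.out.println(timedOut.containerId() + " -> " + plan(timedOut));
    }
}
```

Note that a request shaped like this only covers the signal-then-stop case. It 
would not cover triggering a jstack without killing the container, which the 
issue description also asks for.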

> Ability to signal containers
> ----------------------------
>
>                 Key: YARN-445
>                 URL: https://issues.apache.org/jira/browse/YARN-445
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Jason Lowe
>            Assignee: Andrey Klochkov
>         Attachments: YARN-445--n2.patch, YARN-445--n3.patch, 
> YARN-445--n4.patch, YARN-445.patch
>
>
> It would be nice if an ApplicationMaster could send signals to containers 
> such as SIGQUIT, SIGUSR1, etc.
> For example, in order to replicate the jstack-on-task-timeout feature 
> implemented by MAPREDUCE-1119 in Hadoop 0.21 the NodeManager needs an 
> interface for sending SIGQUIT to a container.  For that specific feature we 
> could implement it as an additional field in the StopContainerRequest.  
> However that would not address other potential features like the ability for 
> an AM to trigger jstacks on arbitrary tasks *without* killing them.  The 
> latter feature would be a very useful debugging tool for users who do not 
> have shell access to the nodes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
