[ 
https://issues.apache.org/jira/browse/YARN-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15734060#comment-15734060
 ] 

Miklos Szegedi commented on YARN-5987:
--------------------------------------

The way I would implement this is to let the administrator specify 
NM_SAVE_DEBUG_INFO_COMMAND and NM_SAVE_DEBUG_INFO_TIMEOUT_SEC. The command is 
called when a container is preempted. If the timer expires before the command 
finishes, the command is cancelled. The command can contain the placeholders 
{{PID}} and {{LOG_DIR}}, which are replaced with the actual values. The 
container executor needs to impersonate the container's user in case YARN is 
running as a different user than the container. Ideally, the container launch 
context would also carry a flag that says whether to apply the feature to the 
application, so that we do not collect dumps for all applications 
unnecessarily.
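
To make the placeholder substitution and the timeout concrete, here is a rough 
sketch in Python rather than the actual NodeManager code. The function names 
build_argv and save_debug_info are hypothetical, and the sketch leaves out 
impersonation and the per-application flag entirely.

```python
import shlex
import subprocess


def build_argv(template, pid, log_dir):
    """Split the configured command, then fill in {{PID}} and {{LOG_DIR}}.

    Splitting before substituting keeps a log directory that contains
    spaces inside a single argument.
    """
    return [
        arg.replace("{{PID}}", str(pid)).replace("{{LOG_DIR}}", log_dir)
        for arg in shlex.split(template)
    ]


def save_debug_info(template, timeout_sec, pid, log_dir):
    """Run the debug command for a container (hypothetical helper).

    Returns the exit code of the command, or None if the timer expired
    first and the command was cancelled.
    """
    argv = build_argv(template, pid, log_dir)
    try:
        # subprocess.run kills the child process when the timeout expires
        # and then raises TimeoutExpired.
        return subprocess.run(argv, timeout=timeout_sec).returncode
    except subprocess.TimeoutExpired:
        return None
```

With this sketch, an administrator-supplied value such as 
"jmap -dump:format=b,file={{LOG_DIR}}/heap.hprof {{PID}}" (an illustrative 
command, not one proposed in this issue) would be expanded per container and 
abandoned once the configured timeout elapses.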

> NM configured command to collect heap dump of preempted container
> -----------------------------------------------------------------
>
>                 Key: YARN-5987
>                 URL: https://issues.apache.org/jira/browse/YARN-5987
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Miklos Szegedi
>            Assignee: Miklos Szegedi
>
> The node manager can kill a container if it exceeds its assigned memory 
> limits. It would be nice to have a configuration entry to set up a command 
> that can collect additional debug information if needed. The collected 
> information can be used for root cause analysis.



