[ https://issues.apache.org/jira/browse/YARN-5987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15734060#comment-15734060 ]
Miklos Szegedi commented on YARN-5987:
--------------------------------------

I would implement this by letting the administrator specify NM_SAVE_DEBUG_INFO_COMMAND and NM_SAVE_DEBUG_INFO_TIMEOUT_SEC. The command is run when a container is preempted. If the timeout expires before the command finishes, the command is cancelled. The command can contain {{PID}} and {{LOG_DIR}}, which are replaced with the actual values. The container executor needs to impersonate the container's user in case YARN is running as a different user than the container. Ideally, the container launch context would also carry a flag saying whether the feature applies to the running application, so that we do not collect dumps for all applications unnecessarily.

> NM configured command to collect heap dump of preempted container
> -----------------------------------------------------------------
>
>                 Key: YARN-5987
>                 URL: https://issues.apache.org/jira/browse/YARN-5987
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Miklos Szegedi
>            Assignee: Miklos Szegedi
>
> The node manager can kill a container, if it exceeds the assigned memory
> limits. It would be nice to have a configuration entry to set up a command
> that can collect additional debug information, if needed. The collected
> information can be used for root cause analysis.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
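
The mechanism proposed in the comment (substitute the placeholders, run the configured command, cancel it when the timeout expires) could be sketched roughly as follows. This is only an illustration under stated assumptions, not YARN code: the class DebugInfoCollector and its methods are hypothetical, the placeholders are assumed to appear literally as {{PID}} and {{LOG_DIR}} in the command string (the braces in the comment may just be JIRA monospace markup), and the command is assumed to run through /bin/sh. Impersonation via the container executor and the per-application flag are left out.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration; not part of the NodeManager.
class DebugInfoCollector {

    // Replace the two placeholders named in the comment with actual values.
    // Assumption: the placeholders are written literally as {{PID}} and {{LOG_DIR}}.
    static String expand(String template, long pid, String logDir) {
        return template
                .replace("{{PID}}", Long.toString(pid))
                .replace("{{LOG_DIR}}", logDir);
    }

    // Run the expanded command and wait at most timeoutSec seconds.
    // Returns true only if the command finished in time with exit code 0.
    static boolean run(String template, long pid, String logDir, long timeoutSec)
            throws IOException, InterruptedException {
        String command = expand(template, pid, logDir);
        Process process = new ProcessBuilder("/bin/sh", "-c", command)
                .inheritIO() // avoid blocking on an unread output pipe
                .start();
        if (!process.waitFor(timeoutSec, TimeUnit.SECONDS)) {
            // Timer expired: cancel the command. This kills the shell process;
            // children it has forked may need separate cleanup.
            process.destroyForcibly();
            process.waitFor();
            return false;
        }
        return process.exitValue() == 0;
    }

    public static void main(String[] args) throws Exception {
        // Harmless stand-in for a real dump command.
        boolean ok = run("echo pid={{PID}} dir={{LOG_DIR}}", 4242, "/tmp/container-logs", 5);
        System.out.println("collected: " + ok);
    }
}
```

With this sketch, an administrator-supplied value such as "jmap -dump:format=b,file={{LOG_DIR}}/heap.hprof {{PID}}" would be expanded and run the same way. The values are substituted without shell quoting here, so a real implementation would have to validate or escape them.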