[ https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560735#comment-14560735 ]
Varun Saxena commented on YARN-3678:
------------------------------------

Yeah, that's why I said that if we increase the value of {{pid_max}} on a 64-bit machine to the highest value it can take, i.e. 2^22, that should mitigate the risk of this happening.
But anyway, as I mentioned above, we can still fix this.

> DelayedProcessKiller may kill other process other than container
> ----------------------------------------------------------------
>
>                 Key: YARN-3678
>                 URL: https://issues.apache.org/jira/browse/YARN-3678
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.6.0
>            Reporter: gu-chi
>            Priority: Critical
>
> Suppose a container has finished and its cleanup begins. The PID file still
> exists and will trigger signalContainer once, which kills the process with
> the PID recorded in the PID file. But since the container has already
> finished, that PID may now be occupied by another process, which may cause
> a serious issue.
> As far as I know, my NM was killed unexpectedly, and what I described could
> be the cause, even if it occurs rarely.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)