[ 
https://issues.apache.org/jira/browse/YARN-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14560651#comment-14560651
 ] 

Naganarasimha G R commented on YARN-3678:
-----------------------------------------

Hi [~vvasudev] & [~zhiguohong], for us it happened in a secure setup, and one 
key point is that the NM user and the container user are the same. Irrespective 
of that, it could have killed any other process (container) of the same or 
another app running on the same node and submitted by the same user. One 
suggestion (a crude fix; I am not sure how to get it working on other OSes): 
can we grep for the container ID and confirm that it is the same process we 
are targeting before we kill it?
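
A rough sketch of that check, written in Python only for illustration (the 
NM itself is Java and shell). It assumes a Linux /proc filesystem and assumes 
that the container ID appears somewhere in the target process's command line, 
for example in the launch script path. The names cmdline_mentions and 
kill_if_container are hypothetical and not part of YARN.

```python
import os
import signal


def cmdline_mentions(pid, container_id, proc_root="/proc"):
    """Return True if the command line of `pid` contains `container_id`."""
    try:
        with open(os.path.join(proc_root, str(pid), "cmdline"), "rb") as f:
            raw = f.read()
    except OSError:
        # The process is gone or unreadable: treat it as "not our container".
        return False
    # /proc/<pid>/cmdline separates arguments with NUL bytes.
    args = raw.decode("utf-8", errors="replace").split("\0")
    return any(container_id in arg for arg in args)


def kill_if_container(pid, container_id, sig=signal.SIGKILL,
                      proc_root="/proc", kill=os.kill):
    """Send `sig` to `pid` only if its command line mentions `container_id`.

    Returns True if the signal was sent, False if the check failed.
    `proc_root` and `kill` are injectable (hypothetical parameters) so the
    check can be exercised without touching real processes.
    """
    if not cmdline_mentions(pid, container_id, proc_root):
        return False
    kill(pid, sig)
    return True
```

This only narrows the window: the PID could still be reused between the 
check and the kill, so it is a mitigation rather than a complete fix.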

> DelayedProcessKiller may kill other process other than container
> ----------------------------------------------------------------
>
>                 Key: YARN-3678
>                 URL: https://issues.apache.org/jira/browse/YARN-3678
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.6.0
>            Reporter: gu-chi
>            Priority: Critical
>
> Suppose a container has finished and its cleanup runs. The PID file still 
> exists, so signalContainer is triggered once and kills the process whose 
> PID is in the PID file. Because the container has already finished, that PID 
> may by then belong to another process, which can cause serious issues.
> As far as I know, my NM was killed unexpectedly, and what I described could 
> be the cause, even though it occurs rarely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
