[ 
https://issues.apache.org/jira/browse/YARN-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15091628#comment-15091628
 ] 

Varun Vasudev edited comment on YARN-4459 at 1/11/16 9:04 AM:
--------------------------------------------------------------

[~hex108] - suppose the process that received the recycled pid calls setsid, 
which makes it the leader of a new process group whose id equals its pid. The 
process group check would then succeed and we might still end up killing the 
wrong process, no? 

The patch assumes that 
1) the new process does not belong to the same user, and 
2) the new process has not called setsid.

Correct?
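
To make the setsid concern concrete, here is a minimal Python sketch (not the 
container-executor code, which is C). It starts a child that calls setsid and 
shows that a signal-0 probe of the process group named after its pid succeeds. 
The function names are hypothetical, and the child only stands in for "the 
process that got the recycled pid"; pid recycling itself is not reproduced.

```python
import os
import subprocess
import sys


def spawn_session_leader() -> subprocess.Popen:
    """Start a sleeping child that calls setsid() before exec."""
    return subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"],
        start_new_session=True,  # the child calls setsid()
    )


def leads_own_group(pid: int) -> bool:
    """Return True if `pid` is the leader of its own process group."""
    return os.getpgid(pid) == pid


def group_check_passes(pid: int) -> bool:
    """Probe process group `pid` with signal 0 (an existence check only)."""
    try:
        os.killpg(pid, 0)
        return True
    except ProcessLookupError:
        return False
```

For such a child both leads_own_group(pid) and group_check_passes(pid) hold, 
so a check based only on "does process group <pid> exist" cannot tell it apart 
from the original container process.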

I suspect we might need to add a timing check similar to the one proposed in 
https://issues.apache.org/jira/browse/YARN-3678?focusedCommentId=14560578&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14560578
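
One possible shape for such a timing check, as a hedged Python sketch: record 
the process start time when the container is launched and refuse to signal if 
the pid's current start time differs. This is an assumption about how a check 
could work on Linux, not necessarily what YARN-3678 proposes. It relies on 
field 22 (starttime) of /proc/<pid>/stat; the function names are hypothetical.

```python
def parse_starttime_ticks(stat_line: str) -> int:
    """Extract field 22 (starttime, clock ticks since boot) from the
    contents of /proc/<pid>/stat."""
    # Field 2 (comm) is parenthesised and may itself contain spaces or ')',
    # so split on the last ')' and count fields from field 3 onwards.
    rest = stat_line.rsplit(")", 1)[1].split()
    return int(rest[19])  # field 22 is index 19 when field 3 is index 0


def read_starttime_ticks(pid: int) -> int:
    """Read the start time of a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_starttime_ticks(f.read())


def is_original_process(pid: int, recorded_ticks: int) -> bool:
    """True only if `pid` still has the start time recorded at launch."""
    try:
        return read_starttime_ticks(pid) == recorded_ticks
    except (FileNotFoundError, ProcessLookupError):
        return False  # the process is gone, so there is nothing to signal
```

A recycled pid would get a later start time, so the comparison fails even if 
the new process runs as the same user or has called setsid. There is still a 
small window between the check and the kill.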


> container-executor might kill process wrongly
> ---------------------------------------------
>
>                 Key: YARN-4459
>                 URL: https://issues.apache.org/jira/browse/YARN-4459
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Jun Gong
>            Assignee: Jun Gong
>         Attachments: YARN-4459.01.patch, YARN-4459.02.patch
>
>
> When 'signal_container_as_user' is called in container-executor, it first 
> checks whether the process group exists. If it does not, it kills the process 
> itself (if that process exists). This is unreasonable: a missing process 
> group means the corresponding container has finished, so killing the process 
> by pid just kills the wrong process.
> We have seen this happen many times in our cluster. We used the same account 
> to start the NM and to submit the app, and container-executor sometimes killed 
> the NM (the wrongly killed process might have been a newly started thread that 
> was the NM's child process).
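
The behaviour described in the report can be sketched as follows. This is a 
simplified Python illustration, not the actual container-executor C code; both 
function names are hypothetical. The first function falls back to the bare pid 
when the group is missing, the second treats a missing group as "container 
already finished" and signals nothing.

```python
import os
import signal


def signal_container_with_fallback(pid: int, sig: int) -> str:
    """Reported behaviour: signal the group, else fall back to the pid."""
    try:
        os.killpg(pid, sig)
        return "group"
    except ProcessLookupError:
        pass  # no such process group
    try:
        os.kill(pid, sig)  # risky: the pid may now belong to something else
        return "process"
    except ProcessLookupError:
        return "none"


def signal_container_group_only(pid: int, sig: int) -> str:
    """Alternative: a missing group means the container is already gone."""
    try:
        os.killpg(pid, sig)
        return "group"
    except ProcessLookupError:
        return "none"
```

For example, signal_container_group_only(pid, signal.SIGKILL) never touches a 
process that merely happens to hold the container's old pid, unless that 
process leads a group with the same id (the setsid case raised in the comment).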



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
