[ https://issues.apache.org/jira/browse/YARN-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

gu-chi resolved YARN-4536.
--------------------------
    Resolution: Not A Problem

Further analysis showed that this was introduced by a custom modification; sorry
for the bother.

> DelayedProcessKiller may not work under heavy workload
> ------------------------------------------------------
>
>                 Key: YARN-4536
>                 URL: https://issues.apache.org/jira/browse/YARN-4536
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.7.1
>            Reporter: gu-chi
>
> I am seeing orphaned container processes. Here is the scenario:
> Under heavy task load, CPU usage on the NM machine can reach almost 100%. When
> a container receives a kill event, it gets {{SIGTERM}}; the parent process then
> exits, leaving the container process to the OS. The container process needs to
> handle some shutdown events or other logic but can hardly get any CPU. We would
> expect a {{SIGKILL}} to follow, since {{DelayedProcessKiller}} exists, but the
> parent process whose pid was persisted as the container pid no longer exists,
> so the kill command cannot reach the container process. This is how orphaned
> container processes come about.
> The orphaned process does exit eventually, but the delay can be very long
> (several hours, in my observation), and it degrades the state of the OS in the
> meantime.
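
The failure mode described above can be reproduced with a short Python sketch (POSIX only). This is not YARN or DelayedProcessKiller code, and every name in it is hypothetical: LAUNCHER stands in for the parent process whose pid is recorded as the container pid, CHILD stands in for a container process that ignores SIGTERM (mimicking shutdown logic that cannot get CPU), and delayed_kill is an assumed SIGTERM, wait, SIGKILL sequence. With use_group=False the delayed SIGKILL targets only the recorded pid, which has already exited, so the orphan keeps running; with use_group=True both signals go to the launcher's process group, which the orphan still belongs to after its parent dies.

```python
import os
import signal
import subprocess
import sys
import time

# Hypothetical "container process": ignores SIGTERM (standing in for shutdown
# logic that cannot get CPU), reports its pid, then keeps running.
CHILD = (
    "import os, signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print(os.getpid(), flush=True)\n"
    "time.sleep(60)\n"
)

# Hypothetical launcher whose pid plays the role of the recorded container pid.
# It dies on SIGTERM (default action), leaving CHILD orphaned.
LAUNCHER = (
    "import subprocess, sys, time\n"
    f"subprocess.Popen([sys.executable, '-c', {CHILD!r}])\n"
    "time.sleep(60)\n"
)


def is_alive(pid):
    """True if pid exists and is not a zombie (zombie check uses Linux /proc)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    try:
        with open(f"/proc/{pid}/stat") as f:
            # Field after "(comm)" is the process state; "Z" means zombie.
            return f.read().rsplit(")", 1)[1].split()[0] != "Z"
    except OSError:
        return True  # no /proc available: assume the process is live


def gone_within(pid, timeout=5.0):
    """Poll until pid is dead (or a zombie) or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not is_alive(pid):
            return True
        time.sleep(0.05)
    return False


def delayed_kill(use_group, delay=0.5):
    """Send SIGTERM, wait `delay` seconds, send SIGKILL; return the orphan's pid."""
    launcher = subprocess.Popen(
        [sys.executable, "-c", LAUNCHER],
        stdout=subprocess.PIPE,
        text=True,
        start_new_session=True,  # launcher leads its own process group
    )
    child_pid = int(launcher.stdout.readline())  # CHILD now ignores SIGTERM
    launcher.stdout.close()
    pid = launcher.pid

    def send(sig):
        if use_group:
            os.killpg(pid, sig)  # every process still in the launcher's group
        else:
            os.kill(pid, sig)  # only the recorded "container pid"

    send(signal.SIGTERM)
    launcher.wait()  # the launcher exits at once and is reaped
    time.sleep(delay)
    try:
        send(signal.SIGKILL)  # pid reuse within `delay` is ignored in this sketch
    except ProcessLookupError:
        pass  # recorded pid no longer exists: the SIGKILL reaches nobody
    return child_pid
```

Calling delayed_kill(use_group=False) and then is_alive on the returned pid should report the orphan as still running (it has to be killed by hand afterwards), while delayed_kill(use_group=True) followed by gone_within should see it disappear. Signaling the group is only safe because the launcher is started in its own session here (start_new_session=True); otherwise the group could include the caller itself.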



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
