[ https://issues.apache.org/jira/browse/YARN-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
gu-chi resolved YARN-4536.
--------------------------
    Resolution: Not A Problem

After further analysis, this was introduced by a custom modification. Sorry for the bother.

> DelayedProcessKiller may not work under heavy workload
> ------------------------------------------------------
>
>                 Key: YARN-4536
>                 URL: https://issues.apache.org/jira/browse/YARN-4536
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.7.1
>            Reporter: gu-chi
>
> I am currently seeing orphaned container processes. Here is the scenario:
> under a heavy task load, CPU usage on the NM machine can reach almost 100%.
> When a container receives a kill event, it gets {{SIGTERM}} and the parent
> process exits, leaving the container process to the OS. The container process
> needs to handle some shutdown events or other logic, but it can hardly get any
> CPU. We would expect it to receive a {{SIGKILL}}, since there is a
> {{DelayedProcessKiller}}, but the parent process, whose pid was persisted as the
> container pid, no longer exists, so the kill command cannot reach the container
> process. This is how orphaned container processes arise.
> The orphaned process does exit eventually, but the delay can be very long
> (several hours, in my observation), and in the meantime it makes the state of
> the OS worse.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)