[ https://issues.apache.org/jira/browse/YARN-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yuqi Wang updated YARN-8012:
----------------------------
    Description: 
An *unmanaged container / leaked container* is a container that is no longer managed by the NM. As a result, YARN cannot manage it either, so the container is leaked from YARN's point of view.

*There are many ways a YARN-managed container can become unmanaged, for example:*
* The NM service is disabled or removed on the node.
* The NM is unable to start up again on the node, for example because a configuration or resource it depends on is not ready.
* The NM's local leveldb store is corrupted or lost, for example due to bad disk sectors.
* The NM has bugs, such as wrongly marking a live container as complete.

Note: these cases may be caused, or made worse, when work-preserving NM restart is enabled; see YARN-1336.

*Bad impacts of unmanaged containers include:*
# YARN cannot manage resources on the node:
** YARN resources on the node are leaked.
** The container cannot be killed to release YARN resources on the node.
# Container and App killing is not eventually consistent for the App user:
** A buggy App can keep causing bad impacts outside YARN long after the App has been killed.

    was:
An *unmanaged container / leaked container* is a container that is no longer managed by the NM. As a result, YARN cannot manage it either, so the container is leaked from YARN's point of view.

*There are many ways a YARN-managed container can become unmanaged, for example:*
* The NM service is disabled or removed on the node.
* The NM is unable to start up again on the node, for example because a configuration or resource it depends on is not ready.
* The NM's local leveldb store is corrupted or lost, for example due to bad disk sectors.
* The NM has bugs, such as wrongly marking a live container as complete.
Things become worse if work-preserving NM restart is enabled; see YARN-1336.

*Bad impacts of unmanaged containers include:*
# YARN cannot manage resources on the node:
** YARN resources on the node are leaked.
** The container cannot be killed to release YARN resources on the node.
# Container and App killing is not eventually consistent for the App user:
** A buggy App can keep causing bad impacts outside YARN long after the App has been killed.

> Support Unmanaged Container Cleanup
> -----------------------------------
>
>                 Key: YARN-8012
>                 URL: https://issues.apache.org/jira/browse/YARN-8012
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager
>    Affects Versions: 2.7.1
>            Reporter: Yuqi Wang
>            Assignee: Yuqi Wang
>            Priority: Major
>             Fix For: 2.7.1
>
>         Attachments: YARN-8012-branch-2.7.1.001.patch
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)