[ 
https://issues.apache.org/jira/browse/YARN-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated YARN-6846:
-----------------------------
    Attachment: YARN-6846.001.patch

Attaching a patch that makes the container-executor tolerate paths that have 
already been deleted while it is deleting a directory hierarchy.  It also makes 
the deletion code best-effort: it keeps trying to delete the remaining entries 
even if unlinking one of them fails.
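
For readers who want the idea in miniature, here is a rough Python sketch of a 
delete that is both tolerant of missing paths and best-effort.  It is not the 
patch itself (the container-executor is native code); the function name 
delete_best_effort and its return convention are made up for illustration.

```python
import os
import stat
import tempfile


def delete_best_effort(path):
    """Recursively delete path; return a list of (path, OSError) failures.

    Hypothetical helper, not the container-executor code. A missing entry
    counts as success, and a failure on one entry does not stop the attempt
    to delete its siblings.
    """
    errors = []
    try:
        mode = os.lstat(path).st_mode
    except FileNotFoundError:
        return errors  # already deleted by someone else: nothing to do
    except OSError as exc:
        return [(path, exc)]

    if stat.S_ISDIR(mode):
        try:
            names = os.listdir(path)
        except FileNotFoundError:
            return errors  # directory vanished between lstat and listdir
        except OSError as exc:
            return [(path, exc)]
        for name in names:
            # Keep going even if a child fails; just collect the error.
            errors.extend(delete_best_effort(os.path.join(path, name)))
        remove = os.rmdir
    else:
        remove = os.unlink

    try:
        remove(path)
    except FileNotFoundError:
        pass  # lost the race to another deleter, which is fine
    except OSError as exc:
        errors.append((path, exc))
    return errors
```

Calling delete_best_effort on a path that no longer exists returns an empty 
list, so two deleters working on overlapping trees do not fail each other.  Any 
real errors are collected and returned after the rest of the tree has been 
attempted.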


> Nodemanager can fail to fully delete application local directories when 
> applications are killed
> -----------------------------------------------------------------------------------------------
>
>                 Key: YARN-6846
>                 URL: https://issues.apache.org/jira/browse/YARN-6846
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.8.1
>            Reporter: Jason Lowe
>            Priority: Critical
>         Attachments: YARN-6846.001.patch
>
>
> When an application is killed, all of its running containers are killed and 
> the app waits for the containers to complete before cleaning up.  As each 
> container completes, its container directory is deleted via the 
> DeletionService.  After all containers have completed, the app completes and 
> the app directory is deleted.  If the app completes quickly enough, the 
> deletion of the container and app directories can race against each other.  
> If the container deletion executor deletes a file just before the application 
> deletion executor tries to delete that same file, the application deletion 
> executor can fail, leaving the remaining entries in the application directory 
> behind.
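
The race described in the issue can be imitated deterministically.  The Python 
sketch below is only a single-threaded simulation under assumed names 
(naive_delete_files, run_race and the before_unlink hook are hypothetical); it 
is not the NodeManager or container-executor code.  A hook plays the role of 
the container deleter and removes one entry just before the application 
deleter reaches it.

```python
import os
import tempfile


def naive_delete_files(directory, before_unlink=None):
    """Unlink every file listed in directory, aborting on the first error."""
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if before_unlink is not None:
            before_unlink(path)  # hook standing in for the competing deleter
        os.unlink(path)  # raises FileNotFoundError if the entry vanished


def run_race():
    """Return the names left behind after the simulated race."""
    app_dir = tempfile.mkdtemp()
    for name in ("a", "b", "c"):
        open(os.path.join(app_dir, name), "w").close()

    def competing_delete(path):
        # The simulated container deleter removes entry "b" just before
        # the simulated application deleter tries to unlink it.
        if os.path.basename(path) == "b":
            os.unlink(path)

    try:
        naive_delete_files(app_dir, competing_delete)
    except FileNotFoundError:
        pass  # the application-level delete gives up at this point

    leftovers = sorted(os.listdir(app_dir))
    # Clean up the temporary directory so the sketch leaves nothing behind.
    for name in leftovers:
        os.unlink(os.path.join(app_dir, name))
    os.rmdir(app_dir)
    return leftovers
```

Because the naive loop stops at the first missing entry, every entry after it 
in the listing is never attempted and stays on disk.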



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

