[ https://issues.apache.org/jira/browse/YARN-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102189#comment-16102189 ]

Eric Badger commented on YARN-6846:
-----------------------------------

Patch looks good overall to me. I have only one potential concern: the unit 
test doesn't actually force the race; it just relies on enough iterations to 
eventually hit the race condition. However, in practice the test has always 
failed for me without the fix and passed with it, so I think it's OK. Other 
than chameleon coding to match some odd existing style (e.g. {{ret == -ENOENT}} 
vs {{ret == ENOENT}}), everything else looks good. I'll give this my +1 
(non-binding).
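The concern above is that the test only hits the race probabilistically. One common way to force such a race deterministically is a test-only hook that runs just before each delete, so the test itself plays the competing deleter. The Python sketch below is illustrative only: `naive_delete_tree`, `before_unlink`, and `race_is_reproduced` are hypothetical names, not part of the patch or of the NodeManager's DeletionService.

```python
import os
import shutil
import tempfile
from typing import Callable, Optional


def naive_delete_tree(path: str,
                      before_unlink: Optional[Callable[[str], None]] = None) -> None:
    """Hypothetical recursive delete that does NOT tolerate missing entries.

    `before_unlink` is a hypothetical test-only hook called right before each
    file is unlinked; a test can use it to let a competing deleter win.
    """
    with os.scandir(path) as entries:
        children = list(entries)
    for entry in children:
        if entry.is_dir(follow_symlinks=False):
            naive_delete_tree(entry.path, before_unlink)
        else:
            if before_unlink is not None:
                before_unlink(entry.path)
            os.unlink(entry.path)  # raises FileNotFoundError if already gone
    os.rmdir(path)


def race_is_reproduced() -> bool:
    """Force the race on the first attempt instead of relying on iterations."""
    app_dir = tempfile.mkdtemp()
    container_dir = os.path.join(app_dir, "container_01")
    os.mkdir(container_dir)
    open(os.path.join(container_dir, "stdout"), "w").close()

    def competing_delete(p: str) -> None:
        # Simulate the container deletion task removing the file first.
        os.unlink(p)

    try:
        naive_delete_tree(app_dir, before_unlink=competing_delete)
        raised = False
    except FileNotFoundError:
        raised = True
    # The aborted deletion leaves the application directory behind.
    leftover = os.path.isdir(app_dir)
    shutil.rmtree(app_dir, ignore_errors=True)
    return raised and leftover
```

Because the hook fires on every unlink, a test asserting that the application directory is fully removed would fail against this deleter on its very first run, rather than only after many iterations.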

> Nodemanager can fail to fully delete application local directories when 
> applications are killed
> -----------------------------------------------------------------------------------------------
>
>                 Key: YARN-6846
>                 URL: https://issues.apache.org/jira/browse/YARN-6846
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.8.1
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>         Attachments: YARN-6846.001.patch, YARN-6846.002.patch, 
> YARN-6846.003.patch
>
>
> When an application is killed, all of the running containers are killed and 
> the app waits for the containers to complete before cleaning up. As each 
> container completes, its container directory is deleted via the 
> DeletionService. After all containers have completed, the app completes and 
> the app directory is deleted. If the app completes quickly enough, the 
> deletions of the container and app directories can race against each other. 
> If the container deletion executor deletes a file just before the application 
> deletion executor tries to, the application deletion executor can fail, 
> leaving the remaining entries in the application directory behind.
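The patch itself is not reproduced here. One common way to make a recursive delete safe against this kind of race, and consistent with the ENOENT checks mentioned in the review, is to treat ENOENT ("no such file or directory") at every step as success, since the entry being gone is exactly the desired end state. In Python that error surfaces as an `OSError` with `errno.ENOENT` (i.e. `FileNotFoundError`). The sketch below assumes a plain local filesystem; `tolerant_delete_tree` and `_ignore_enoent` are hypothetical names, not the NodeManager's actual deletion code.

```python
import errno
import os
from typing import Callable


def _ignore_enoent(op: Callable[[str], None], path: str) -> None:
    """Run `op(path)`, treating ENOENT (already deleted) as success."""
    try:
        op(path)
    except OSError as e:
        if e.errno != errno.ENOENT:
            raise


def tolerant_delete_tree(path: str) -> None:
    """Hypothetical recursive delete that survives a concurrent deleter.

    Whenever an entry has already vanished, the other deleter got there
    first; that is the desired end state, so skip it and keep going instead
    of aborting and leaving the rest of the directory behind.
    """
    try:
        with os.scandir(path) as entries:
            children = list(entries)
    except FileNotFoundError:
        return  # the whole directory is already gone
    for entry in children:
        # is_dir() returns False if the entry no longer exists.
        if entry.is_dir(follow_symlinks=False):
            tolerant_delete_tree(entry.path)
        else:
            _ignore_enoent(os.unlink, entry.path)
    _ignore_enoent(os.rmdir, path)
```

Only ENOENT is swallowed; other failures such as a permission error still propagate, so a genuine problem is not silently hidden behind the race handling.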



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
