[jira] [Commented] (YARN-4924) NM recovery race can lead to container not cleaned up

Jason Lowe (JIRA) Thu, 14 Apr 2016 09:12:00 -0700

    [ https://issues.apache.org/jira/browse/YARN-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241431#comment-15241431 ]


Jason Lowe commented on YARN-4924:
----------------------------------

Thanks for updating the patch!

If createWriteBatch ever throws the runtime DBException, we want it translated 
to an IOException so the exception does not bubble up and become fatal to the 
NM.  Therefore the createWriteBatch call needs to be inside the inner try block 
that translates DBException to IOException.  The sample code I wrote above 
should cover these cases.
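For readers without the earlier sample code at hand, here is a minimal sketch of 
the pattern described above. It is not the sample code from the earlier comment. 
The types DBException, WriteBatch, Db and StateStoreSketch, and the method 
storeContainer, are hypothetical stand-ins modeled loosely on the leveldb Java 
API used by the NM state store, so the sketch compiles on its own. The point it 
shows is that createWriteBatch() is called inside the try whose catch translates 
DBException to IOException.

```java
import java.io.IOException;

// Hypothetical stand-in for leveldb's unchecked (runtime) DBException.
class DBException extends RuntimeException {
    DBException(String message) {
        super(message);
    }
}

// Hypothetical stand-in for a leveldb write batch; close() declares no
// checked exception so it can be used directly in try-with-resources.
interface WriteBatch extends AutoCloseable {
    void put(byte[] key, byte[] value);

    @Override
    void close();
}

// Hypothetical stand-in for the leveldb handle; both calls may throw
// the unchecked DBException.
interface Db {
    WriteBatch createWriteBatch();

    void write(WriteBatch batch);
}

// Hypothetical state-store class illustrating the translation pattern.
class StateStoreSketch {
    private final Db db;

    StateStoreSketch(Db db) {
        this.db = db;
    }

    void storeContainer(byte[] key, byte[] value) throws IOException {
        try {
            // createWriteBatch() runs inside the translating try, so a
            // DBException it throws becomes an IOException instead of
            // escaping as an unchecked exception.
            try (WriteBatch batch = db.createWriteBatch()) {
                batch.put(key, value);
                db.write(batch);
            } // batch.close() runs here; a DBException from it is also caught below
        } catch (DBException e) {
            throw new IOException("NM state store write failed", e);
        }
    }
}
```

With this shape, a caller only has to handle the checked IOException, and no 
DBException from creating, writing, or closing the batch can propagate 
unchecked and become fatal to the NM.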


> NM recovery race can lead to container not cleaned up
> -----------------------------------------------------
>
>                 Key: YARN-4924
>                 URL: https://issues.apache.org/jira/browse/YARN-4924
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.0.0, 2.7.2
>            Reporter: Nathan Roberts
>            Assignee: sandflee
>         Attachments: YARN-4924.01.patch, YARN-4924.02.patch, 
> YARN-4924.03.patch, YARN-4924.04.patch
>
>
> It's probably a small window, but we observed a case where the NM crashed 
> and then a container was not properly cleaned up during recovery.
> I will add details in the first comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
