[jira] [Commented] (YARN-112) Race in localization can cause containers to fail

Vinod Kumar Vavilapalli (JIRA) Tue, 26 Mar 2013 15:59:17 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614650#comment-13614650
 ]


Vinod Kumar Vavilapalli commented on YARN-112:
----------------------------------------------

Bobby, I too have seen this in large clusters/jobs - the law of large numbers :) We 
don't seed the random number generator.

HADOOP-9438 will help, but instead of this solution I think a better approach 
is to avoid the race altogether by generating a destination path that is 
unique by construction. Something like localizer_id + random_num is a better 
destination path than a plain random number.
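
A minimal sketch of the localizer_id + random_num idea, not the actual 
NodeManager code. The class name LocalizerPaths, the method 
uniqueDestination, and the "_" separator are hypothetical names chosen for 
illustration. The assumption is that each localizer has its own id, so two 
localizers can never produce the same path even if they draw the same random 
number.

```java
import java.security.SecureRandom;

// Hypothetical helper, for illustration only; not part of Hadoop.
final class LocalizerPaths {
    // SecureRandom seeds itself, so no explicit seeding is needed here.
    private static final SecureRandom RANDOM = new SecureRandom();

    private LocalizerPaths() {
    }

    // Builds baseDir/localizerId_randomNum. The localizer id keeps paths from
    // different localizers apart; the random suffix only has to separate
    // downloads made by the same localizer.
    static String uniqueDestination(String baseDir, String localizerId) {
        // Clear the sign bit so the suffix is always non-negative.
        long randomNum = RANDOM.nextLong() & Long.MAX_VALUE;
        return baseDir + "/" + localizerId + "_" + randomNum;
    }
}
```

With this scheme, two containers localizing job.jar at the same time would 
write to different temporary directories as long as their localizer ids 
differ.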
                
> Race in localization can cause containers to fail
> -------------------------------------------------
>
>                 Key: YARN-112
>                 URL: https://issues.apache.org/jira/browse/YARN-112
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 0.23.3
>            Reporter: Jason Lowe
>            Assignee: omkar vinit joshi
>         Attachments: yarn-112-20130325.1.patch, yarn-112-20130325.patch, 
> yarn-112.20131503.patch
>
>
> On one of our 0.23 clusters, I saw a case of two containers, corresponding to 
> two map tasks of a MR job, that were launched almost simultaneously on the 
> same node.  It appears they both tried to localize job.jar and job.xml at the 
> same time.  One of the containers failed when it couldn't rename the 
> temporary job.jar directory to its final name because the target directory 
> wasn't empty.  Shortly afterwards the second container failed because job.xml 
> could not be found, presumably because the first container removed it when it 
> cleaned up.

