[
https://issues.apache.org/jira/browse/YARN-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616325#comment-13616325
]
Robert Joseph Evans commented on YARN-112:
------------------------------------------
I agree that scale exposes races, but the underlying problem is still that we
want to create a new unique directory. This seems very simple.
{code}
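// mkdir() returns false if the directory already exists, so retry with a
// new random name until this caller is the one that creates it.
File uniqueDir = null;
do {
  uniqueDir = new File(baseDir, String.valueOf(rand.nextLong()));
} while (!uniqueDir.mkdir());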
{code}
I don't see why we are going through all of this complexity simply because a
FileContext API is broken. Playing games to make the race less likely is fine,
but ultimately we still have to handle the race.
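For illustration only, here is a self-contained sketch of the loop above with a bounded number of retries. The class name UniqueDirs, the method createUniqueDir, and the maxAttempts parameter are hypothetical and not part of any patch on this issue. The bound is there because the unbounded loop would spin forever if baseDir were missing or not writable, since mkdir() returns false in those cases too.

```java
import java.io.File;
import java.io.IOException;
import java.util.Random;

public class UniqueDirs {
    /**
     * Creates a new, uniquely named directory under baseDir.
     * Hypothetical helper: relies only on File.mkdir() returning true
     * for the single caller that actually created the directory.
     */
    public static File createUniqueDir(File baseDir, Random rand, int maxAttempts)
            throws IOException {
        // Fail fast instead of retrying when the parent is missing.
        if (!baseDir.isDirectory()) {
            throw new IOException("Base directory does not exist: " + baseDir);
        }
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            File candidate = new File(baseDir, String.valueOf(rand.nextLong()));
            // A false return means a name collision (or an I/O problem);
            // either way, try again with a different random name.
            if (candidate.mkdir()) {
                return candidate;
            }
        }
        throw new IOException("Could not create a unique directory under "
                + baseDir + " after " + maxAttempts + " attempts");
    }
}
```

Two containers that race on the same base directory would each call createUniqueDir and each get back a different directory. The loser of any name collision simply retries, so the race is handled rather than just made less likely.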
> Race in localization can cause containers to fail
> -------------------------------------------------
>
> Key: YARN-112
> URL: https://issues.apache.org/jira/browse/YARN-112
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 0.23.3
> Reporter: Jason Lowe
> Assignee: Omkar Vinit Joshi
> Attachments: yarn-112-20130325.1.patch, yarn-112-20130325.patch,
> yarn-112-20130326.patch, yarn-112.20131503.patch
>
>
> On one of our 0.23 clusters, I saw a case of two containers, corresponding to
> two map tasks of a MR job, that were launched almost simultaneously on the
> same node. It appears they both tried to localize job.jar and job.xml at the
> same time. One of the containers failed when it couldn't rename the
> temporary job.jar directory to its final name because the target directory
> wasn't empty. Shortly afterwards the second container failed because job.xml
> could not be found, presumably because the first container removed it when it
> cleaned up.