[ https://issues.apache.org/jira/browse/YARN-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174496#comment-14174496 ]
Xuan Gong commented on YARN-2701:
---------------------------------

To solve this problem, we need to change the native code. When the mkdir call fails, we need to check the error type. If the error indicates that the file already exists (EEXIST in the native code), we should check whether the permissions of the existing file match the desired permissions. If both conditions hold, we should not fail the localization process.

> Potential race condition in startLocalizer when using LinuxContainerExecutor
> ------------------------------------------------------------------------------
>
>                 Key: YARN-2701
>                 URL: https://issues.apache.org/jira/browse/YARN-2701
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Xuan Gong
>            Assignee: Xuan Gong
>
> When LinuxContainerExecutor runs startLocalizer, it uses the native code in
> container-executor.c.
> {code}
>      if (stat(npath, &sb) != 0) {
>        if (mkdir(npath, perm) != 0) {
> {code}
> This is a check-then-create approach to creating the appDir under /usercache.
> If two containers try to do this at the same time, a race condition may
> occur: both can see that the directory is missing, and the second mkdir
> then fails.
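
A minimal sketch of the approach proposed in the comment, assuming a standalone helper. The function name mkdir_or_accept_existing is hypothetical and this is not the actual container-executor.c code or the committed patch; only npath and perm are taken from the snippet in the issue description. It drops the stat-before-mkdir check, calls mkdir directly, and treats EEXIST as success only when the existing path is a directory with the desired permissions.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper (illustration only, not from container-executor.c).
 * Returns 0 if npath is a directory with mode perm after the call,
 * -1 otherwise.
 */
int mkdir_or_accept_existing(const char *npath, mode_t perm) {
  struct stat sb;

  if (mkdir(npath, perm) == 0) {
    /* mkdir applies the process umask, so set the exact mode explicitly. */
    return chmod(npath, perm) == 0 ? 0 : -1;
  }

  /* Any failure other than "already exists" is a real error. */
  if (errno != EEXIST) {
    return -1;
  }

  /* Another container may have created the path first; inspect it. */
  if (stat(npath, &sb) != 0) {
    return -1;
  }

  /* Accept it only if it is a directory with the desired permission bits. */
  if (!S_ISDIR(sb.st_mode)) {
    return -1;
  }
  if ((sb.st_mode & 07777) != (perm & 07777)) {
    return -1;
  }
  return 0;
}
```

Calling mkdir first and inspecting errno avoids the gap between stat and mkdir in which a second container can create the directory. One caveat of this sketch: a caller that loses the race could still stat the directory between the winner's mkdir and chmod calls and see a umask-reduced mode, so a real fix would have to decide how to handle that case.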