[ 
https://issues.apache.org/jira/browse/YARN-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139927#comment-14139927
 ] 

Hadoop QA commented on YARN-2566:
---------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12669893/YARN-2566.000.patch
  against trunk revision 6434572.

    {color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

    {color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test file.

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

    {color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 2.0.3) warning.

    {color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager.

    {color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/5037//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/5037//artifact/PreCommit-HADOOP-Build-patchprocess/newPatchFindbugsWarningshadoop-yarn-server-nodemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5037//console

This message is automatically generated.

> IOException happen in startLocalizer of DefaultContainerExecutor due to not 
> enough disk space for the first localDir.
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-2566
>                 URL: https://issues.apache.org/jira/browse/YARN-2566
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.5.0
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: YARN-2566.000.patch
>
>
> startLocalizer in DefaultContainerExecutor uses only the first localDir 
> to copy the token file. If that copy fails because the first localDir has 
> run out of disk space, localization fails even when the other localDirs 
> have plenty of free space. We see the following error 
> for this case:
> {code}
> 2014-09-13 23:33:25,171 WARN 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Unable to 
> create app directory 
> /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004
> java.io.IOException: mkdir of 
> /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 failed
>       at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1062)
>       at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:157)
>       at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
>       at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:721)
>       at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:717)
>       at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>       at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:717)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:426)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createAppDirs(DefaultContainerExecutor.java:522)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:94)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987)
> 2014-09-13 23:33:25,185 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Localizer failed
> java.io.FileNotFoundException: File 
> file:/hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004 
> does not exist
>       at 
> org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:511)
>       at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:724)
>       at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:501)
>       at 
> org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111)
>       at 
> org.apache.hadoop.fs.DelegateToFileSystem.createInternal(DelegateToFileSystem.java:76)
>       at 
> org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.<init>(ChecksumFs.java:344)
>       at org.apache.hadoop.fs.ChecksumFs.createInternal(ChecksumFs.java:390)
>       at 
> org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:577)
>       at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:677)
>       at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:673)
>       at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>       at org.apache.hadoop.fs.FileContext.create(FileContext.java:673)
>       at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2021)
>       at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:1963)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:102)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:987)
> 2014-09-13 23:33:25,186 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1410663092546_0004_01_000001 transitioned from 
> LOCALIZING to LOCALIZATION_FAILED
> 2014-09-13 23:33:25,187 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=cloudera   
> OPERATION=Container Finished - Failed   TARGET=ContainerImpl    
> RESULT=FAILURE  DESCRIPTION=Container failed with state: LOCALIZATION_FAILED  
>   APPID=application_1410663092546_0004    
> CONTAINERID=container_1410663092546_0004_01_000001
> 2014-09-13 23:33:25,187 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1410663092546_0004_01_000001 transitioned from 
> LOCALIZATION_FAILED to DONE
> 2014-09-13 23:33:25,187 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Removing container_1410663092546_0004_01_000001 from application 
> application_1410663092546_0004
> 2014-09-13 23:33:25,187 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
>  Considering container container_1410663092546_0004_01_000001 for 
> log-aggregation
> 2014-09-13 23:33:25,187 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got 
> event CONTAINER_STOP for appId application_1410663092546_0004
> 2014-09-13 23:33:25,187 INFO 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting 
> absolute path : 
> /hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004/container_1410663092546_0004_01_000001
> 2014-09-13 23:33:25,187 WARN 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: delete 
> returned false for path: 
> [/hadoop/d1/usercache/cloudera/appcache/application_1410663092546_0004/container_1410663092546_0004_01_000001]
> 2014-09-13 23:33:25,188 INFO 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting 
> absolute path : 
> /hadoop/d2/usercache/cloudera/appcache/application_1410663092546_0004/container_1410663092546_0004_01_000001
> 2014-09-13 23:33:25,188 WARN 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: delete 
> returned false for path: 
> [/hadoop/d2/usercache/cloudera/appcache/application_1410663092546_0004/container_1410663092546_0004_01_000001]
> 2014-09-13 23:33:25,291 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  Stopping resource-monitoring for container_1410663092546_0004_01_000001
> 2014-09-13 23:33:26,159 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed 
> completed container container_1410663092546_0004_01_000001
> {code}
> The correct behavior is: if an IOException occurs during the copy, try the 
> next localDir; if the copy fails for all localDirs, then throw an 
> exception. 
> I will create a patch to fix this issue.
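
The fallback proposed in the description can be sketched as follows. This is a minimal illustration under stated assumptions, not the attached YARN-2566 patch: it uses plain java.nio.file calls instead of Hadoop's FileContext, and the class LocalDirFallback, the method copyToFirstUsableDir, and its parameters are hypothetical names chosen for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Hypothetical helper, not part of Hadoop: tries each local directory in
// order and stops at the first one that accepts the file.
final class LocalDirFallback {

    /**
     * Copies src into the first directory in localDirs where both the
     * directory creation and the copy succeed.
     *
     * @return the path of the copied file
     * @throws IOException if the copy fails for every directory
     */
    static Path copyToFirstUsableDir(Path src, List<Path> localDirs, String fileName)
            throws IOException {
        IOException lastFailure = null;
        for (Path dir : localDirs) {
            try {
                Files.createDirectories(dir);
                Path dst = dir.resolve(fileName);
                Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
                return dst;
            } catch (IOException e) {
                // For example, the disk is full: remember the failure
                // and move on to the next directory.
                lastFailure = e;
            }
        }
        // Every directory failed (or the list was empty).
        throw new IOException("copy failed for all localDirs: " + localDirs, lastFailure);
    }
}
```

Keeping the last IOException as the cause preserves the underlying reason (such as a failed mkdir) in the stack trace when no directory is usable.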



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
