[ 
https://issues.apache.org/jira/browse/YARN-367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567843#comment-13567843
 ] 

Chris Nauroth commented on YARN-367:
------------------------------------

Hi, Zhijie.  I was able to reproduce this bug.  When overriding hadoop.tmp.dir, 
typical usage is to specify it in core-site.xml instead of hdfs-site.xml, so 
that all of the Hadoop processes receive the new value.  After I moved my 
configuration of hadoop.tmp.dir to core-site.xml, I stopped seeing this bug.

When I repro'd, I noticed that localization is attempting to use the default 
local dir for usercache, but then container launch is trying to use the local 
dir I configured in hdfs-site.xml.  Therefore, container launch doesn't find 
the container working directory in the place it expects:

2013-01-31 09:02:28,623 INFO  nodemanager.DefaultContainerExecutor 
(DefaultContainerExecutor.java:startLocalizer(101)) - CWD set to 
/tmp/hadoop-chris/nm-local-dir/usercache/chris/appcache/application_1359651644148_0002
 = 
file:/tmp/hadoop-chris/nm-local-dir/usercache/chris/appcache/application_1359651644148_0002

2013-01-31 09:02:29,922 WARN  launcher.ContainerLaunch 
(ContainerLaunch.java:call(247)) - Failed to launch container.
java.io.FileNotFoundException: File 
/Users/chris/hadoop-deploy-trunk/hadoop-3.0.0-SNAPSHOT/data/nm-local-dir/usercache/chris/appcache/application_1359651644148_0002/container_1359651644148_0002_01_000001
 does not exist

My theory is that configuration is getting loaded in two different ways for 
localization and container launch.  The configuration for localization is not 
loading hdfs-site.xml, but the configuration for container launch is loading 
hdfs-site.xml, so the two pieces are seeing different values for 
hadoop.tmp.dir.  I'm not sure if YARN daemons should be loading hdfs-site.xml.  
Whatever the choice, it probably should be consistent throughout the code.
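
To make the theory above concrete, here is a minimal sketch of how two 
components can derive different local dirs from the same defaults.  It is a toy 
model, not Hadoop's Configuration class: the class and method names 
(ConfigMismatchSketch, view, resolve, localDirs) are hypothetical, and user.name 
is modeled as an ordinary key rather than a system property.  The two default 
values are assumed to be the ones shipped in core-default.xml and 
yarn-default.xml, which is consistent with the /tmp/hadoop-chris/nm-local-dir 
path in the log above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Toy model of layered configuration; NOT Hadoop's Configuration class. */
public class ConfigMismatchSketch {

    /** Builds one component's view: assumed defaults first, then site overrides. */
    static Map<String, String> view(String user, Map<String, String> siteOverrides) {
        Map<String, String> conf = new HashMap<>();
        conf.put("user.name", user); // stands in for the JVM system property
        conf.put("hadoop.tmp.dir", "/tmp/hadoop-${user.name}");
        conf.put("yarn.nodemanager.local-dirs", "${hadoop.tmp.dir}/nm-local-dir");
        conf.putAll(siteOverrides); // later resources win over defaults
        return conf;
    }

    /** Looks up a key and expands ${name} references against the same view. */
    static String resolve(Map<String, String> conf, String key) {
        String value = conf.get(key);
        if (value == null) {
            return null;
        }
        int start;
        while ((start = value.indexOf("${")) >= 0) {
            int end = value.indexOf('}', start);
            if (end < 0) {
                break; // unterminated reference: leave the text as-is
            }
            String replacement = resolve(conf, value.substring(start + 2, end));
            if (replacement == null) {
                break; // unknown variable: leave the text as-is
            }
            value = value.substring(0, start) + replacement + value.substring(end + 1);
        }
        return value;
    }

    /** Resolved yarn.nodemanager.local-dirs for a component with the given overrides. */
    static String localDirs(String user, Map<String, String> siteOverrides) {
        return resolve(view(user, siteOverrides), "yarn.nodemanager.local-dirs");
    }

    public static void main(String[] args) {
        // The override that was placed in hdfs-site.xml in the repro.
        Map<String, String> hdfsSite = Collections.singletonMap(
            "hadoop.tmp.dir",
            "/Users/chris/hadoop-deploy-trunk/hadoop-3.0.0-SNAPSHOT/data");

        // A component that never loaded hdfs-site.xml sees only the defaults.
        System.out.println("without hdfs-site.xml: "
            + localDirs("chris", Collections.emptyMap()));
        // A component that did load hdfs-site.xml sees the override.
        System.out.println("with hdfs-site.xml:    " + localDirs("chris", hdfsSite));
    }
}
```

The two printed paths correspond to the two roots in the log excerpts: the 
/tmp/hadoop-chris one used during localization and the hadoop-3.0.0-SNAPSHOT/data 
one used at container launch.  Putting the override in a file that every 
component loads makes both views resolve to the same directory.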

This would be good to track down eventually, but for now, I expect you can 
quickly fix your environment by moving your hadoop.tmp.dir configuration into 
core-site.xml.  I hope this helps!

                
> Exception when yarn.nodemanager.local-dirs is not explicitly set
> ----------------------------------------------------------------
>
>                 Key: YARN-367
>                 URL: https://issues.apache.org/jira/browse/YARN-367
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Zhijie Shen
>
> If yarn.nodemanager.local-dirs is not explicitly set, and if the default 
> local-dirs are not the children of hadoop.tmp.dir, the exception will occur 
> when the wordcount example is run. Below is the log info.
> ==========
> 2013-01-30 22:16:04,229 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Start request for container_1359612879014_0001_01_000001 by user zshen
> 2013-01-30 22:16:04,247 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>  Creating a new application reference for app application_1359612879014_0001
> 2013-01-30 22:16:04,250 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=zshen      
> IP=127.0.0.1    OPERATION=Start Container Request       
> TARGET=ContainerManageImpl      RESULT=SUCCESS  
> APPID=application_1359612879014_0001    
> CONTAINERID=container_1359612879014_0001_01_000001
> 2013-01-30 22:16:04,252 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Application application_1359612879014_0001 transitioned from NEW to INITING
> 2013-01-30 22:16:04,252 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Adding container_1359612879014_0001_01_000001 to application 
> application_1359612879014_0001
> 2013-01-30 22:16:04,257 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Application application_1359612879014_0001 transitioned from INITING to 
> RUNNING
> 2013-01-30 22:16:04,262 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1359612879014_0001_01_000001 transitioned from NEW to 
> LOCALIZING
> 2013-01-30 22:16:04,268 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/appTokens
>  transitioned from INIT to DOWNLOADING
> 2013-01-30 22:16:04,268 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/job.jar
>  transitioned from INIT to DOWNLOADING
> 2013-01-30 22:16:04,268 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/job.splitmetainfo
>  transitioned from INIT to DOWNLOADING
> 2013-01-30 22:16:04,268 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/job.split
>  transitioned from INIT to DOWNLOADING
> 2013-01-30 22:16:04,269 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/job.xml
>  transitioned from INIT to DOWNLOADING
> 2013-01-30 22:16:04,269 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Created localizer for container_1359612879014_0001_01_000001
> 2013-01-30 22:16:04,401 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  Writing credentials to the nmPrivate file 
> /tmp/hadoop-zshen/nm-local-dir/nmPrivate/container_1359612879014_0001_01_000001.tokens.
>  Credentials list: 
> 2013-01-30 22:16:04,423 INFO 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: 
> Initializing user zshen
> 2013-01-30 22:16:04,569 INFO 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying 
> from 
> /tmp/hadoop-zshen/nm-local-dir/nmPrivate/container_1359612879014_0001_01_000001.tokens
>  to 
> /tmp/hadoop-zshen/nm-local-dir/usercache/zshen/appcache/application_1359612879014_0001/container_1359612879014_0001_01_000001.tokens
> 2013-01-30 22:16:04,570 INFO 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: CWD set 
> to 
> /tmp/hadoop-zshen/nm-local-dir/usercache/zshen/appcache/application_1359612879014_0001
>  = 
> file:/tmp/hadoop-zshen/nm-local-dir/usercache/zshen/appcache/application_1359612879014_0001
> 2013-01-30 22:16:04,955 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out 
> status for container: container_id {, app_attempt_id {, application_id {, id: 
> 1, cluster_timestamp: 1359612879014, }, attemptId: 1, }, id: 1, }, state: 
> C_RUNNING, diagnostics: "", exit_status: -1000, 
> 2013-01-30 22:16:05,117 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/appTokens
>  transitioned from DOWNLOADING to LOCALIZED
> 2013-01-30 22:16:05,312 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/job.jar
>  transitioned from DOWNLOADING to LOCALIZED
> 2013-01-30 22:16:05,465 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/job.splitmetainfo
>  transitioned from DOWNLOADING to LOCALIZED
> 2013-01-30 22:16:05,608 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/job.split
>  transitioned from DOWNLOADING to LOCALIZED
> 2013-01-30 22:16:05,751 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/job.xml
>  transitioned from DOWNLOADING to LOCALIZED
> 2013-01-30 22:16:05,752 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1359612879014_0001_01_000001 transitioned from 
> LOCALIZING to LOCALIZED
> 2013-01-30 22:16:05,866 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1359612879014_0001_01_000001 transitioned from LOCALIZED 
> to RUNNING
> 2013-01-30 22:16:05,866 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  ResourceCalculatorPlugin is unavailable on this system. 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
>  is disabled.
> 2013-01-30 22:16:05,910 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Failed to launch container.
> java.io.FileNotFoundException: File 
> /Users/zshen/Deployment/hadoop-3.0.0-SNAPSHOT/data/nm-local-dir/usercache/zshen/appcache/application_1359612879014_0001/container_1359612879014_0001_01_000001
>  does not exist
>       at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:498)
>       at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:996)
>       at 
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:150)
>       at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:187)
>       at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:730)
>       at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:726)
>       at 
> org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2379)
>       at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:726)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:330)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:135)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:242)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:68)
>       at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>       at java.lang.Thread.run(Thread.java:680)
> 2013-01-30 22:16:05,913 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1359612879014_0001_01_000001 transitioned from RUNNING 
> to EXITED_WITH_FAILURE
> 2013-01-30 22:16:05,914 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
>  Cleaning up container container_1359612879014_0001_01_000001
> 2013-01-30 22:16:05,934 INFO 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting 
> absolute path : 
> /tmp/hadoop-zshen/nm-local-dir/usercache/zshen/appcache/application_1359612879014_0001/container_1359612879014_0001_01_000001
> 2013-01-30 22:16:05,934 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=zshen      
> OPERATION=Container Finished - Failed   TARGET=ContainerImpl    
> RESULT=FAILURE  DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE  
>   APPID=application_1359612879014_0001    
> CONTAINERID=container_1359612879014_0001_01_000001
> 2013-01-30 22:16:05,937 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1359612879014_0001_01_000001 transitioned from 
> EXITED_WITH_FAILURE to DONE
> 2013-01-30 22:16:05,937 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Removing container_1359612879014_0001_01_000001 from application 
> application_1359612879014_0001
> 2013-01-30 22:16:05,937 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
>  ResourceCalculatorPlugin is unavailable on this system. 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
>  is disabled.
> 2013-01-30 22:16:05,958 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out 
> status for container: container_id {, app_attempt_id {, application_id {, id: 
> 1, cluster_timestamp: 1359612879014, }, attemptId: 1, }, id: 1, }, state: 
> C_COMPLETE, diagnostics: "", exit_status: -1, 
> 2013-01-30 22:16:05,959 INFO 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed 
> completed container container_1359612879014_0001_01_000001
> 2013-01-30 22:16:06,965 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Application application_1359612879014_0001 transitioned from RUNNING to 
> APPLICATION_RESOURCES_CLEANINGUP
> 2013-01-30 22:16:06,965 INFO 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting 
> absolute path : 
> /tmp/hadoop-zshen/nm-local-dir/usercache/zshen/appcache/application_1359612879014_0001
> 2013-01-30 22:16:06,966 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got 
> event APPLICATION_STOP for appId application_1359612879014_0001
> 2013-01-30 22:16:06,970 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
>  Application application_1359612879014_0001 transitioned from 
> APPLICATION_RESOURCES_CLEANINGUP to FINISHED
> 2013-01-30 22:16:06,970 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler:
>  Scheduling Log Deletion for application: application_1359612879014_0001, 
> with delay of 10800 seconds
> ==========
> Below is the setting in hdfs-site.xml.
> ==========
> <property>
>     <name>hadoop.tmp.dir</name>
>     <value>/Users/zshen/Deployment/hadoop-3.0.0-SNAPSHOT/data</value>
> </property>
> ==========

