[ https://issues.apache.org/jira/browse/YARN-367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567843#comment-13567843 ]
Chris Nauroth commented on YARN-367:
------------------------------------

Hi, Zhijie.  I was able to reproduce this bug.

When overriding hadoop.tmp.dir, typical usage is to specify it in core-site.xml instead of hdfs-site.xml, so that all of the Hadoop processes receive the new value.  After I moved my configuration of hadoop.tmp.dir to core-site.xml, I stopped seeing this bug.

When I repro'd, I noticed that localization is attempting to use the default local dir for usercache, but then container launch is trying to use the local dir I configured in hdfs-site.xml.  Therefore, container launch doesn't find the container working directory in the place it expects:

2013-01-31 09:02:28,623 INFO nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:startLocalizer(101)) - CWD set to /tmp/hadoop-chris/nm-local-dir/usercache/chris/appcache/application_1359651644148_0002 = file:/tmp/hadoop-chris/nm-local-dir/usercache/chris/appcache/application_1359651644148_0002
2013-01-31 09:02:29,922 WARN launcher.ContainerLaunch (ContainerLaunch.java:call(247)) - Failed to launch container.
java.io.FileNotFoundException: File /Users/chris/hadoop-deploy-trunk/hadoop-3.0.0-SNAPSHOT/data/nm-local-dir/usercache/chris/appcache/application_1359651644148_0002/container_1359651644148_0002_01_000001 does not exist

My theory is that configuration is getting loaded in 2 different ways for localization and container launch.  The configuration for localization is not loading hdfs-site.xml, but the configuration for container launch is loading hdfs-site.xml, so the 2 pieces are seeing different configurations.

I'm not sure if YARN daemons should be loading hdfs-site.xml.  Whatever the choice, it probably should be consistent throughout the code.  This would be good to track down eventually, but for now, I expect you can quickly fix your environment by moving your hadoop.tmp.dir configuration into core-site.xml.

I hope this helps!
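To make the theory concrete, here is a minimal sketch in Python (not Hadoop's own Java) of how two configuration views could resolve yarn.nodemanager.local-dirs to different paths. It is an illustration under stated assumptions, not the NodeManager's actual code: the `resolve` helper and the `hdfs_site` dict are hypothetical stand-ins, and only the two default values (from core-default.xml and yarn-default.xml) and the paths from the log above are taken from Hadoop itself.

```python
import re

# Shipped defaults: hadoop.tmp.dir from core-default.xml,
# yarn.nodemanager.local-dirs from yarn-default.xml.
DEFAULTS = {
    "hadoop.tmp.dir": "/tmp/hadoop-${user.name}",
    "yarn.nodemanager.local-dirs": "${hadoop.tmp.dir}/nm-local-dir",
}

VAR = re.compile(r"\$\{([^}]+)\}")


def resolve(key, *resources, user="chris"):
    """Hypothetical helper: layer resources over the defaults (later wins),
    then expand ${var} references until the value stops changing."""
    merged = {"user.name": user, **DEFAULTS}
    for resource in resources:
        merged.update(resource)
    value = merged[key]
    for _ in range(20):  # bounded so a cyclic reference cannot loop forever
        expanded = VAR.sub(lambda m: merged[m.group(1)], value)
        if expanded == value:
            break
        value = expanded
    return value


# Stand-in for the override that was placed in hdfs-site.xml.
hdfs_site = {
    "hadoop.tmp.dir": "/Users/chris/hadoop-deploy-trunk/hadoop-3.0.0-SNAPSHOT/data"
}

# Assumption (the theory above): localization does not load hdfs-site.xml,
# while container launch does.
localizer_dir = resolve("yarn.nodemanager.local-dirs")
launcher_dir = resolve("yarn.nodemanager.local-dirs", hdfs_site)
print(localizer_dir)
print(launcher_dir)

# Workaround: with the same override in core-site.xml, every view loads it.
core_site = dict(hdfs_site)
fixed_localizer_dir = resolve("yarn.nodemanager.local-dirs", core_site)
fixed_launcher_dir = resolve("yarn.nodemanager.local-dirs", core_site)
print(fixed_localizer_dir == fixed_launcher_dir)
```

In this sketch the two views disagree only because one of them never sees the override, which matches the two paths in the log excerpt. Once the override lives in a file that both views load, they resolve to the same directory.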
> Exception when yarn.nodemanager.local-dirs is not explicitly set
> ----------------------------------------------------------------
>
> Key: YARN-367
> URL: https://issues.apache.org/jira/browse/YARN-367
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Reporter: Zhijie Shen
>
> If yarn.nodemanager.local-dirs is not explicitly set, and if the default
> local-dirs are not the children of hadoop.tmp.dir, the exception will occur
> when the wordcount example is run. Below is the log info.
> ==========
> 2013-01-30 22:16:04,229 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
> Start request for container_1359612879014_0001_01_000001 by user zshen
> 2013-01-30 22:16:04,247 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
> Creating a new application reference for app application_1359612879014_0001
> 2013-01-30 22:16:04,250 INFO
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=zshen
> IP=127.0.0.1 OPERATION=Start Container Request
> TARGET=ContainerManageImpl RESULT=SUCCESS
> APPID=application_1359612879014_0001
> CONTAINERID=container_1359612879014_0001_01_000001
> 2013-01-30 22:16:04,252 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
> Application application_1359612879014_0001 transitioned from NEW to INITING
> 2013-01-30 22:16:04,252 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
> Adding container_1359612879014_0001_01_000001 to application
> application_1359612879014_0001
> 2013-01-30 22:16:04,257 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
> Application application_1359612879014_0001 transitioned from INITING to
> RUNNING
> 2013-01-30 22:16:04,262 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
> Container container_1359612879014_0001_01_000001 transitioned from NEW to
> LOCALIZING
> 2013-01-30 22:16:04,268 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/appTokens
> transitioned from INIT to DOWNLOADING
> 2013-01-30 22:16:04,268 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/job.jar
> transitioned from INIT to DOWNLOADING
> 2013-01-30 22:16:04,268 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/job.splitmetainfo
> transitioned from INIT to DOWNLOADING
> 2013-01-30 22:16:04,268 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/job.split
> transitioned from INIT to DOWNLOADING
> 2013-01-30 22:16:04,269 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/job.xml
> transitioned from INIT to DOWNLOADING
> 2013-01-30 22:16:04,269 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
> Created localizer for container_1359612879014_0001_01_000001
> 2013-01-30 22:16:04,401 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
> Writing credentials to the nmPrivate file
> /tmp/hadoop-zshen/nm-local-dir/nmPrivate/container_1359612879014_0001_01_000001.tokens.
> Credentials list:
> 2013-01-30 22:16:04,423 INFO
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
> Initializing user zshen
> 2013-01-30 22:16:04,569 INFO
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying
> from
> /tmp/hadoop-zshen/nm-local-dir/nmPrivate/container_1359612879014_0001_01_000001.tokens
> to
> /tmp/hadoop-zshen/nm-local-dir/usercache/zshen/appcache/application_1359612879014_0001/container_1359612879014_0001_01_000001.tokens
> 2013-01-30 22:16:04,570 INFO
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: CWD set
> to
> /tmp/hadoop-zshen/nm-local-dir/usercache/zshen/appcache/application_1359612879014_0001
> =
> file:/tmp/hadoop-zshen/nm-local-dir/usercache/zshen/appcache/application_1359612879014_0001
> 2013-01-30 22:16:04,955 INFO
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out
> status for container: container_id {, app_attempt_id {, application_id {, id:
> 1, cluster_timestamp: 1359612879014, }, attemptId: 1, }, id: 1, }, state:
> C_RUNNING, diagnostics: "", exit_status: -1000,
> 2013-01-30 22:16:05,117 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/appTokens
> transitioned from DOWNLOADING to LOCALIZED
> 2013-01-30 22:16:05,312 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/job.jar
> transitioned from DOWNLOADING to LOCALIZED
> 2013-01-30 22:16:05,465 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/job.splitmetainfo
> transitioned from DOWNLOADING to LOCALIZED
> 2013-01-30 22:16:05,608 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/job.split
> transitioned from DOWNLOADING to LOCALIZED
> 2013-01-30 22:16:05,751 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> hdfs://localhost:9001/tmp/hadoop-yarn/staging/zshen/.staging/job_1359612879014_0001/job.xml
> transitioned from DOWNLOADING to LOCALIZED
> 2013-01-30 22:16:05,752 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
> Container container_1359612879014_0001_01_000001 transitioned from
> LOCALIZING to LOCALIZED
> 2013-01-30 22:16:05,866 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
> Container container_1359612879014_0001_01_000001 transitioned from LOCALIZED
> to RUNNING
> 2013-01-30 22:16:05,866 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
> ResourceCalculatorPlugin is unavailable on this system.
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
> is disabled.
> 2013-01-30 22:16:05,910 WARN
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
> Failed to launch container.
> java.io.FileNotFoundException: File
> /Users/zshen/Deployment/hadoop-3.0.0-SNAPSHOT/data/nm-local-dir/usercache/zshen/appcache/application_1359612879014_0001/container_1359612879014_0001_01_000001
> does not exist
> at
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:498)
> at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:996)
> at
> org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:150)
> at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:187)
> at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:730)
> at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:726)
> at
> org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2379)
> at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:726)
> at
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:330)
> at
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:135)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:242)
> at
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:68)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:680)
> 2013-01-30 22:16:05,913 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
> Container container_1359612879014_0001_01_000001 transitioned from RUNNING
> to EXITED_WITH_FAILURE
> 2013-01-30 22:16:05,914 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
> Cleaning up container container_1359612879014_0001_01_000001
> 2013-01-30 22:16:05,934 INFO
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting
> absolute path :
> /tmp/hadoop-zshen/nm-local-dir/usercache/zshen/appcache/application_1359612879014_0001/container_1359612879014_0001_01_000001
> 2013-01-30 22:16:05,934 WARN
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=zshen
> OPERATION=Container Finished - Failed TARGET=ContainerImpl
> RESULT=FAILURE DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE
> APPID=application_1359612879014_0001
> CONTAINERID=container_1359612879014_0001_01_000001
> 2013-01-30 22:16:05,937 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
> Container container_1359612879014_0001_01_000001 transitioned from
> EXITED_WITH_FAILURE to DONE
> 2013-01-30 22:16:05,937 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
> Removing container_1359612879014_0001_01_000001 from application
> application_1359612879014_0001
> 2013-01-30 22:16:05,937 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
> ResourceCalculatorPlugin is unavailable on this system.
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
> is disabled.
> 2013-01-30 22:16:05,958 INFO
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out
> status for container: container_id {, app_attempt_id {, application_id {, id:
> 1, cluster_timestamp: 1359612879014, }, attemptId: 1, }, id: 1, }, state:
> C_COMPLETE, diagnostics: "", exit_status: -1,
> 2013-01-30 22:16:05,959 INFO
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed
> completed container container_1359612879014_0001_01_000001
> 2013-01-30 22:16:06,965 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
> Application application_1359612879014_0001 transitioned from RUNNING to
> APPLICATION_RESOURCES_CLEANINGUP
> 2013-01-30 22:16:06,965 INFO
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting
> absolute path :
> /tmp/hadoop-zshen/nm-local-dir/usercache/zshen/appcache/application_1359612879014_0001
> 2013-01-30 22:16:06,966 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got
> event APPLICATION_STOP for appId application_1359612879014_0001
> 2013-01-30 22:16:06,970 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
> Application application_1359612879014_0001 transitioned from
> APPLICATION_RESOURCES_CLEANINGUP to FINISHED
> 2013-01-30 22:16:06,970 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler:
> Scheduling Log Deletion for application: application_1359612879014_0001,
> with delay of 10800 seconds
> ==========
> Below is the setting in hdfs-site.xml.
> ==========
> <property>
> <name>hadoop.tmp.dir</name>
> <value>/Users/zshen/Deployment/hadoop-3.0.0-SNAPSHOT/data</value>
> </property>
> ==========

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira