[ https://issues.apache.org/jira/browse/YARN-5749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tao Yang updated YARN-5749:
---------------------------
    Summary: Fail to localize resources after health status for local dirs changed  (was: Fail to localize resources after health status for local dirs changed occurred by the change of FileContext#setUMask)

> Fail to localize resources after health status for local dirs changed
> ---------------------------------------------------------------------
>
>                 Key: YARN-5749
>                 URL: https://issues.apache.org/jira/browse/YARN-5749
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Tao Yang
>
> HADOOP-13440 changed the FileContext#setUMask method so that the umask is no
> longer a local variable but a global one, set by updating the conf value of
> "fs.permissions.umask-mode".
> Both LogWriter and ResourceLocalizationService may call this method and
> thereby update the global umask.
> After an application finishes, LogWriter sets the umask to "137" while
> uploading container logs. The global umask changes immediately and affects
> other services. In my case, after one of the local directories was marked as
> bad (because its used disk space exceeded the threshold defined by
> "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"),
> ResourceLocalizationService reinitialized the remaining local directories and
> changed their permissions from "drwxr-xr-x" to "drw-r-----" (the umask had
> changed from "022" to "137"). From then on, the NM always fails to localize
> resources because the local directories are not executable.
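
The permission change described above follows from plain umask arithmetic. The sketch below is not Hadoop code; the class and method names (UmaskDemo, applyUmask, toSymbolic) are made up for illustration, and it assumes the directories are requested with mode 0777 before the umask is applied. It shows why a umask of "137" leaves a directory without the execute bit.

```java
class UmaskDemo {
    /** Clears every bit of the requested mode that is set in the umask. */
    public static int applyUmask(int requestedMode, int umask) {
        return requestedMode & ~umask & 0777;
    }

    /** Renders the nine permission bits of a mode as an "rwxr-xr-x" style string. */
    public static String toSymbolic(int mode) {
        String flags = "rwxrwxrwx";
        StringBuilder sb = new StringBuilder(9);
        for (int i = 0; i < 9; i++) {
            // Walk from the owner-read bit (0400) down to the others-execute bit (0001).
            boolean set = (mode & (0400 >> i)) != 0;
            sb.append(set ? flags.charAt(i) : '-');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: directories are created with a requested mode of 0777.
        System.out.println("umask 022 -> d" + toSymbolic(applyUmask(0777, 0022)));
        System.out.println("umask 137 -> d" + toSymbolic(applyUmask(0777, 0137)));
    }
}
```

With umask 137 the owner execute bit is masked out, so the directory can no longer be traversed, which matches the "Directory is not executable" error reported by DiskChecker.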
> Detailed logs are as follows:
> {code}
> 2016-10-19 15:36:32,650 WARN org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext: Disk Error Exception:
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not executable: /home/yangtao.yt/hadoop-data/nm-local-dir-2/nmPrivate
>         at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:215)
>         at org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:190)
>         at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:124)
>         at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createPath(LocalDirAllocator.java:350)
>         at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:412)
>         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:151)
>         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:132)
>         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:116)
>         at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getLocalPathForWrite(LocalDirsHandlerService.java:563)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1162)
> 2016-10-19 15:36:32,650 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localizer failed
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for nmPrivate/container_e26_1476858409240_0004_01_000005.tokens
>         at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:441)
>         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:151)
>         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:132)
>         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:116)
>         at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getLocalPathForWrite(LocalDirsHandlerService.java:563)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1162)
> 2016-10-19 15:36:32,652 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_e26_1476858409240_0004_01_000005 transitioned from LOCALIZING to LOCALIZATION_FAILED
> {code}
> In my opinion, it would be better if FileContext stayed compatible with
> past usage.
> Please feel free to give your suggestions.
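
To make the compatibility concern concrete, here is a toy model of the two behaviors described in this report. It is only a sketch under stated assumptions: ModelContext is a hypothetical stand-in and not the Hadoop FileContext class, the shared Map stands in for a configuration object used by several services, and the "writeThrough" flag is an invented switch between the per-instance behavior and the global behavior the report attributes to HADOOP-13440.

```java
import java.util.HashMap;
import java.util.Map;

class SharedUmaskDemo {
    static final String UMASK_KEY = "fs.permissions.umask-mode";

    /** Hypothetical stand-in for a file context; NOT the Hadoop class. */
    static class ModelContext {
        private final Map<String, String> conf;  // shared by every context in this model
        private final boolean writeThrough;      // true: setUMask writes into the shared conf
        private String localUmask;               // per-instance value, used when writeThrough is false

        ModelContext(Map<String, String> conf, boolean writeThrough) {
            this.conf = conf;
            this.writeThrough = writeThrough;
        }

        void setUMask(String umask) {
            if (writeThrough) {
                conf.put(UMASK_KEY, umask);      // visible to every other context
            } else {
                localUmask = umask;              // visible to this context only
            }
        }

        String getUMask() {
            if (!writeThrough && localUmask != null) {
                return localUmask;
            }
            return conf.getOrDefault(UMASK_KEY, "022");
        }
    }

    /** Returns the umask the localizer's context sees after the log writer sets "137". */
    public static String localizerUmaskAfterLogUpload(boolean writeThrough) {
        Map<String, String> conf = new HashMap<>();
        conf.put(UMASK_KEY, "022");
        ModelContext logWriter = new ModelContext(conf, writeThrough);
        ModelContext localizer = new ModelContext(conf, writeThrough);
        logWriter.setUMask("137");
        return localizer.getUMask();
    }

    public static void main(String[] args) {
        System.out.println("per-instance umask: " + localizerUmaskAfterLogUpload(false));
        System.out.println("shared conf umask:  " + localizerUmaskAfterLogUpload(true));
    }
}
```

In this model the localizer keeps "022" when the umask is per-instance and picks up "137" when it is written through to the shared conf, which is the kind of cross-service interference reported here.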