Tao Yang created YARN-5749:
------------------------------

             Summary: Fail to localize resources after health status of local 
dirs changes, caused by the change to FileContext#setUMask
                 Key: YARN-5749
                 URL: https://issues.apache.org/jira/browse/YARN-5749
             Project: Hadoop YARN
          Issue Type: Bug
          Components: nodemanager
    Affects Versions: 3.0.0-alpha2
            Reporter: Tao Yang


HADOOP-13440 changed the FileContext#setUMask method so that the umask is no 
longer held in a local variable but in a global one: the method now updates 
the conf value of "fs.permissions.umask-mode".
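
To illustrate the difference, here is a minimal sketch in plain Java. It is not 
the actual Hadoop code: the classes ConfBackedContext and FieldBackedContext 
are hypothetical stand-ins, and a plain Map plays the role of the shared 
configuration. Only the key "fs.permissions.umask-mode" and the values "022" 
and "137" come from this report.

```java
import java.util.HashMap;
import java.util.Map;

public class SharedUmaskSketch {
    static final String UMASK_KEY = "fs.permissions.umask-mode";

    // Hypothetical stand-in: the umask lives in a configuration map that
    // several contexts share, so a write by one is seen by all of them.
    static class ConfBackedContext {
        private final Map<String, String> conf;
        ConfBackedContext(Map<String, String> conf) { this.conf = conf; }
        void setUMask(String umask) { conf.put(UMASK_KEY, umask); }
        String getUMask() { return conf.getOrDefault(UMASK_KEY, "022"); }
    }

    // Hypothetical stand-in: the umask is a field of each instance, so a
    // write by one instance does not leak to the others.
    static class FieldBackedContext {
        private String umask = "022";
        void setUMask(String umask) { this.umask = umask; }
        String getUMask() { return umask; }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        ConfBackedContext logWriter = new ConfBackedContext(conf);
        ConfBackedContext localizer = new ConfBackedContext(conf);
        logWriter.setUMask("137");
        // The localizer never asked for "137" but observes it anyway.
        System.out.println("shared conf: " + localizer.getUMask());

        FieldBackedContext logWriter2 = new FieldBackedContext();
        FieldBackedContext localizer2 = new FieldBackedContext();
        logWriter2.setUMask("137");
        // The second instance keeps its own default.
        System.out.println("per instance: " + localizer2.getUMask());
    }
}
```

With the shared map, the second context sees the value written by the first 
one; with a per-instance field it keeps its own default.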

Both LogWriter and ResourceLocalizationService may call this method, which now 
updates the global umask. 
After an application finishes, LogWriter sets the umask to "137" while 
uploading container logs. The global umask changes immediately and affects 
other services. In my case, after one of the local directories was marked as 
bad (because its used disk space exceeded the threshold defined by 
"yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage"), 
ResourceLocalizationService reinitialized the remaining local directories and 
changed their permissions from "drwxr-xr-x" to "drw-r-----" (the umask had 
changed from "022" to "137"). From then on, the NM always fails to localize 
resources because the local directories are not executable.
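
The permission change follows from ordinary umask arithmetic. The sketch below 
is plain Java, not Hadoop code; the helper names applyUmask and toSymbolic are 
made up for illustration, and it assumes the directories are requested with 
mode 0777 before the umask is applied, which matches the two permission 
strings quoted above.

```java
public class UmaskArithmetic {
    // Bits that survive a umask: requested mode with the umask bits cleared.
    static int applyUmask(int requested, int umask) {
        return requested & ~umask & 0777;
    }

    // Render the low nine permission bits in "rwxrwxrwx" notation.
    static String toSymbolic(int mode) {
        String flags = "rwxrwxrwx";
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 9; i++) {
            boolean set = (mode & (1 << (8 - i))) != 0;
            sb.append(set ? flags.charAt(i) : '-');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // umask 022 leaves 0755, i.e. rwxr-xr-x
        System.out.println(toSymbolic(applyUmask(0777, 0022)));
        // umask 137 leaves 0640, i.e. rw-r-----
        System.out.println(toSymbolic(applyUmask(0777, 0137)));
    }
}
```

With umask "137" every execute bit is cleared, which is why DiskChecker 
reports the directory as not executable.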

Detailed logs are as follows:
{code}
2016-10-19 15:36:32,650 WARN org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext: Disk Error Exception:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Directory is not executable: /home/yangtao.yt/hadoop-data/nm-local-dir-2/nmPrivate
        at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:215)
        at org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:190)
        at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:124)
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createPath(LocalDirAllocator.java:350)
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:412)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:151)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:132)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:116)
        at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getLocalPathForWrite(LocalDirsHandlerService.java:563)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1162)
2016-10-19 15:36:32,650 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localizer failed
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for nmPrivate/container_e26_1476858409240_0004_01_000005.tokens
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:441)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:151)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:132)
        at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:116)
        at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getLocalPathForWrite(LocalDirsHandlerService.java:563)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1162)
2016-10-19 15:36:32,652 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_e26_1476858409240_0004_01_000005 transitioned from LOCALIZING to LOCALIZATION_FAILED
{code}

In my opinion, it would be better if FileContext remained compatible with its 
previous usage. Please feel free to give your suggestions.