Shane Kumpf created YARN-7879:
---------------------------------

             Summary: NM user is unable to access the application filecache due to permissions
                 Key: YARN-7879
                 URL: https://issues.apache.org/jira/browse/YARN-7879
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Shane Kumpf

I noticed the following log entries where localization was being retried on several MR AM files.

{code}
2018-02-02 02:53:02,905 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: Resource /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/11/job.jar is missing, localizing it again
2018-02-02 02:53:42,908 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl: Resource /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/13/job.xml is missing, localizing it again
{code}

The cluster is configured to use LCE and {{yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user}} is set to a user ({{hadoopuser}}) that is in the {{hadoop}} group. The user has a umask of {{0002}}. The cluster is configured with {{fs.permissions.umask-mode=022}}, coming from {{core-default}}. Setting the local-user to {{nobody}}, who is not a login user or in the {{hadoop}} group, produces the same results.

{code}
[hadoopuser@y7001 ~]$ umask
0002
[hadoopuser@y7001 ~]$ id
uid=1003(hadoopuser) gid=1004(hadoopuser) groups=1004(hadoopuser),1001(hadoop)
{code}

The cause of the log entry was tracked down to a simple {{!file.exists()}} call in {{LocalResourcesTrackerImpl#isResourcePresent}}.

{code}
  public boolean isResourcePresent(LocalizedResource rsrc) {
    boolean ret = true;
    if (rsrc.getState() == ResourceState.LOCALIZED) {
      File file = new File(rsrc.getLocalPath().toUri().getRawPath().
        toString());
      if (!file.exists()) {
        ret = false;
      } else if (dirsHandler != null) {
        ret = checkLocalResource(rsrc);
      }
    }
    return ret;
  }
{code}

The Resources Tracker runs as the NM user, in this case {{yarn}}. The files being retried are in the filecache. The directories in the filecache are all owned by the local-user's primary group and have 700 permissions, which makes them unreadable by the {{yarn}} user.

{code}
[root@y7001 ~]# ls -la /hadoop-yarn/usercache/hadoopuser/appcache/application_1517540536531_0001/filecache
total 0
drwx--x---. 6 hadoopuser hadoop     46 Feb  2 03:06 .
drwxr-s---. 4 hadoopuser hadoop     73 Feb  2 03:07 ..
drwx------. 2 hadoopuser hadoopuser 61 Feb  2 03:05 10
drwx------. 3 hadoopuser hadoopuser 21 Feb  2 03:05 11
drwx------. 2 hadoopuser hadoopuser 45 Feb  2 03:06 12
drwx------. 2 hadoopuser hadoopuser 41 Feb  2 03:06 13
{code}

I saw YARN-5287, but that appears to be related to a restrictive umask and the usercache itself. I was unable to locate any other known issues that seemed relevant.

Is the above already known, or is it a configuration issue?
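
As a rough illustration of why {{file.exists()}} would report these files as missing, the sketch below models the basic POSIX owner/group/other search (execute) check on a directory, using the modes and owners from the {{ls}} listing above. {{TraverseCheck}} and {{canTraverse}} are hypothetical names and not Hadoop code. The sketch also assumes that the {{yarn}} user is a member of the {{hadoop}} group, which the report does not state, and it ignores ACLs, SELinux and the superuser bypass.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

/** Hypothetical helper (not part of Hadoop): simplified POSIX directory search check. */
final class TraverseCheck {

  /**
   * Returns true if the given user may search (traverse) a directory with the
   * given nine-character mode string, owner and group. Only the classic
   * owner/group/other bits are considered.
   */
  static boolean canTraverse(String mode, String owner, String group,
                             String user, Set<String> userGroups) {
    Set<PosixFilePermission> perms = PosixFilePermissions.fromString(mode);
    if (user.equals(owner)) {
      return perms.contains(PosixFilePermission.OWNER_EXECUTE);
    }
    if (userGroups.contains(group)) {
      return perms.contains(PosixFilePermission.GROUP_EXECUTE);
    }
    return perms.contains(PosixFilePermission.OTHERS_EXECUTE);
  }

  public static void main(String[] args) {
    // Assumption: the NM user "yarn" is in the "hadoop" group.
    Set<String> yarnGroups = Set.of("yarn", "hadoop");

    // filecache itself: drwx--x--- hadoopuser hadoop
    System.out.println(
        canTraverse("rwx--x---", "hadoopuser", "hadoop", "yarn", yarnGroups));

    // filecache/11: drwx------ hadoopuser hadoopuser
    System.out.println(
        canTraverse("rwx------", "hadoopuser", "hadoopuser", "yarn", yarnGroups));
  }
}
```

Under those assumptions the first call prints {{true}} and the second prints {{false}}: {{yarn}} can enter the filecache directory but cannot search the numbered subdirectories. When a parent directory cannot be searched, {{java.io.File#exists}} returns {{false}} rather than raising an error, which would be consistent with the "is missing, localizing it again" log entries.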