[ https://issues.apache.org/jira/browse/YARN-7879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350628#comment-16350628 ]
Jason Lowe commented on YARN-7879:
----------------------------------

I also manually tested the patch on a secure cluster and verified that non-private resources are not re-localized with each application.

> NM user is unable to access the application filecache due to permissions
> ------------------------------------------------------------------------
>
> Key: YARN-7879
> URL: https://issues.apache.org/jira/browse/YARN-7879
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 3.1.0
> Reporter: Shane Kumpf
> Assignee: Jason Lowe
> Priority: Critical
> Attachments: YARN-7879.001.patch
>
>
> I noticed the following log entries where localization was being retried on
> several MR AM files.
> {code}
> 2018-02-02 02:53:02,905 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl:
> Resource
> /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/11/job.jar
> is missing, localizing it again
> 2018-02-02 02:53:42,908 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl:
> Resource
> /hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/13/job.xml
> is missing, localizing it again
> {code}
> The cluster is configured to use LCE and
> {{yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user}} is
> set to a user ({{hadoopuser}}) that is in the {{hadoop}} group. The user has
> a umask of {{0002}}. The cluster is configured with
> {{fs.permissions.umask-mode=022}}, coming from {{core-default}}. Setting the
> local-user to {{nobody}}, who is not a login user or in the {{hadoop}} group,
> produces the same results.
> {code}
> [hadoopuser@y7001 ~]$ umask
> 0002
> [hadoopuser@y7001 ~]$ id
> uid=1003(hadoopuser) gid=1004(hadoopuser) groups=1004(hadoopuser),1001(hadoop)
> {code}
> The cause of the log entry was tracked down to a simple !file.exists call in
> {{LocalResourcesTrackerImpl#isResourcePresent}}.
> {code}
>   public boolean isResourcePresent(LocalizedResource rsrc) {
>     boolean ret = true;
>     if (rsrc.getState() == ResourceState.LOCALIZED) {
>       File file = new File(rsrc.getLocalPath().toUri().getRawPath().
>         toString());
>       if (!file.exists()) {
>         ret = false;
>       } else if (dirsHandler != null) {
>         ret = checkLocalResource(rsrc);
>       }
>     }
>     return ret;
>   }
> {code}
> The Resources Tracker runs as the NM user, in this case {{yarn}}. The files
> being retried are in the filecache. The directories in the filecache are all
> owned by the local-user's primary group and have 700 perms, which makes them
> unreadable by the {{yarn}} user.
> {code}
> [root@y7001 ~]# ls -la
> /hadoop-yarn/usercache/hadoopuser/appcache/application_1517540536531_0001/filecache
> total 0
> drwx--x---. 6 hadoopuser hadoop 46 Feb 2 03:06 .
> drwxr-s---. 4 hadoopuser hadoop 73 Feb 2 03:07 ..
> drwx------. 2 hadoopuser hadoopuser 61 Feb 2 03:05 10
> drwx------. 3 hadoopuser hadoopuser 21 Feb 2 03:05 11
> drwx------. 2 hadoopuser hadoopuser 45 Feb 2 03:06 12
> drwx------. 2 hadoopuser hadoopuser 41 Feb 2 03:06 13
> {code}
> I saw YARN-5287, but that appears to be related to a restrictive umask and
> the usercache itself. I was unable to locate any other known issues that
> seemed relevant. Is the above already known? Is it a configuration issue?
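
For readers following the permissions discussion: the quoted {{isResourcePresent}} check treats "cannot see the file" the same as "file is gone", because {{File.exists()}} returns false in both cases. The following is a minimal standalone sketch of how the two cases can be told apart with {{java.nio.file.Files}}. It is not the change made in YARN-7879.001.patch and is not NodeManager code; the class name {{ResourcePresenceCheck}}, the {{Presence}} enum, and the {{classify}} helper are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; not part of Hadoop or of the YARN-7879 patch.
class ResourcePresenceCheck {

    enum Presence { PRESENT, MISSING, UNKNOWN }

    // Files.exists() and Files.notExists() are not complements: both return
    // false when the status of the path cannot be determined, for example when
    // the caller is not allowed to traverse a parent directory.
    static Presence classify(Path p) {
        if (Files.exists(p)) {
            return Presence.PRESENT;
        }
        if (Files.notExists(p)) {
            return Presence.MISSING;
        }
        return Presence.UNKNOWN; // neither confirmed present nor confirmed absent
    }

    public static void main(String[] args) throws IOException {
        // Assumed layout for the demo: a temp directory standing in for a
        // filecache entry, containing one localized file.
        Path dir = Files.createTempDirectory("filecache");
        Path jar = Files.createFile(dir.resolve("job.jar"));

        System.out.println(classify(jar));                     // PRESENT
        System.out.println(classify(dir.resolve("job.xml")));  // MISSING

        // Clean up the demo files.
        Files.delete(jar);
        Files.delete(dir);
    }
}
```

With the directory layout shown in the report, a process running as {{yarn}} that calls such a helper on a file under one of the 700 directories would be expected to get UNKNOWN rather than MISSING, whereas {{File.exists()}} simply returns false and triggers the "is missing, localizing it again" path.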