[ https://issues.apache.org/jira/browse/YARN-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14939530#comment-14939530 ]
Varun Vasudev commented on YARN-3514: ------------------------------------- [~leftnoteasy], [~cnauroth] - can the latest patch be committed? > Active directory usernames like domain\login cause YARN failures > ---------------------------------------------------------------- > > Key: YARN-3514 > URL: https://issues.apache.org/jira/browse/YARN-3514 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager > Affects Versions: 2.2.0 > Environment: CentOS6 > Reporter: john lilley > Assignee: Chris Nauroth > Priority: Minor > Labels: BB2015-05-TBR > Attachments: YARN-3514.001.patch, YARN-3514.002.patch > > > We have a 2.2.0 (Cloudera 5.3) cluster running on CentOS6 that is > Kerberos-enabled and uses an external AD domain controller for the KDC. We > are able to authenticate, browse HDFS, etc. However, YARN fails during > localization because it seems to get confused by the presence of a \ > character in the local user name. > Our AD authentication on the nodes goes through sssd and set configured to > map AD users onto the form domain\username. For example, our test user has a > Kerberos principal of hadoopu...@domain.com and that maps onto a CentOS user > "domain\hadoopuser". We have no problem validating that user with PAM, > logging in as that user, su-ing to that user, etc. > However, when we attempt to run a YARN application master, the localization > step fails when setting up the local cache directory for the AM. The error > that comes out of the RM logs: > 2015-04-17 12:47:09 INFO net.redpoint.yarnapp.Client[0]: monitorApplication: > ApplicationReport: appId=1, state=FAILED, progress=0.0, finalStatus=FAILED, > diagnostics='Application application_1429295486450_0001 failed 1 times due to > AM Container for appattempt_1429295486450_0001_000001 exited with exitCode: > -1000 due to: Application application_1429295486450_0001 initialization > failed (exitCode=255) with output: main : command provided 0 > main : user is DOMAIN\hadoopuser > main : requested yarn user is domain\hadoopuser > org.apache.hadoop.util.DiskChecker$DiskErrorException: Cannot create > directory: > /data/yarn/nm/usercache/domain%5Chadoopuser/appcache/application_1429295486450_0001/filecache/10 > at > org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:105) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.download(ContainerLocalizer.java:199) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:241) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:347) > .Failing this attempt.. Failing the application.' > However, when we look on the node launching the AM, we see this: > [root@rpb-cdh-kerb-2 ~]# cd /data/yarn/nm/usercache > [root@rpb-cdh-kerb-2 usercache]# ls -l > drwxr-s--- 4 DOMAIN\hadoopuser yarn 4096 Apr 17 12:10 domain\hadoopuser > There appears to be different treatment of the \ character in different > places. Something creates the directory as "domain\hadoopuser" but something > else later attempts to use it as "domain%5Chadoopuser". I’m not sure where > or why the URL escapement converts the \ to %5C or why this is not consistent. > I should also mention, for the sake of completeness, our auth_to_local rule > is set up to map u...@domain.com to domain\user: > RULE:[1:$1@$0](^.*@DOMAIN\.COM$)s/^(.*)@DOMAIN\.COM$/domain\\$1/g -- This message was sent by Atlassian JIRA (v6.3.4#6332)