[ https://issues.apache.org/jira/browse/YARN-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625438#comment-14625438 ]
Allen Wittenauer commented on YARN-3921:
----------------------------------------

AFAIK, YARN won't change the permissions on the work dirs when you switch modes. The assumption is that the ops folks/tools will handle this as part of the transition. The ambari-qa dir changing seems to be more related to something else (are these machines being managed via Ambari and, like a naughty child, Ambari is putting these where they don't belong?), given that you didn't say that a job belonging to the user ambari-qa was run...

> Permission denied errors for local usercache directories when attempting to run MapReduce job on Kerberos enabled cluster
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-3921
>                 URL: https://issues.apache.org/jira/browse/YARN-3921
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.7.1
>         Environment: sles11sp3
>            Reporter: Zack Marsh
>
> Prior to enabling Kerberos on the Hadoop cluster, I am able to run a simple MapReduce example as the Linux user 'tdatuser':
> {code}
> piripiri1:~ # su tdatuser
> tdatuser@piripiri1:/root> yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples-2.*.jar pi 16 10000
> Number of Maps = 16
> Samples per Map = 10000
> Wrote input for Map #0
> Wrote input for Map #1
> Wrote input for Map #2
> Wrote input for Map #3
> Wrote input for Map #4
> Wrote input for Map #5
> Wrote input for Map #6
> Wrote input for Map #7
> Wrote input for Map #8
> Wrote input for Map #9
> Wrote input for Map #10
> Wrote input for Map #11
> Wrote input for Map #12
> Wrote input for Map #13
> Wrote input for Map #14
> Wrote input for Map #15
> Starting Job
> 15/07/13 17:02:31 INFO impl.TimelineClientImpl: Timeline service address: http:/ s/v1/timeline/
> 15/07/13 17:02:31 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to
> 15/07/13 17:02:31 INFO input.FileInputFormat: Total input paths to process : 16
> 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: number of splits:16
> 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_14
> 15/07/13 17:02:32 INFO impl.YarnClientImpl: Submitted application application_14
> 15/07/13 17:02:32 INFO mapreduce.Job: The url to track the job: http://piripiri3 cation_1436821014431_0003/
> 15/07/13 17:02:32 INFO mapreduce.Job: Running job: job_1436821014431_0003
> 15/07/13 17:05:50 INFO mapreduce.Job: Job job_1436821014431_0003 running in uber mode : false
> 15/07/13 17:05:50 INFO mapreduce.Job: map 0% reduce 0%
> 15/07/13 17:05:56 INFO mapreduce.Job: map 6% reduce 0%
> 15/07/13 17:06:00 INFO mapreduce.Job: map 13% reduce 0%
> 15/07/13 17:06:01 INFO mapreduce.Job: map 38% reduce 0%
> 15/07/13 17:06:05 INFO mapreduce.Job: map 44% reduce 0%
> 15/07/13 17:06:07 INFO mapreduce.Job: map 63% reduce 0%
> 15/07/13 17:06:09 INFO mapreduce.Job: map 69% reduce 0%
> 15/07/13 17:06:11 INFO mapreduce.Job: map 75% reduce 0%
> 15/07/13 17:06:12 INFO mapreduce.Job: map 81% reduce 0%
> 15/07/13 17:06:13 INFO mapreduce.Job: map 81% reduce 25%
> 15/07/13 17:06:14 INFO mapreduce.Job: map 94% reduce 25%
> 15/07/13 17:06:16 INFO mapreduce.Job: map 100% reduce 31%
> 15/07/13 17:06:17 INFO mapreduce.Job: map 100% reduce 100%
> 15/07/13 17:06:17 INFO mapreduce.Job: Job job_1436821014431_0003 completed successfully
> 15/07/13 17:06:17 INFO mapreduce.Job: Counters: 49
>     File System Counters
>         FILE: Number of bytes read=358
>         FILE: Number of bytes written=2249017
>         FILE: Number of read operations=0
>         FILE: Number of large read operations=0
>         FILE: Number of write operations=0
>         HDFS: Number of bytes read=4198
>         HDFS: Number of bytes written=215
>         HDFS: Number of read operations=67
>         HDFS: Number of large read operations=0
>         HDFS: Number of write operations=3
>     Job Counters
>         Launched map tasks=16
>         Launched reduce tasks=1
>         Data-local map tasks=16
>         Total time spent by all maps in occupied slots (ms)=160498
>         Total time spent by all reduces in occupied slots (ms)=27302
>         Total time spent by all map tasks (ms)=80249
>         Total time spent by all reduce tasks (ms)=13651
>         Total vcore-seconds taken by all map tasks=80249
>         Total vcore-seconds taken by all reduce tasks=13651
>         Total megabyte-seconds taken by all map tasks=246524928
>         Total megabyte-seconds taken by all reduce tasks=41935872
>     Map-Reduce Framework
>         Map input records=16
>         Map output records=32
>         Map output bytes=288
>         Map output materialized bytes=448
>         Input split bytes=2310
>         Combine input records=0
>         Combine output records=0
>         Reduce input groups=2
>         Reduce shuffle bytes=448
>         Reduce input records=32
>         Reduce output records=0
>         Spilled Records=64
>         Shuffled Maps =16
>         Failed Shuffles=0
>         Merged Map outputs=16
>         GC time elapsed (ms)=1501
>         CPU time spent (ms)=13670
>         Physical memory (bytes) snapshot=13480296448
>         Virtual memory (bytes) snapshot=72598511616
>         Total committed heap usage (bytes)=12508463104
>     Shuffle Errors
>         BAD_ID=0
>         CONNECTION=0
>         IO_ERROR=0
>         WRONG_LENGTH=0
>         WRONG_MAP=0
>         WRONG_REDUCE=0
>     File Input Format Counters
>         Bytes Read=1888
>     File Output Format Counters
>         Bytes Written=97
> Job Finished in 226.813 seconds
> Estimated value of Pi is 3.14127500000000000000
> {code}
> However, after enabling Kerberos, the job fails:
> {code}
> tdatuser@piripiri1:/root> kinit -kt /etc/security/keytabs/tdatuser.headless.keytab tdatuser
> tdatuser@piripiri1:/root> yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples-2.*.jar pi 16 10000
> Number of Maps = 16
> Samples per Map = 10000
> Wrote input for Map #0
> Wrote input for Map #1
> Wrote input for Map #2
> Wrote input for Map #3
> Wrote input for Map #4
> Wrote input for Map #5
> Wrote input for Map #6
> Wrote input for Map #7
> Wrote input for Map #8
> Wrote input for Map #9
> Wrote input for Map #10
> Wrote input for Map #11
> Wrote input for Map #12
> Wrote input for Map #13
> Wrote input for Map #14
> Wrote input for Map #15
> Starting Job
> 15/07/13 17:27:05 INFO impl.TimelineClientImpl: Timeline service address: http://piripiri1.labs.teradata.com:8188/ws/v1/timeline/
> 15/07/13 17:27:05 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 140 for tdatuser on ha-hdfs:PIRIPIRI
> 15/07/13 17:27:05 INFO security.TokenCache: Got dt for hdfs://PIRIPIRI; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:PIRIPIRI, Ident: (HDFS_DELEGATION_TOKEN token 140 for tdatuser)
> 15/07/13 17:27:06 INFO input.FileInputFormat: Total input paths to process : 16
> 15/07/13 17:27:06 INFO mapreduce.JobSubmitter: number of splits:16
> 15/07/13 17:27:06 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1436822321287_0007
> 15/07/13 17:27:06 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:PIRIPIRI, Ident: (HDFS_DELEGATION_TOKEN token 140 for tdatuser)
> 15/07/13 17:27:06 INFO impl.YarnClientImpl: Submitted application application_1436822321287_0007
> 15/07/13 17:27:06 INFO mapreduce.Job: The url to track the job: http://piripiri2.labs.teradata.com:8088/proxy/application_1436822321287_0007/
> 15/07/13 17:27:06 INFO mapreduce.Job: Running job: job_1436822321287_0007
> 15/07/13 17:27:09 INFO mapreduce.Job: Job job_1436822321287_0007 running in uber mode : false
> 15/07/13 17:27:09 INFO mapreduce.Job: map 0% reduce 0%
> 15/07/13 17:27:09 INFO mapreduce.Job: Job job_1436822321287_0007 failed with state FAILED due to: Application application_1436822321287_0007 failed 2 times due to AM Container for appattempt_1436822321287_0007_000002 exited with exitCode: -1000
> For more detailed output, check application tracking page: http://piripiri2.labs.teradata.com:8088/cluster/app/application_1436822321287_0007 Then, click on links to logs of each attempt.
> Diagnostics: Application application_1436822321287_0007 initialization failed (exitCode=255) with output: main : command provided 0
> main : run as user is tdatuser
> main : requested yarn user is tdatuser
> Can't create directory /data1/hadoop/yarn/local/usercache/tdatuser/appcache/application_1436822321287_0007 - Permission denied
> Can't create directory /data2/hadoop/yarn/local/usercache/tdatuser/appcache/application_1436822321287_0007 - Permission denied
> Can't create directory /data3/hadoop/yarn/local/usercache/tdatuser/appcache/application_1436822321287_0007 - Permission denied
> Can't create directory /data4/hadoop/yarn/local/usercache/tdatuser/appcache/application_1436822321287_0007 - Permission denied
> Can't create directory /data5/hadoop/yarn/local/usercache/tdatuser/appcache/application_1436822321287_0007 - Permission denied
> Can't create directory /data6/hadoop/yarn/local/usercache/tdatuser/appcache/application_1436822321287_0007 - Permission denied
> Can't create directory /data7/hadoop/yarn/local/usercache/tdatuser/appcache/application_1436822321287_0007 - Permission denied
> Can't create directory /data8/hadoop/yarn/local/usercache/tdatuser/appcache/application_1436822321287_0007 - Permission denied
> Can't create directory /data9/hadoop/yarn/local/usercache/tdatuser/appcache/application_1436822321287_0007 - Permission denied
> Can't create directory /data10/hadoop/yarn/local/usercache/tdatuser/appcache/application_1436822321287_0007 - Permission denied
> Can't create directory /data11/hadoop/yarn/local/usercache/tdatuser/appcache/application_1436822321287_0007 - Permission denied
> Can't create directory /data12/hadoop/yarn/local/usercache/tdatuser/appcache/application_1436822321287_0007 - Permission denied
> Did not create any app directories
> Failing this attempt. Failing the application.
> 15/07/13 17:27:09 INFO mapreduce.Job: Counters: 0
> Job Finished in 4.748 seconds
> java.io.FileNotFoundException: File does not exist: hdfs://PIRIPIRI/user/tdatuser/QuasiMonteCarlo_1436822823095_2120947622/out/reduce-out
>     at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1752)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1776)
>     at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:314)
>     at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:483)
>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
>     at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
>     at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:483)
>     at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {code}
> As seen above, there are many "Can't create directory ... Permission denied" errors for the local usercache directories of the 'tdatuser' user.
> Prior to enabling Kerberos, the contents of a usercache directory were as follows:
> {code}
> piripiri4:~ # ls -l /data1/hadoop/yarn/local/usercache/
> total 0
> drwxr-xr-x 3 yarn hadoop 21 Jul 13 16:59 ambari-qa
> drwxr-x--- 4 yarn hadoop 37 Jul 13 17:00 tdatuser
> {code}
> After enabling Kerberos the contents are:
> {code}
> piripiri4:~ # ls -l /data1/hadoop/yarn/local/usercache/
> total 0
> drwxr-s--- 4 ambari-qa hadoop 37 Jul 13 17:21 ambari-qa
> drwxr-x--- 4 yarn hadoop 37 Jul 13 17:00 tdatuser
> {code}
> It appears that the owner of the usercache directory for the 'ambari-qa' user was updated, but the 'tdatuser' directory was not.
> Is this expected behavior, and is there a recommended work-around for this issue?
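A note for anyone hitting the same thing: the listings above already point at the cause. Under the LinuxContainerExecutor the per-user usercache directory must be owned by the submitting user (as ambari-qa's now is), while tdatuser's is still owned by yarn from the non-secure setup, so the localizer can't create app dirs under it. A quick bash sketch to check every local dir at once; the /dataN layout is taken from this report, so substitute your own yarn.nodemanager.local-dirs:

{code}
# Sketch (bash, GNU stat): print owner:group, mode, and path of each
# per-user usercache dir across the NM local dirs shown in this report.
for d in /data{1..12}/hadoop/yarn/local/usercache/*; do
  stat -c '%U:%G %a %n' "$d"
done
{code}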
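As for the work-around question: the transition step usually applied is to stop the NodeManager on each host and remove the stale per-user trees, so they get recreated with the correct ownership the next time a container is localized. A sketch only, again assuming the /dataN layout from this report; verify the paths against your own config before deleting anything:

{code}
# Sketch: run on each NodeManager host with the NM stopped.
# The usercache trees are recreated, owned by the submitting user,
# on the next container localization.
for d in /data{1..12}/hadoop/yarn/local; do
  rm -rf "$d"/usercache/*
done
{code}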